Phase-2 Full-AttnRes Vision Implementation Plan
For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.
Goal: Replace all ResNet residual units in the vision backbone with AttnRes-based image blocks while preserving the current IMF agent interfaces, then launch a Phase-2 experiment anchored on the best Phase-1 horizon setting.
Architecture: Keep the current multi-camera encoder shell and per-camera output contract, but introduce a new ResNet-like 2D AttnRes backbone that preserves stage-wise downsampling and final SpatialSoftmax conditioning. Wire it into the existing ResNetDiffusionBackbone via an opt-in mode and keep the agent/head/data interfaces unchanged.
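The per-camera output contract can be reasoned about through the SpatialSoftmax head. It maps a (B, C, H, W) feature map to (B, 2C) expected keypoint coordinates, so the per-camera width depends only on the final channel count, not on whether the trunk is ResNet or AttnRes. Below is a minimal PyTorch sketch. SpatialSoftmaxSketch is a hypothetical name, and the sketch omits the optional 1x1 keypoint projection and learnable temperature that some implementations include.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SpatialSoftmaxSketch(nn.Module):
    """Hypothetical stand-in for the repo's SpatialSoftmax head.

    Turns a (B, C, H, W) feature map into (B, 2 * C) expected coordinates:
    a softmax over the H*W locations of each channel, then the
    probability-weighted mean of the x and y positions in [-1, 1].
    """

    def __init__(self, height: int, width: int, temperature: float = 1.0):
        super().__init__()
        self.temperature = temperature
        ys, xs = torch.meshgrid(
            torch.linspace(-1.0, 1.0, height),
            torch.linspace(-1.0, 1.0, width),
            indexing="ij",
        )
        # Flattened coordinate grids, each of shape (H*W,)
        self.register_buffer("pos_x", xs.reshape(-1))
        self.register_buffer("pos_y", ys.reshape(-1))

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        b, c, h, w = feat.shape
        # Softmax over spatial locations, independently for every channel
        attn = F.softmax(feat.reshape(b, c, h * w) / self.temperature, dim=-1)
        exp_x = (attn * self.pos_x).sum(dim=-1)  # (B, C)
        exp_y = (attn * self.pos_y).sum(dim=-1)  # (B, C)
        # Interleave as (x0, y0, x1, y1, ...) -> (B, 2C)
        return torch.stack([exp_x, exp_y], dim=-1).reshape(b, 2 * c)
```

Because the H x W grid enters only through the coordinate buffers, reusing the existing head unchanged also requires the new trunk to reach the same final spatial size as the ResNet trunk.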
Tech Stack: PyTorch, Hydra/OmegaConf, existing IMF AttnRes transformer components, pytest.
Task 1: Add failing tests for the new full-AttnRes visual backbone
Files:
- Create: tests/test_attnres_resnet2d_backbone.py
- Update: tests/test_imf_vla_agent.py
- [ ] Step 1: Write a failing backbone shape test
- [ ] Step 2: Run it to confirm that the new backbone/config does not exist yet
- [ ] Step 3: Add a failing IMF agent wiring test asserting an unchanged cond_dim=208
- [ ] Step 4: Run the targeted tests and capture the failure
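The shape test in Step 1 needs an expected output to assert against. Below is a minimal stdlib sketch. It assumes the new trunk mirrors a ResNet-18-style layout: a 7x7 stride-2 stem conv, a 3x3 stride-2 max-pool, and stage strides 1/2/2/2. The plan does not state the layout, and all helper names here are hypothetical. The sketch encodes the contract that the full-AttnRes trunk must reach the same final grid and channel count as the ResNet trunk, so that the SpatialSoftmax head and cond_dim=208 stay unchanged.

```python
def conv_out(size: int, kernel: int, stride: int, padding: int) -> int:
    """Output length of a conv/pool along one axis (dilation 1, floor mode)."""
    return (size + 2 * padding - kernel) // stride + 1


def expected_feature_hw(h: int, w: int, stage_strides=(1, 2, 2, 2)):
    """Final spatial size after an assumed ResNet-18-style stem and stages.

    Stem: 7x7 stride-2 conv (padding 3), then 3x3 stride-2 max-pool (padding 1).
    Each stage with stride > 1 downsamples once with a 3x3, padding-1 op;
    stride-1 stages keep the size.
    """
    sizes = []
    for s in (h, w):
        s = conv_out(s, 7, 2, 3)  # stem conv
        s = conv_out(s, 3, 2, 1)  # stem max-pool
        for stride in stage_strides:
            if stride > 1:
                s = conv_out(s, 3, stride, 1)
        sizes.append(s)
    return tuple(sizes)


def expected_per_camera_dim(final_channels: int) -> int:
    """SpatialSoftmax emits one (x, y) pair per channel/keypoint."""
    return 2 * final_channels


def test_expected_shapes_for_resnet_like_layout():
    # Reference shapes the hypothetical full-AttnRes trunk must reproduce
    assert expected_feature_hw(224, 224) == (7, 7)
    assert expected_feature_hw(84, 84) == (3, 3)
    assert expected_per_camera_dim(32) == 64
```

In the real test file, the same reference values would be compared against the output of the new backbone. The test fails until that module exists.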
Task 2: Implement a ResNet-like 2D AttnRes backbone
Files:
- Create: roboimi/vla/models/backbones/attnres_resnet2d.py
- Modify: roboimi/vla/models/backbones/resnet_diffusion.py
- [ ] Step 1: Add minimal 2D tokenization helpers and positional-encoding / bias handling
- [ ] Step 2: Implement AttnResImageBlock2D for feature maps
- [ ] Step 3: Implement AttnResResNetLikeBackbone2D with stage-wise downsampling
- [ ] Step 4: Wire _SingleRgbEncoder to choose between the original ResNet trunk and the new full-AttnRes trunk
- [ ] Step 5: Run the new backbone tests
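Steps 1-3 can be sketched as follows. The plan does not specify what the IMF AttnRes components compute. In their place, this sketch uses a standard pre-norm self-attention + MLP block over the flattened H*W tokens, with fixed 2D sine-cosine positions added to the queries and keys at every block (DETR-style). The names sincos_pos_2d, AttnImageBlock2DSketch, and make_stage are hypothetical stand-ins for the planned AttnResImageBlock2D and AttnResResNetLikeBackbone2D. The existing AttnRes attention and residual logic would replace the generic parts. Stage-wise downsampling is preserved by a strided 3x3 conv at the start of each stage, matching ResNet strides.

```python
import math

import torch
import torch.nn as nn


def sincos_pos_2d(h: int, w: int, dim: int) -> torch.Tensor:
    """Fixed 2D sine-cosine positional encoding of shape (h*w, dim).

    Half of the channels encode the row index, half the column index.
    """
    assert dim % 4 == 0, "dim must be divisible by 4"
    quarter = dim // 4
    freqs = torch.exp(-math.log(10000.0) * torch.arange(quarter) / quarter)
    ys, xs = torch.meshgrid(
        torch.arange(h, dtype=torch.float32),
        torch.arange(w, dtype=torch.float32),
        indexing="ij",
    )
    y = ys.reshape(-1, 1) * freqs  # (h*w, quarter)
    x = xs.reshape(-1, 1) * freqs
    return torch.cat([y.sin(), y.cos(), x.sin(), x.cos()], dim=1)


class AttnImageBlock2DSketch(nn.Module):
    """Hypothetical stand-in for AttnResImageBlock2D.

    Optionally downsamples / changes width with a strided conv, then applies
    pre-norm self-attention and an MLP (each with a residual add) over the
    H*W tokens, and reshapes back to (B, C, H, W).
    """

    def __init__(self, in_ch: int, out_ch: int, stride: int = 1, heads: int = 4):
        super().__init__()
        self.proj = (
            nn.Identity()
            if in_ch == out_ch and stride == 1
            else nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=stride, padding=1)
        )
        self.norm1 = nn.LayerNorm(out_ch)
        self.attn = nn.MultiheadAttention(out_ch, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(out_ch)
        self.mlp = nn.Sequential(
            nn.Linear(out_ch, 4 * out_ch), nn.GELU(), nn.Linear(4 * out_ch, out_ch)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.proj(x)
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)  # (B, H*W, C)
        pos = sincos_pos_2d(h, w, c).to(tokens)  # (H*W, C), broadcast over batch
        normed = self.norm1(tokens)
        qk = normed + pos  # positions on queries/keys only, not the residual stream
        tokens = tokens + self.attn(qk, qk, normed, need_weights=False)[0]
        tokens = tokens + self.mlp(self.norm2(tokens))
        return tokens.transpose(1, 2).reshape(b, c, h, w)


def make_stage(in_ch: int, out_ch: int, depth: int, stride: int) -> nn.Sequential:
    """One ResNet-like stage: only the first block carries the stage stride."""
    blocks = [AttnImageBlock2DSketch(in_ch, out_ch, stride)]
    blocks += [AttnImageBlock2DSketch(out_ch, out_ch, 1) for _ in range(depth - 1)]
    return nn.Sequential(*blocks)
```

Self-attention cost grows with (H*W)^2. The early, high-resolution stages therefore dominate compute, which is one reason to keep the ResNet-style downsampling schedule rather than attending at full resolution.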
Task 3: Expose config switches and agent wiring
Files:
- Modify: roboimi/vla/conf/backbone/resnet_diffusion.yaml
- Modify: roboimi/vla/conf/agent/resnet_imf_attnres.yaml
- [ ] Step 1: Add a backbone mode/config flag for the full-AttnRes vision trunk
- [ ] Step 2: Add defaults for AttnRes image depth, heads, etc., if needed
- [ ] Step 3: Add a Phase-2 launch override path that enables the new visual trunk
- [ ] Step 4: Run the agent wiring tests again
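The opt-in switch and override path from Steps 1-3 can be sketched as a small validated config. The key names (vision_trunk, attnres_depth, attnres_heads), the mode names, and the default values are all assumptions, not the repository's actual YAML keys. In the repository, the override would be a Hydra command-line override. The apply_overrides helper is only a stand-in for that mechanism, not Hydra's parser.

```python
from dataclasses import dataclass
from typing import Sequence

VALID_TRUNKS = ("resnet", "full_attnres")  # hypothetical mode names


@dataclass
class VisionTrunkConfig:
    """Hypothetical mirror of the new backbone config keys."""

    vision_trunk: str = "resnet"  # opt-in switch: the default keeps the ResNet trunk
    attnres_depth: int = 2  # AttnRes image blocks per stage
    attnres_heads: int = 4  # attention heads per block

    def __post_init__(self) -> None:
        if self.vision_trunk not in VALID_TRUNKS:
            raise ValueError(
                f"vision_trunk must be one of {VALID_TRUNKS}, got {self.vision_trunk!r}"
            )
        if self.attnres_depth < 1 or self.attnres_heads < 1:
            raise ValueError("attnres_depth and attnres_heads must be positive")


def apply_overrides(cfg: VisionTrunkConfig, overrides: Sequence[str]) -> VisionTrunkConfig:
    """Apply key=value overrides to a copy of cfg (a stand-in, not Hydra).

    Only str and int fields are handled: each value is cast to the type of
    the field's current value, and the new config is re-validated.
    """
    values = dict(vars(cfg))
    for item in overrides:
        key, sep, raw = item.partition("=")
        if not sep or key not in values:
            raise KeyError(f"bad override: {item!r}")
        values[key] = type(values[key])(raw)
    return VisionTrunkConfig(**values)
```

Keeping the default at the original trunk makes the switch opt-in, so existing Phase-1 configs resolve to the same model. Only the Phase-2 launch passes the override.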
Task 4: Smoke-verify training path
Files:
- Reuse existing training scripts and configs
-
Step 1: Run a short CPU or tiny-step smoke instantiation /
compute_losstest -
Step 2: If needed, run a very short training smoke launch
-
Step 3: Verify no cond-dim or rollout-loading regressions
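Step 3 can be made mechanical by comparing parameter names and shapes. In practice, each input would be built as {k: tuple(v.shape) for k, v in module.state_dict().items()}. The helper below is a hypothetical sketch that reports the same categories a strict load_state_dict complains about: missing keys, unexpected keys, and shape mismatches. Two checks are useful:

- A Phase-2 checkpoint against a freshly built Phase-2 rollout agent should report nothing.
- A Phase-1 checkpoint against a Phase-2 agent should differ only under the swapped trunk's prefix.

A cond-dim regression would surface as a shape mismatch in the first layer that consumes the conditioning. The parameter names in the usage test are toy values.

```python
from typing import Dict, List, Tuple

Shapes = Dict[str, Tuple[int, ...]]


def compare_state_shapes(
    ckpt: Shapes, model: Shapes, ignore_prefix: str = ""
) -> Dict[str, List[str]]:
    """Summarize key/shape differences between a checkpoint and a model.

    Keys starting with ignore_prefix (e.g. the swapped vision trunk) are
    skipped, so the remaining differences isolate head / cond-dim changes.
    """

    def kept(shapes: Shapes) -> Shapes:
        if not ignore_prefix:
            return dict(shapes)
        return {k: v for k, v in shapes.items() if not k.startswith(ignore_prefix)}

    ck, md = kept(ckpt), kept(model)
    return {
        "missing": sorted(md.keys() - ck.keys()),  # in model, absent from checkpoint
        "unexpected": sorted(ck.keys() - md.keys()),  # in checkpoint, absent from model
        "mismatched": sorted(k for k in ck.keys() & md.keys() if ck[k] != md[k]),
    }
```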
Task 5: Launch the Phase-2 experiment
Files:
- Update experiment tracking under experiment_suites/
- [ ] Step 1: Use the best Phase-1 setting (pred_horizon=16, num_action_steps=8)
- [ ] Step 2: Launch a baseline reference run or reuse an existing result
- [ ] Step 3: Launch the full-AttnRes vision experiment
- [ ] Step 4: Track rollout metrics and compare max avg_reward
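The comparison in Step 4 reduces to taking each run's best rollout avg_reward and subtracting the baseline's. Below is a minimal sketch. It assumes each run's rollout evaluations are available as a list of dicts with an "avg_reward" entry. That record format and the helper names are hypothetical, and the numbers in the test are toy inputs, not results.

```python
from typing import Dict, Iterable, List, Mapping


def max_avg_reward(evals: Iterable[Mapping[str, float]]) -> float:
    """Best avg_reward over one run's rollout evaluations."""
    rewards = [e["avg_reward"] for e in evals]
    if not rewards:
        raise ValueError("run has no rollout evaluations")
    return max(rewards)


def compare_runs(
    runs: Dict[str, List[Mapping[str, float]]], baseline: str
) -> Dict[str, float]:
    """Each run's max avg_reward minus the baseline run's max avg_reward."""
    base = max_avg_reward(runs[baseline])
    return {name: max_avg_reward(evals) - base for name, evals in runs.items()}
```

Comparing the maximum rather than the final evaluation matches the plan's metric. Both runs should share the same Phase-1 horizon setting so that the difference reflects the vision trunk alone.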