# Phase-2 Full-AttnRes Vision Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Replace all ResNet residual units in the vision backbone with AttnRes-based image blocks while preserving the current IMF agent interfaces and launch a Phase-2 experiment anchored on the best Phase-1 horizon setting.

**Architecture:** Keep the current multi-camera encoder shell and per-camera output contract, but introduce a new ResNet-like 2D AttnRes backbone that preserves stage-wise downsampling and final SpatialSoftmax conditioning. Wire it into the existing `ResNetDiffusionBackbone` via an opt-in mode and keep the agent/head/data interfaces unchanged.

**Tech Stack:** PyTorch, Hydra/OmegaConf, existing IMF AttnRes transformer components, pytest.

---

### Task 1: Add failing tests for the new full-AttnRes visual backbone

**Files:**
- Create: `tests/test_attnres_resnet2d_backbone.py`
- Update: `tests/test_imf_vla_agent.py`

- [ ] **Step 1: Write a failing backbone shape test**
- [ ] **Step 2: Run it to confirm the new backbone/config does not exist yet**
- [ ] **Step 3: Add a failing IMF agent wiring test for unchanged cond_dim=208**
- [ ] **Step 4: Run the targeted tests and capture the failure**

### Task 2: Implement a ResNet-like 2D AttnRes backbone

**Files:**
- Create: `roboimi/vla/models/backbones/attnres_resnet2d.py`
- Modify: `roboimi/vla/models/backbones/resnet_diffusion.py`

- [ ] **Step 1: Add minimal 2D tokenization helpers and positional encoding / bias handling**
- [ ] **Step 2: Implement `AttnResImageBlock2D` for feature maps**
- [ ] **Step 3: Implement `AttnResResNetLikeBackbone2D` with stage-wise downsampling**
- [ ] **Step 4: Wire `_SingleRgbEncoder` to choose between original ResNet trunk and the new full-AttnRes trunk**
- [ ] **Step 5: Run the new backbone tests**

### Task 3: Expose config switches and agent wiring

**Files:**
- Modify: `roboimi/vla/conf/backbone/resnet_diffusion.yaml`
- Modify: `roboimi/vla/conf/agent/resnet_imf_attnres.yaml`

- [ ] **Step 1: Add a backbone mode/config flag for the full-AttnRes vision trunk**
- [ ] **Step 2: Add defaults for attnres image depth/heads/etc. if needed**
- [ ] **Step 3: Add a Phase-2 launch override path that enables the new visual trunk**
- [ ] **Step 4: Run agent wiring tests again**

### Task 4: Smoke-verify training path

**Files:**
- Reuse existing training scripts and configs

- [ ] **Step 1: Run a short CPU or tiny-step smoke instantiation / `compute_loss` test**
- [ ] **Step 2: If needed, run a very short training smoke launch**
- [ ] **Step 3: Verify no cond-dim or rollout-loading regressions**

### Task 5: Launch the Phase-2 experiment

**Files:**
- Update experiment tracking under `experiment_suites/`

- [ ] **Step 1: Use Phase-1 best setting (`pred_horizon=16`, `num_action_steps=8`)**
- [ ] **Step 2: Launch baseline reference or reuse existing result**
- [ ] **Step 3: Launch full-AttnRes vision experiment**
- [ ] **Step 4: Track rollout metrics and compare max avg_reward**
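
**Planning aid (Task 1, Step 1 and Task 2, Step 3):** Before writing the backbone shape test, it may help to tabulate the feature-map size and token count that each stage would see, since attention cost in the AttnRes image blocks grows with the number of spatial tokens. The sketch below is an assumption-laden illustration, not the project's implementation: it assumes a torchvision-style ResNet stem (7x7 stride-2 convolution with padding 3, then 3x3 stride-2 max-pool with padding 1), four stages with strides `(1, 2, 2, 2)`, and stage downsampling that follows 3x3, padding-1 arithmetic. The names `conv_out_size`, `StageShape`, and `plan_stage_shapes` are hypothetical, and the 224x224 input is only an example resolution.

```python
from dataclasses import dataclass
from typing import List, Sequence


def conv_out_size(size: int, kernel: int, stride: int, padding: int) -> int:
    """Output length of a convolution or pooling op along one axis (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


@dataclass(frozen=True)
class StageShape:
    """Spatial shape of a feature map after one stem op or stage (hypothetical helper)."""

    name: str
    height: int
    width: int

    @property
    def num_tokens(self) -> int:
        # One token per spatial location once the map is flattened for attention.
        return self.height * self.width


def plan_stage_shapes(
    height: int,
    width: int,
    stage_strides: Sequence[int] = (1, 2, 2, 2),  # assumed ResNet-like layout
) -> List[StageShape]:
    """List the feature-map shape after the stem and after each stage."""
    # Assumed stem, part 1: 7x7 conv, stride 2, padding 3.
    h = conv_out_size(height, kernel=7, stride=2, padding=3)
    w = conv_out_size(width, kernel=7, stride=2, padding=3)
    shapes = [StageShape("stem_conv", h, w)]

    # Assumed stem, part 2: 3x3 max-pool, stride 2, padding 1.
    h = conv_out_size(h, kernel=3, stride=2, padding=1)
    w = conv_out_size(w, kernel=3, stride=2, padding=1)
    shapes.append(StageShape("stem_pool", h, w))

    # Each stage is modeled as downsampling with 3x3 / padding-1 arithmetic.
    for index, stride in enumerate(stage_strides, start=1):
        h = conv_out_size(h, kernel=3, stride=stride, padding=1)
        w = conv_out_size(w, kernel=3, stride=stride, padding=1)
        shapes.append(StageShape(f"stage{index}", h, w))
    return shapes


if __name__ == "__main__":
    for shape in plan_stage_shapes(224, 224):
        print(f"{shape.name}: {shape.height}x{shape.width} -> {shape.num_tokens} tokens")
```

Under these assumptions, the expected per-stage shapes can be copied into the shape test as the reference values, and the token counts give a quick sense of which stages are expensive for full attention. If the real stem or strides differ, only the constants in `plan_stage_shapes` need to change.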