# Phase-2 Full-AttnRes Vision Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Replace all ResNet residual units in the vision backbone with AttnRes-based image blocks while preserving the current IMF agent interfaces, then launch a Phase-2 experiment anchored on the best Phase-1 horizon setting.

**Architecture:** Keep the current multi-camera encoder shell and per-camera output contract, but introduce a new ResNet-like 2D AttnRes backbone that preserves stage-wise downsampling and final SpatialSoftmax conditioning. Wire it into the existing `ResNetDiffusionBackbone` via an opt-in mode and keep the agent/head/data interfaces unchanged.

**Tech Stack:** PyTorch, Hydra/OmegaConf, existing IMF AttnRes transformer components, pytest.

---

### Task 1: Add failing tests for the new full-AttnRes visual backbone

**Files:**
- Create: `tests/test_attnres_resnet2d_backbone.py`
- Update: `tests/test_imf_vla_agent.py`

- [ ] **Step 1: Write a failing backbone shape test**
- [ ] **Step 2: Run it to confirm the new backbone/config does not exist yet**
- [ ] **Step 3: Add a failing IMF agent wiring test asserting that `cond_dim` stays at 208**
- [ ] **Step 4: Run the targeted tests and capture the failure**
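
One way to make the "does not exist yet" check in Step 2 explicit is a small existence probe. The sketch below is only an illustration: the helper name `module_exists` is hypothetical, and the module path is derived from the file listed in Task 2, assuming the usual path-to-package mapping.

```python
import importlib.util


def module_exists(dotted_name: str) -> bool:
    """Return True if the module can be located.

    Parent packages are imported during the lookup; the target module is not.
    """
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package of the dotted name is missing.
        return False


# Assumed import path for the file created in Task 2.
NEW_BACKBONE = "roboimi.vla.models.backbones.attnres_resnet2d"

if __name__ == "__main__":
    state = "present" if module_exists(NEW_BACKBONE) else "missing (expected before Task 2)"
    print(f"{NEW_BACKBONE}: {state}")
```

Before Task 2 the probe should report the module as missing; once the backbone file lands, the same probe flips to present, which is the red-to-green transition these steps are meant to capture.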

### Task 2: Implement a ResNet-like 2D AttnRes backbone

**Files:**
- Create: `roboimi/vla/models/backbones/attnres_resnet2d.py`
- Modify: `roboimi/vla/models/backbones/resnet_diffusion.py`

- [ ] **Step 1: Add minimal 2D tokenization helpers and positional encoding / bias handling**
- [ ] **Step 2: Implement `AttnResImageBlock2D` for feature maps**
- [ ] **Step 3: Implement `AttnResResNetLikeBackbone2D` with stage-wise downsampling**
- [ ] **Step 4: Wire `_SingleRgbEncoder` to choose between original ResNet trunk and the new full-AttnRes trunk**
- [ ] **Step 5: Run the new backbone tests**
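
The tokenization helpers from Step 1 could look roughly like the sketch below. It assumes feature maps in the usual `(B, C, H, W)` layout and a row-major token order; the function names are hypothetical, and positional encoding or bias handling is deliberately left out.

```python
import torch


def feature_map_to_tokens(x: torch.Tensor) -> torch.Tensor:
    """(B, C, H, W) -> (B, H*W, C), row-major over the spatial grid."""
    if x.dim() != 4:
        raise ValueError(f"expected a 4D feature map, got shape {tuple(x.shape)}")
    return x.flatten(2).transpose(1, 2)


def tokens_to_feature_map(tokens: torch.Tensor, height: int, width: int) -> torch.Tensor:
    """(B, H*W, C) -> (B, C, H, W); inverse of feature_map_to_tokens."""
    batch, num_tokens, channels = tokens.shape
    if num_tokens != height * width:
        raise ValueError(f"{num_tokens} tokens do not fill a {height}x{width} grid")
    return tokens.transpose(1, 2).reshape(batch, channels, height, width)


# Round trip on a small, arbitrary feature map.
feature_map = torch.arange(2 * 3 * 4 * 5, dtype=torch.float32).reshape(2, 3, 4, 5)
tokens = feature_map_to_tokens(feature_map)
restored = tokens_to_feature_map(tokens, height=4, width=5)
```

Keeping the round trip lossless means any block that operates on tokens can be dropped between convolutional downsampling stages without changing the shape that later stages or the final SpatialSoftmax see.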

### Task 3: Expose config switches and agent wiring

**Files:**
- Modify: `roboimi/vla/conf/backbone/resnet_diffusion.yaml`
- Modify: `roboimi/vla/conf/agent/resnet_imf_attnres.yaml`

- [ ] **Step 1: Add a backbone mode/config flag for the full-AttnRes vision trunk**
- [ ] **Step 2: Add defaults for the AttnRes image-trunk hyperparameters (depth, heads, etc.) if needed**
- [ ] **Step 3: Add a Phase-2 launch override path that enables the new visual trunk**
- [ ] **Step 4: Run agent wiring tests again**
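
The opt-in behaviour described in these steps can be illustrated with a plain dataclass. This is a sketch, not the project's Hydra config: the field names, the mode strings, and the default values are all placeholders chosen for illustration.

```python
from dataclasses import dataclass, replace

# Hypothetical mode names; the plan only states that a mode/config flag is added.
VALID_MODES = ("resnet", "full_attnres")


@dataclass(frozen=True)
class VisionTrunkConfig:
    mode: str = "resnet"     # default keeps the original ResNet trunk
    attnres_depth: int = 2   # placeholder default, not a tuned value
    attnres_heads: int = 4   # placeholder default, not a tuned value

    def __post_init__(self) -> None:
        if self.mode not in VALID_MODES:
            raise ValueError(f"unknown vision trunk mode: {self.mode!r}")
        if self.attnres_depth < 1 or self.attnres_heads < 1:
            raise ValueError("attnres_depth and attnres_heads must be positive")


# Existing launches are unaffected because the default mode is unchanged.
default_cfg = VisionTrunkConfig()

# A Phase-2 launch overrides only the mode and inherits everything else.
phase2_cfg = replace(default_cfg, mode="full_attnres")
```

Failing loudly on an unknown mode is a deliberate choice here: a typo in a launch override should stop the run rather than silently fall back to the baseline trunk.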

### Task 4: Smoke-verify training path

**Files:**
- Reuse existing training scripts and configs

- [ ] **Step 1: Run a short smoke test (on CPU or with only a few steps) that instantiates the agent and calls `compute_loss`**
- [ ] **Step 2: If needed, run a very short training smoke launch**
- [ ] **Step 3: Verify no cond-dim or rollout-loading regressions**
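
A smoke step of this kind usually checks two things: the loss is finite, and every trainable parameter receives a gradient. The sketch below shows that pattern on a stand-in module. It does not call the real agent or `compute_loss`; the helper name `smoke_step`, the MSE loss, and the layer sizes (other than the 208-wide input, which mirrors the `cond_dim` named in Task 1) are assumptions.

```python
import math

import torch
from torch import nn


def smoke_step(model: nn.Module, inputs: torch.Tensor, targets: torch.Tensor) -> float:
    """Run one forward/backward pass and fail fast on obvious breakage."""
    model.train()
    model.zero_grad(set_to_none=True)
    loss = nn.functional.mse_loss(model(inputs), targets)
    loss.backward()
    value = loss.item()
    if not math.isfinite(value):
        raise RuntimeError(f"non-finite loss: {value}")
    # Parameters without gradients usually indicate a disconnected sub-module.
    missing = [n for n, p in model.named_parameters() if p.requires_grad and p.grad is None]
    if missing:
        raise RuntimeError(f"parameters without gradients: {missing}")
    return value


torch.manual_seed(0)
# Stand-in for the real agent: a 208-wide conditioning vector mapped to an arbitrary output size.
stand_in = nn.Sequential(nn.Linear(208, 32), nn.ReLU(), nn.Linear(32, 16))
loss_value = smoke_step(stand_in, torch.randn(4, 208), torch.randn(4, 16))
```

In the real check, the stand-in module and the MSE call would be replaced by the instantiated agent and its `compute_loss`, with the same two assertions applied afterwards.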

### Task 5: Launch the Phase-2 experiment

**Files:**
- Update experiment tracking under `experiment_suites/`

- [ ] **Step 1: Use Phase-1 best setting (`pred_horizon=16`, `num_action_steps=8`)**
- [ ] **Step 2: Launch baseline reference or reuse existing result**
- [ ] **Step 3: Launch full-AttnRes vision experiment**
- [ ] **Step 4: Track rollout metrics and compare the maximum `avg_reward` between the baseline and the full-AttnRes run**
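
The comparison in Step 4 can be reduced to a few lines once rollout metrics are available as records. The sketch below assumes each rollout evaluation is a dict with `step` and `avg_reward` keys, which is a hypothetical format; the numbers are placeholders for illustration only, not experiment results.

```python
from typing import Dict, List

Rollout = Dict[str, float]


def best_rollout(rollouts: List[Rollout]) -> Rollout:
    """Return the rollout record with the highest avg_reward."""
    if not rollouts:
        raise ValueError("no rollout records to compare")
    return max(rollouts, key=lambda record: record["avg_reward"])


def max_reward_delta(baseline: List[Rollout], candidate: List[Rollout]) -> float:
    """Candidate's best avg_reward minus the baseline's best avg_reward."""
    return best_rollout(candidate)["avg_reward"] - best_rollout(baseline)["avg_reward"]


# PLACEHOLDER numbers for illustration only; these are not measured results.
baseline_rollouts = [
    {"step": 1000, "avg_reward": 0.5},
    {"step": 2000, "avg_reward": 0.75},
]
attnres_rollouts = [
    {"step": 1000, "avg_reward": 0.25},
    {"step": 2000, "avg_reward": 1.0},
]

delta = max_reward_delta(baseline_rollouts, attnres_rollouts)
```

Comparing the per-run maxima rather than the final checkpoints matches the wording of Step 4, but it also favours noisier runs, so it may be worth recording the step at which each maximum occurs alongside the value.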