roboimi/docs/superpowers/plans/2026-04-04-imf-horizon-grid-and-attnres-ablation.md

# IMF Horizon Grid and AttnRes Ablation Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Run a six-run Phase-1 grid over IMF prediction horizon and action steps across the available GPUs, monitor progress, and collect each run's best rollout metric; then use the best horizon setting for a Phase-2 visual-attnres ablation.

**Architecture:** Use the current IMF training code as-is for Phase-1 by sweeping explicit `(pred_horizon, num_action_steps)` overrides while keeping emb=384, layer=12, and max_steps=50k fixed. Maintain a local experiment suite directory with a manifest and machine-readable status snapshots so progress can be resumed and summarized across turns. After Phase-1 completes, compare the current head-only attnres setup against a variant that also adds attnres into the visual ResNet path.

**Tech Stack:** Python, Hydra/OmegaConf, PyTorch, SSH/Tailscale, JSON/CSV status files, SwanLab.

---

### Task 1: Prepare the experiment suite manifest and state tracking

**Files:**
- Create: `experiment_suites/2026-04-04-imf-horizon-grid/manifest.json`
- Create: `experiment_suites/2026-04-04-imf-horizon-grid/status.json`
- Create: `experiment_suites/2026-04-04-imf-horizon-grid/notes.md`

- [ ] Define the 6 legal Phase-1 combinations (every pair from {8, 16, 32} with `num_action_steps <= pred_horizon`): `(8,8)`, `(16,8)`, `(16,16)`, `(32,8)`, `(32,16)`, `(32,32)`.
- [ ] Record for each run: name, host, GPU slot, command, log path, SwanLab run name, and completion criteria.
- [ ] Define the comparison metric as the maximum rollout average reward seen during training (`max avg_reward`), preferably read from the best-checkpoint metadata and cross-checked against logs.
- [ ] Keep `status.json` updated with per-run state (queued / running / finished / failed) plus the latest parsed progress.
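A minimal sketch of the Task 1 bookkeeping, assuming Python on the local side. `legal_grid()` reproduces the six combinations as every pair from {8, 16, 32} with `num_action_steps <= pred_horizon`, and both JSON files are written atomically so an interrupted update cannot leave them truncated. `RunSpec`, `RunStatus`, `save_suite`, and the default completion criterion are illustrative names, not existing code in the repo.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass
from itertools import product
from pathlib import Path
from typing import Optional

HORIZONS = (8, 16, 32)
VALID_STATES = {"queued", "running", "finished", "failed"}


def legal_grid(horizons=HORIZONS):
    """Every (pred_horizon, num_action_steps) pair with num_action_steps <= pred_horizon."""
    return [(h, a) for h, a in product(sorted(horizons), repeat=2) if a <= h]


@dataclass
class RunSpec:
    # One manifest entry; fields mirror the per-run record listed in Task 1.
    name: str
    host: str
    gpu: int
    command: str
    log_path: str
    swanlab_run: str
    # Assumed default criterion; replace with whatever the suite actually uses.
    completion: str = "train step reaches max_steps=50000"


@dataclass
class RunStatus:
    state: str = "queued"
    latest_step: Optional[int] = None
    latest_avg_reward: Optional[float] = None
    max_avg_reward: Optional[float] = None

    def __post_init__(self):
        # Reject states outside queued / running / finished / failed.
        if self.state not in VALID_STATES:
            raise ValueError(f"unknown state: {self.state!r}")


def write_json_atomic(path, payload):
    """Write to a temp file in the same directory, then os.replace() it, so a
    crash mid-write never leaves a truncated JSON file behind."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(payload, f, indent=2, sort_keys=True)
    os.replace(tmp, path)


def save_suite(suite_dir, specs, statuses):
    """Persist the manifest (static run specs) and status (mutable progress)."""
    suite_dir = Path(suite_dir)
    write_json_atomic(suite_dir / "manifest.json", [asdict(s) for s in specs])
    write_json_atomic(suite_dir / "status.json", {n: asdict(s) for n, s in statuses.items()})
```

Keeping the manifest and the status in separate files means the manifest is written once per suite, while only `status.json` is rewritten on every monitoring pass.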

### Task 2: Prepare the remote 8-GPU execution target

**Files:**
- Remote working directory under `/home/droid/`
- Reuse or create a synced code directory for this suite

- [ ] Verify the remote dataset path and environment path.
- [ ] Verify GPU availability and reserve 6 GPUs for Phase-1 launches.
- [ ] Sync the required code to a dedicated remote suite directory.
- [ ] Record exact remote paths back into the local suite manifest.
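One way to check GPU availability before reserving six devices, sketched under the assumption that the host is reachable through an SSH alias and that "idle" means low memory use plus near-zero utilization (the thresholds are guesses). The `nvidia-smi` query flags are real; `parse_gpu_csv`, `idle_gpus`, `query_remote`, and `reserve` are hypothetical helpers.

```python
import subprocess

# Real nvidia-smi flags: one CSV line per GPU, memory in MiB, utilization in %.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,utilization.gpu",
    "--format=csv,noheader,nounits",
]


def parse_gpu_csv(text):
    """Turn lines like '3, 1200, 7' into dicts."""
    gpus = []
    for line in text.strip().splitlines():
        idx, mem, util = (int(x.strip()) for x in line.split(","))
        gpus.append({"index": idx, "memory_used_mib": mem, "util_pct": util})
    return gpus


def idle_gpus(gpus, max_mem_mib=1000, max_util_pct=5):
    """Indices of GPUs under both (assumed) thresholds."""
    return [
        g["index"]
        for g in gpus
        if g["memory_used_mib"] <= max_mem_mib and g["util_pct"] <= max_util_pct
    ]


def query_remote(host):
    """Run the query over SSH; `host` is whatever alias reaches the 8xL20 box."""
    out = subprocess.run(["ssh", host, *QUERY], capture_output=True, text=True, check=True)
    return parse_gpu_csv(out.stdout)


def reserve(free, n=6):
    """Pick the first n idle GPUs, or fail loudly instead of oversubscribing."""
    if len(free) < n:
        raise RuntimeError(f"need {n} idle GPUs, found {len(free)}: {free}")
    return free[:n]
```

The reserved indices would then be written into the manifest as each run's GPU slot.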

### Task 3: Launch the 6 Phase-1 experiments in parallel

**Files:**
- Reuse: `roboimi/demos/vla_scripts/train_vla.py`
- Modify only local suite tracking files unless a launch bug is discovered

- [ ] Launch 6 runs concurrently with fixed settings: IMF, emb=384, layer=12, max_steps=50k.
- [ ] Keep all other relevant training hyperparameters aligned to the current strong baseline unless a concrete blocker appears.
- [ ] Assign one GPU per run on the 8xL20 host.
- [ ] Capture PID, log path, and SwanLab URL for each run in `status.json`.
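A sketch of a one-GPU-per-run launcher, assuming it executes on the remote host itself (for example invoked over SSH from the suite directory). The Hydra override keys in `build_overrides` are hypothetical placeholders; the real config paths used by `train_vla.py` must be confirmed before launching. Each run is pinned with `CUDA_VISIBLE_DEVICES`, its stdout/stderr are appended to the run's log, and the returned `Popen.pid` is what would be recorded in `status.json`.

```python
import os
import shlex
import subprocess
from pathlib import Path

TRAIN_SCRIPT = "roboimi/demos/vla_scripts/train_vla.py"


def build_overrides(pred_horizon, num_action_steps, run_name):
    # HYPOTHETICAL Hydra keys: confirm the real config paths before use.
    return [
        f"agent.pred_horizon={pred_horizon}",
        f"agent.num_action_steps={num_action_steps}",
        "agent.emb=384",
        "agent.layer=12",
        "train.max_steps=50000",
        f"swanlab.run_name={run_name}",
    ]


def command_for(pred_horizon, num_action_steps, python="python"):
    """Full argv for one grid cell; shlex.join(...) gives the manifest string."""
    name = f"imf_h{pred_horizon}_a{num_action_steps}"
    return [python, TRAIN_SCRIPT, *build_overrides(pred_horizon, num_action_steps, name)]


def launch(cmd, gpu, log_path, cwd=None):
    """Start one run pinned to a single GPU and return the Popen handle."""
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu))
    log_path = Path(log_path)
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with open(log_path, "ab") as log:
        proc = subprocess.Popen(
            cmd,
            cwd=cwd,
            env=env,
            stdout=log,
            stderr=subprocess.STDOUT,
            start_new_session=True,  # detach from the launching terminal's session
        )
    return proc
```

Starting each child in its own session helps it keep running if the SSH connection that launched it drops, which matters for 50k-step runs.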

### Task 4: Monitor and summarize Phase-1 until all 6 finish

**Files:**
- Update: `experiment_suites/2026-04-04-imf-horizon-grid/status.json`
- Update: `experiment_suites/2026-04-04-imf-horizon-grid/notes.md`

- [ ] Periodically parse each run’s logs and checkpoints to extract the latest step, the latest rollout reward, and the best rollout reward so far.
- [ ] Keep a resumable local summary so progress can be continued in later turns without rediscovery.
- [ ] After all 6 runs finish, rank them by `max avg_reward` and write a compact Phase-1 summary.
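The monitoring pass needs two pieces: extracting progress from a log and ranking runs by `max avg_reward`. This sketch assumes log lines contain `step=<int>` and `avg_reward=<float>` (or `:` separators); that format is an assumption, and the regexes should be adapted to what the training script actually prints. Per Task 1, the log-derived maximum is a cross-check against the best-checkpoint metadata, not a replacement for it.

```python
import re

# HYPOTHETICAL log format; adapt these patterns to the real training output.
# \b keeps "max_steps=50000" or "best_avg_reward=..." from matching.
STEP_RE = re.compile(r"\bstep[=:\s]+(\d+)")
REWARD_RE = re.compile(r"\bavg_reward[=:\s]+(-?\d+(?:\.\d+)?)")


def parse_progress(log_text):
    """Latest step, latest rollout reward, and best rollout reward seen so far."""
    latest_step = None
    rewards = []
    for line in log_text.splitlines():
        if m := STEP_RE.search(line):
            latest_step = int(m.group(1))
        if m := REWARD_RE.search(line):
            rewards.append(float(m.group(1)))
    return {
        "latest_step": latest_step,
        "latest_avg_reward": rewards[-1] if rewards else None,
        "max_avg_reward": max(rewards) if rewards else None,
    }


def rank_runs(progress_by_run):
    """Sort runs by max avg_reward, best first; runs with no rollout yet are skipped."""
    scored = [
        (name, p["max_avg_reward"])
        for name, p in progress_by_run.items()
        if p["max_avg_reward"] is not None
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

The top entry of `rank_runs(...)` identifies the `(pred_horizon, num_action_steps)` combination that Task 5 fixes for Phase-2.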

### Task 5: Prepare the Phase-2 visual-attnres ablation

**Files:**
- Likely modify: vision backbone implementation and config files (to be confirmed after code inspection)
- Add/update targeted tests for the visual backbone path if code changes are needed

- [ ] Use the best Phase-1 `(pred_horizon, num_action_steps)` combination as the fixed rollout setting for Phase-2.
- [ ] Compare:
  1. current setup: attnres only in the IMF head
  2. ablation setup: attnres in both IMF head and visual encoder path
- [ ] Keep the rest of the training settings fixed.
- [ ] Launch and monitor the Phase-2 pair after the Phase-1 summary is complete.
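To keep the Phase-2 comparison controlled, both configurations can be derived from one base so they differ in a single switch only. `visual_attnres` is a hypothetical flag name, and the plain-dict config is an assumption; the actual switch depends on how the visual-backbone change in Task 5 is implemented.

```python
def phase2_pair(base_overrides, best_h, best_a):
    """Two Phase-2 configs that share everything except where attnres is applied."""
    base = dict(base_overrides, pred_horizon=best_h, num_action_steps=best_a)
    return {
        "head_only": dict(base, visual_attnres=False),        # current setup
        "head_and_visual": dict(base, visual_attnres=True),   # ablation setup
    }


def differing_keys(a, b):
    """Keys whose values differ between two configs (should be exactly one)."""
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))
```

Asserting that `differing_keys` returns only the attnres switch before launching is a cheap guard against accidental drift between the two runs.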