roboimi/docs/superpowers/plans/2026-04-04-imf-horizon-grid-and-attnres-ablation.md


IMF Horizon Grid and AttnRes Ablation Implementation Plan

For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.

Goal: Run a 6-run Phase-1 grid over IMF (pred_horizon, num_action_steps) settings across the available GPUs, monitor progress and collect the best rollout metrics, then use the best horizon setting for a Phase-2 visual-attnres ablation.

Architecture: Use the current IMF training code as-is for Phase-1 by sweeping explicit (pred_horizon, num_action_steps) overrides while keeping emb=384, layer=12, and max_steps=50k fixed. Maintain a local experiment suite directory with a manifest and machine-readable status snapshots so progress can be resumed and summarized across turns. After Phase-1 completes, compare the current head-only attnres setup against a variant that also adds attnres into the visual ResNet path.
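A resumable status snapshot depends mostly on never leaving a half-written file behind. Below is a minimal sketch. It assumes status.json maps run names to small dicts; the field names (state, latest_step) and run names are illustrative, not fixed by this plan. The writer puts the snapshot in a temporary file in the same directory and then swaps it into place with os.replace. That swap is atomic when source and destination are on the same filesystem.

```python
import json
import os
import tempfile
from pathlib import Path

# Per-run states named in this plan.
VALID_STATES = {"queued", "running", "finished", "failed"}


def load_status(path: Path) -> dict:
    """Return the last snapshot, or an empty dict if none has been written yet."""
    if not path.exists():
        return {}
    with path.open() as f:
        return json.load(f)


def write_status_atomic(path: Path, status: dict) -> None:
    """Validate states, write to a temp file next to `path`, then atomically swap it in."""
    for run, info in status.items():
        if info.get("state") not in VALID_STATES:
            raise ValueError(f"{run}: unknown state {info.get('state')!r}")
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=path.parent, prefix=".status-", suffix=".json")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(status, f, indent=2, sort_keys=True)
            f.flush()
            os.fsync(f.fileno())
        # Readers see either the old snapshot or the new one, never a partial file.
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)
        raise


def update_run(path: Path, run: str, **fields) -> dict:
    """Merge new fields (e.g. state, latest_step) into one run's entry and persist it."""
    status = load_status(path)
    status.setdefault(run, {"state": "queued"}).update(fields)
    write_status_atomic(path, status)
    return status
```

With this approach, a later turn only needs load_status to pick up where the previous one stopped. No in-memory state has to survive between turns.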

Tech Stack: Python, Hydra/OmegaConf, PyTorch, SSH/Tailscale, JSON/CSV status files, SwanLab.


Task 1: Prepare the experiment suite manifest and state tracking

Files:

  • Create: experiment_suites/2026-04-04-imf-horizon-grid/manifest.json

  • Create: experiment_suites/2026-04-04-imf-horizon-grid/status.json

  • Create: experiment_suites/2026-04-04-imf-horizon-grid/notes.md

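  • Define the 6 legal Phase-1 combinations, i.e. every (pred_horizon, num_action_steps) pair drawn from {8, 16, 32} with num_action_steps ≤ pred_horizon: (8,8), (16,8), (16,16), (32,8), (32,16), (32,32).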

  • Record for each run: name, host, GPU slot, command, log path, SwanLab run name, and completion criteria.

  • Define the comparison metric as the maximum rollout average reward seen during training (max avg_reward), preferably read from the best-checkpoint metadata and cross-checked against logs.

  • Keep status.json updated with each run's state (queued / running / finished / failed) plus the latest parsed progress.
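The grid and the per-run manifest records can be generated rather than typed by hand. The sketch below assumes a few hypothetical details: the run-name scheme (imf_h{H}_a{A}), the dict field names, and the completion-criterion wording. The host, GPU list, and log directory are passed in because this plan records them only after Task 2.

```python
import itertools

HORIZONS = (8, 16, 32)
# Fixed Phase-1 settings taken from the plan.
FIXED = {"emb": 384, "layer": 12, "max_steps": 50_000}


def legal_combinations(horizons=HORIZONS) -> list[tuple[int, int]]:
    """All (pred_horizon, num_action_steps) pairs with num_action_steps <= pred_horizon."""
    return [(h, a) for h, a in itertools.product(horizons, repeat=2) if a <= h]


def build_manifest(host: str, gpus: list[int], log_dir: str) -> dict:
    """One record per legal combination, one GPU per run (hypothetical field names)."""
    combos = legal_combinations()
    if len(gpus) < len(combos):
        raise ValueError(f"need {len(combos)} GPUs, got {len(gpus)}")
    runs = []
    for (h, a), gpu in zip(combos, gpus):
        name = f"imf_h{h}_a{a}"  # hypothetical naming scheme
        runs.append({
            "name": name,
            "host": host,
            "gpu": gpu,
            "pred_horizon": h,
            "num_action_steps": a,
            **FIXED,
            "log_path": f"{log_dir}/{name}.log",
            "swanlab_run": name,
            "command": None,  # filled in at launch time
            # Assumed criterion; adjust to whatever marks a finished run in practice.
            "done_when": f"step reaches {FIXED['max_steps']} and final checkpoint exists",
        })
    return {"suite": "2026-04-04-imf-horizon-grid", "metric": "max avg_reward", "runs": runs}
```

The result is plain JSON-serializable data. It can therefore be written directly to manifest.json with json.dump.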

Task 2: Prepare the remote 8-GPU execution target

Files:

  • Remote working directory under /home/droid/

  • Reuse or create a synced code directory for this suite

  • Verify the remote dataset path and environment path.

  • Verify GPU availability and reserve 6 GPUs for Phase-1 launches.

  • Sync the required code to a dedicated remote suite directory.

  • Record exact remote paths back into the local suite manifest.
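The GPU check can be reduced to parsing nvidia-smi's machine-readable query output. The sketch below relies on the standard --query-gpu / --format=csv,noheader,nounits options. The "idle" thresholds (memory use at most 1024 MiB, utilization at most 5%) are arbitrary assumptions, and the SSH helper assumes key-based login to the host.

```python
import subprocess

QUERY = "nvidia-smi --query-gpu=index,memory.used,utilization.gpu --format=csv,noheader,nounits"


def parse_gpu_query(csv_text: str) -> list[tuple[int, int, int]]:
    """Parse rows of 'index, memory.used (MiB), utilization.gpu (%)'."""
    rows = []
    for line in csv_text.strip().splitlines():
        idx, mem, util = (int(x.strip()) for x in line.split(","))
        rows.append((idx, mem, util))
    return rows


def pick_free_gpus(csv_text: str, n: int = 6, max_mem_mib: int = 1024, max_util: int = 5) -> list[int]:
    """Return the first n GPU indices that look idle; raise if fewer are available."""
    free = [i for i, mem, util in parse_gpu_query(csv_text) if mem <= max_mem_mib and util <= max_util]
    if len(free) < n:
        raise RuntimeError(f"only {len(free)} idle GPUs: {free}")
    return free[:n]


def query_remote(host: str) -> str:
    """Run the nvidia-smi query on `host` over SSH and return its stdout."""
    return subprocess.run(["ssh", host, QUERY], check=True, capture_output=True, text=True).stdout
```

Passing query_remote(host) into pick_free_gpus gives the six indices to record in the manifest. Keeping parsing separate from SSH means the selection logic can be checked without a GPU.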

Task 3: Launch the 6 Phase-1 experiments in parallel

Files:

  • Reuse: roboimi/demos/vla_scripts/train_vla.py

  • Modify only local suite tracking files unless a launch bug is discovered

  • Launch 6 runs concurrently with fixed settings: IMF, emb=384, layer=12, max_steps=50k.

  • Keep all other relevant training hyperparameters aligned to the current strong baseline unless a concrete blocker appears.

  • Assign one GPU per run on the 8xL20 host.

  • Capture PID, log path, and SwanLab URL for each run in status.json.
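Each launch can be built as one shell string. The string pins a single GPU through CUDA_VISIBLE_DEVICES, detaches with nohup, redirects output to the run's log, and echoes the PID so it can be stored in status.json. This is a sketch only. The Hydra override keys in OVERRIDE_KEYS are hypothetical placeholders and must be replaced with the real config paths used by train_vla.py. The python argument stands in for the remote environment's interpreter.

```python
import shlex

SCRIPT = "roboimi/demos/vla_scripts/train_vla.py"

# HYPOTHETICAL Hydra override keys: confirm the real config paths before launching.
OVERRIDE_KEYS = {
    "pred_horizon": "agent.pred_horizon",
    "num_action_steps": "agent.num_action_steps",
    "emb": "agent.head.n_emb",
    "layer": "agent.head.n_layer",
    "max_steps": "train.max_steps",
}


def launch_command(run: dict, workdir: str, python: str = "python") -> str:
    """Build a detached launch that pins one GPU, logs to a file, and echoes the PID."""
    overrides = [f"{OVERRIDE_KEYS[k]}={run[k]}" for k in OVERRIDE_KEYS]
    train = [python, SCRIPT, *overrides]
    return (
        f"cd {shlex.quote(workdir)} && "
        f"CUDA_VISIBLE_DEVICES={run['gpu']} nohup {shlex.join(train)} "
        # stdin from /dev/null keeps the SSH session from waiting on the background job
        f"> {shlex.quote(run['log_path'])} 2>&1 < /dev/null & echo $!"
    )
```

Running the string through ssh host '<command>' prints the PID on stdout. That value, together with the log path and SwanLab run name, is what this task records per run.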

Task 4: Monitor and summarize Phase-1 until all 6 finish

Files:

  • Update: experiment_suites/2026-04-04-imf-horizon-grid/status.json

  • Update: experiment_suites/2026-04-04-imf-horizon-grid/notes.md

  • Periodically parse each run's log and checkpoints to extract the latest step, the latest rollout reward, and the best rollout reward so far.

  • Keep a resumable local summary so progress can be continued in later turns without rediscovery.

  • After all 6 runs finish, rank them by max avg_reward and write a compact Phase-1 summary.
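The monitoring and ranking steps can be sketched as a log scan followed by a sort. The log format is not specified in this plan, so STEP_RE and REWARD_RE are hypothetical patterns that must be adapted to what train_vla.py actually prints. cross_check assumes the best-checkpoint metadata value has already been read out separately.

```python
import re

# HYPOTHETICAL log patterns: adapt to the real training log format.
STEP_RE = re.compile(r"step[=:\s]+(\d+)")
REWARD_RE = re.compile(r"avg_reward[=:\s]+(-?\d+(?:\.\d+)?)")


def parse_progress(log_text: str) -> dict:
    """Extract the latest step, latest rollout avg_reward, and best avg_reward so far."""
    latest_step, latest_reward, best_reward = None, None, None
    for line in log_text.splitlines():
        if m := STEP_RE.search(line):
            latest_step = int(m.group(1))
        if m := REWARD_RE.search(line):
            r = float(m.group(1))
            latest_reward = r
            best_reward = r if best_reward is None else max(best_reward, r)
    return {"latest_step": latest_step, "latest_reward": latest_reward, "best_reward": best_reward}


def rank_runs(progress: dict[str, dict]) -> list[tuple[str, float]]:
    """Rank runs by max avg_reward, best first; runs with no rollout yet are omitted."""
    scored = [(name, p["best_reward"]) for name, p in progress.items() if p["best_reward"] is not None]
    return sorted(scored, key=lambda x: x[1], reverse=True)


def cross_check(log_best, ckpt_best, tol: float = 1e-4) -> bool:
    """True when the log-derived best agrees with the best-checkpoint metadata value."""
    return log_best is not None and ckpt_best is not None and abs(log_best - ckpt_best) <= tol
```

The parse_progress output maps directly onto the "latest parsed progress" fields in status.json. rank_runs then yields the ordering for the Phase-1 summary.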

Task 5: Prepare the Phase-2 visual-attnres ablation

Files:

  • Likely modify: vision backbone implementation and config files (to be confirmed after code inspection)

  • Add/update targeted tests for the visual backbone path if code changes are needed

  • Use the best Phase-1 (pred_horizon, num_action_steps) combination as the fixed rollout setting for Phase-2.

  • Compare:

    1. current setup: attnres only in the IMF head
    2. ablation setup: attnres in both the IMF head and the visual encoder path

  • Keep the rest of the training settings fixed.

  • Launch and monitor the Phase-2 pair after the Phase-1 summary is complete.
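The Phase-2 pair can be derived mechanically from the Phase-1 ranking. The best run is cloned twice, and the two copies differ only in whether attnres is enabled in the visual path. This is a sketch under assumptions. The visual_attnres key is a hypothetical flag, because the real switch depends on the backbone change still to be confirmed. The run dicts follow the hypothetical manifest fields used earlier.

```python
import copy

# Per-launch fields that must not be shared between the two Phase-2 runs.
PER_LAUNCH_FIELDS = ("gpu", "log_path", "command", "swanlab_run")


def phase2_pair(phase1_runs: list[dict], ranked: list[tuple[str, float]]) -> list[dict]:
    """Clone the best Phase-1 run into a head-only control and a head+visual attnres variant."""
    best_name = ranked[0][0]
    base = next(r for r in phase1_runs if r["name"] == best_name)
    pair = []
    for variant, visual in (("head_only", False), ("head_plus_visual", True)):
        run = copy.deepcopy(base)  # keeps horizon, action steps, and all other settings fixed
        for key in PER_LAUNCH_FIELDS:
            run.pop(key, None)  # reassigned when the pair is launched
        run["name"] = f"{best_name}_{variant}"
        run["visual_attnres"] = visual  # HYPOTHETICAL flag name
        pair.append(run)
    return pair
```

Because both runs are deep copies of the same base record, the only intended difference between them is the visual attnres switch.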