PushT iMF Full-Attention Implementation Plan
For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
Goal: Add a separate full-attention PushT image iMF config, commit/push it on a new branch, and launch the 9-run 350-epoch architecture sweep across 3 GPUs.
Architecture: Keep the existing causal iMF path untouched and add a standalone full-attention config that only flips `policy.causal_attn=false` while retaining one-step iMF inference and SwanLab-safe naming. Reuse the previous 9-run architecture matrix and the balanced three-queue scheduling across the local 5090 and 5880 GPU0/GPU1.
Tech Stack: Hydra, Diffusion Policy iMF image workspace, SwanLab, uv env, local shell + trusted remote 5880 over SSH.
Task 1: Add full-attention iMF config with TDD
Files:
- Create: `image_pusht_diffusion_policy_dit_imf_fullattn.yaml`
- Modify: `tests/test_pusht_swanlab_config.py`

Steps:
- [ ] Write a failing config regression test asserting the new config uses SwanLab-safe naming and `policy.causal_attn == False`.
- [ ] Run the targeted pytest command and verify it fails because the config does not exist yet.
- [ ] Add the minimal full-attention config by composing from the existing PushT image iMF config and overriding only `exp_name` and `policy.causal_attn=false`.
- [ ] Re-run the targeted pytest and verify it passes.
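The regression test could look like the following sketch. The helper names and the inline dict are hypothetical; the real test in `tests/test_pusht_swanlab_config.py` would load `image_pusht_diffusion_policy_dit_imf_fullattn.yaml` through Hydra/OmegaConf instead.

```python
# Sketch of the config regression test (hypothetical helpers; the real
# test loads the composed Hydra config rather than an inline dict).

def is_swanlab_safe(name: str) -> bool:
    # Assumption: "SwanLab-safe" means no slashes or whitespace in run names.
    return "/" not in name and not any(c.isspace() for c in name)

def check_fullattn_config(cfg: dict) -> None:
    assert is_swanlab_safe(cfg["exp_name"])
    assert "fullattn" in cfg["exp_name"]          # avoid collisions with the causal sweep
    assert cfg["policy"]["causal_attn"] is False  # the only behavioral override

# Stand-in for the composed config:
cfg = {"exp_name": "pusht_imf_fullattn", "policy": {"causal_attn": False}}
check_fullattn_config(cfg)  # raises AssertionError if the config regresses
```

Running this before the YAML exists (against the real loader) gives the expected red test for the TDD step.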
Task 2: Verify the new config
Files:
- Read: `image_pusht_diffusion_policy_dit_imf_fullattn.yaml`

Steps:
- [ ] Run `train.py --help` for the new config.
- [ ] Run a real `training.debug=true` smoke test locally to confirm the training path is valid.
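The smoke-test invocation might be assembled like this; the Hydra-style CLI shape (`--config-name` plus dotted overrides) is an assumption, so check `train.py --help` for the actual flags in this workspace.

```python
import shlex

def smoke_cmd(config_name: str) -> str:
    # Hydra-style CLI shape is an assumption; verify against train.py --help.
    parts = [
        "python", "train.py",
        f"--config-name={config_name}",
        "training.debug=true",  # short debug run, not a full 350-epoch train
    ]
    return " ".join(shlex.quote(p) for p in parts)

print(smoke_cmd("image_pusht_diffusion_policy_dit_imf_fullattn"))
```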
Task 3: Commit and push the new branch
Files:
- No new files; this task commits and pushes the artifacts from Tasks 1–2.

Steps:
- [ ] Commit only the new config/test/plan files needed for the full-attention experiment chain.
- [ ] Run the verification commands again before committing.
- [ ] Commit with a focused message.
- [ ] Push `feat/pusht-imf-fullattn` to origin.
Task 4: Launch the 9-run sweep
Files:
- Write: queue scripts and logs under `data/run_logs/` locally and on 5880.
- Write: outputs under `data/outputs/` locally and on 5880.

Steps:
- [ ] Use the same matrix as the prior iMF sweep: `n_emb ∈ {128,256,384}`, `n_layer ∈ {6,12,18}`, `seed=42`.
- [ ] Set `training.num_epochs=350` for all 9 runs.
- [ ] Encode `fullattn` in every `exp_name`, `logging.name`, and run directory to avoid collisions.
- [ ] Balance the 9 runs across the local 5090, 5880 GPU0, and 5880 GPU1 as three serial queues.
- [ ] Sync the new config to the remote smoke repo before launching the remote queues.
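The 3×3 matrix and the three-queue balancing can be sketched as below. The `exp_name` template and the override key paths (`policy.n_emb`, `policy.n_layer`, `training.seed`) are assumptions and must be matched to the actual config keys before generating queue scripts.

```python
from itertools import product

N_EMB, N_LAYER = [128, 256, 384], [6, 12, 18]
SEED, EPOCHS = 42, 350
QUEUES = ["local_5090", "5880_gpu0", "5880_gpu1"]  # three serial queues

runs = []
for i, (n_emb, n_layer) in enumerate(product(N_EMB, N_LAYER)):
    # Hypothetical name template; "fullattn" avoids collisions with the causal sweep.
    exp_name = f"pusht_imf_fullattn_e{n_emb}_l{n_layer}_s{SEED}"
    runs.append({
        "exp_name": exp_name,
        "queue": QUEUES[i % 3],  # round-robin -> 3 runs per queue
        "overrides": [
            f"policy.n_emb={n_emb}",      # assumed key paths; verify in the
            f"policy.n_layer={n_layer}",  # composed config before launch
            f"training.seed={SEED}",
            f"training.num_epochs={EPOCHS}",
        ],
    })

assert len(runs) == 9
assert all("fullattn" in r["exp_name"] for r in runs)
```

Round-robin assignment yields exactly three runs per GPU queue, keeping the per-queue wall-clock roughly balanced when runs have similar cost.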
Task 5: Monitor and auto-summarize
Files:
- Read: local and remote pid files, logs, outputs, and checkpoints.

Steps:
- [ ] Start an xhigh monitoring agent that polls all three queues.
- [ ] On completion, parse all 9 `logs.json.txt` files and rank runs by max `test_mean_score`.
- [ ] Report embedding/layer trends and the best configuration.