diff --git a/docs/superpowers/plans/2026-03-27-pusht-imf-fullattn-implementation.md b/docs/superpowers/plans/2026-03-27-pusht-imf-fullattn-implementation.md
new file mode 100644
index 0000000..4baf80d
--- /dev/null
+++ b/docs/superpowers/plans/2026-03-27-pusht-imf-fullattn-implementation.md
@@ -0,0 +1,60 @@
+# PushT iMF Full-Attention Implementation Plan
+
+> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
+
+**Goal:** Add a separate full-attention PushT image iMF config, commit/push it on a new branch, and launch the 9-run 350-epoch architecture sweep across 3 GPUs.
+
+**Architecture:** Keep the existing causal iMF path untouched and add a standalone full-attention config that only flips `policy.causal_attn=false` while retaining one-step iMF inference and SwanLab-safe naming. Reuse the previous 9-run architecture matrix and balanced 3-queue scheduling across local 5090 plus 5880 GPU0/GPU1.
+
+**Tech Stack:** Hydra, Diffusion Policy iMF image workspace, SwanLab, uv env, local shell + trusted remote 5880 over SSH.
+
+---
+
+### Task 1: Add full-attention iMF config with TDD
+
+**Files:**
+- Create: `image_pusht_diffusion_policy_dit_imf_fullattn.yaml`
+- Modify: `tests/test_pusht_swanlab_config.py`
+
+- [ ] Write a failing config regression test asserting the new config uses SwanLab-safe naming and `policy.causal_attn == False`.
+- [ ] Run the targeted pytest command and verify it fails because the config does not exist yet.
+- [ ] Add the minimal full-attention config by composing from the existing PushT image iMF config and overriding only `exp_name` and `policy.causal_attn=false`.
+- [ ] Re-run the targeted pytest and verify it passes.
+
+### Task 2: Verify the new config
+
+**Files:**
+- Read: `image_pusht_diffusion_policy_dit_imf_fullattn.yaml`
+
+- [ ] Run `train.py --help` for the new config.
+- [ ] Run a real `training.debug=true` smoke test locally to confirm the training path is valid.
+
+### Task 3: Commit and push the new branch
+
+**Files:**
+- Commit only the new config/test/plan files needed for the full-attention experiment chain.
+
+- [ ] Run verification commands again before commit.
+- [ ] Commit with a focused message.
+- [ ] Push `feat/pusht-imf-fullattn` to origin.
+
+### Task 4: Launch the 9-run sweep
+
+**Files:**
+- Write queue scripts and logs under `data/run_logs/` locally and on 5880.
+- Write outputs under `data/outputs/` locally and on 5880.
+
+- [ ] Use the same matrix as the prior iMF sweep: `n_emb ∈ {128,256,384}`, `n_layer ∈ {6,12,18}`, `seed=42`.
+- [ ] Set `training.num_epochs=350` for all 9 runs.
+- [ ] Encode `fullattn` in every `exp_name`, `logging.name`, and run directory to avoid collisions.
+- [ ] Balance the 9 runs across local 5090, 5880 GPU0, and 5880 GPU1 as three serial queues.
+- [ ] Sync the new config to the remote smoke repo before launching remote queues.
+
+### Task 5: Monitor and auto-summarize
+
+**Files:**
+- Read local and remote pid files, logs, outputs, checkpoints.
+
+- [ ] Start an xhigh monitoring agent that polls all three queues.
+- [ ] On completion, parse all 9 `logs.json.txt` files and rank by max `test_mean_score`.
+- [ ] Report embedding/layer trends and the best configuration.
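Reviewer note on Task 4 of the plan: the sweep matrix and queue assignment could be generated rather than hand-written. The following is only a sketch under stated assumptions. The override keys `policy.n_emb`, `policy.n_layer`, and `training.seed`, the queue labels, the `exp_name` suffix scheme, and the `n_emb**2 * n_layer` cost proxy are all illustrative choices that do not appear in this diff; the prior sweep's actual assignment is not shown here.

```python
from itertools import product

# Matrix values come from the plan; everything else below is hypothetical.
N_EMB = (128, 256, 384)
N_LAYER = (6, 12, 18)
QUEUES = ("local-5090", "5880-gpu0", "5880-gpu1")  # hypothetical queue labels


def build_runs(seed=42, num_epochs=350):
    """Enumerate the 3x3 matrix as Hydra-style override lists."""
    runs = []
    for n_emb, n_layer in product(N_EMB, N_LAYER):
        # Encode 'fullattn' in the name so runs cannot collide with the causal sweep.
        exp_name = f"pusht_image_dit_imf_fullattn_emb{n_emb}_layer{n_layer}_seed{seed}"
        runs.append({
            "exp_name": exp_name,
            # Rough relative cost proxy (assumption): width squared times depth.
            "cost": n_emb * n_emb * n_layer,
            "overrides": [
                f"policy.n_emb={n_emb}",        # assumed key name
                f"policy.n_layer={n_layer}",    # assumed key name
                f"training.seed={seed}",        # assumed key name
                f"training.num_epochs={num_epochs}",
                f"exp_name={exp_name}",
            ],
        })
    return runs


def assign_queues(runs, queues=QUEUES):
    """Greedy longest-first assignment: each run goes to the least-loaded queue."""
    plan = {q: [] for q in queues}
    load = {q: 0 for q in queues}
    for run in sorted(runs, key=lambda r: r["cost"], reverse=True):
        target = min(queues, key=lambda q: load[q])
        plan[target].append(run)
        load[target] += run["cost"]
    return plan, load


if __name__ == "__main__":
    plan, load = assign_queues(build_runs())
    for queue, queued in plan.items():
        print(queue, load[queue], [r["exp_name"] for r in queued])
```

Because the config sets `logging.name: ${exp_name}` and `group: ${exp_name}`, overriding `exp_name` alone should carry the `fullattn` tag into the SwanLab run name. The cost proxy ignores the speed difference between the 5090 and the 5880, so the resulting split is a starting point to adjust, not a measured balance.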
diff --git a/image_pusht_diffusion_policy_dit_imf_fullattn.yaml b/image_pusht_diffusion_policy_dit_imf_fullattn.yaml
new file mode 100644
index 0000000..af4dd22
--- /dev/null
+++ b/image_pusht_diffusion_policy_dit_imf_fullattn.yaml
@@ -0,0 +1,33 @@
+defaults:
+  - diffusion_policy/config/train_diffusion_transformer_hybrid_workspace@_here_
+  - override /diffusion_policy/config/task@task: pusht_image
+  - _self_
+
+exp_name: pusht_image_dit_imf_fullattn
+
+policy:
+  _target_: diffusion_policy.policy.imf_transformer_hybrid_image_policy.IMFTransformerHybridImagePolicy
+  num_inference_steps: 1
+  n_head: 1
+  causal_attn: false
+
+logging:
+  backend: swanlab
+  mode: online
+  name: ${exp_name}
+  resume: false
+  tags: ["${name}", "${task_name}", "${exp_name}", "swanlab"]
+  id: null
+  group: ${exp_name}
+
+dataloader:
+  num_workers: 0
+
+val_dataloader:
+  num_workers: 0
+
+task:
+  env_runner:
+    n_envs: 1
+    n_test_vis: 0
+    n_train_vis: 0
diff --git a/tests/test_pusht_swanlab_config.py b/tests/test_pusht_swanlab_config.py
index 130b11c..d008345 100644
--- a/tests/test_pusht_swanlab_config.py
+++ b/tests/test_pusht_swanlab_config.py
@@ -30,3 +30,15 @@ def test_image_pusht_dit_imf_swanlab_config_uses_exp_name_and_no_resume_collisio
     assert cfg.logging.resume is False
     assert cfg.logging.id is None
     assert cfg.logging.group == cfg.exp_name
+
+
+def test_image_pusht_dit_imf_fullattn_config_uses_exp_name_and_disables_causal_attention():
+    cfg = _load_cfg('image_pusht_diffusion_policy_dit_imf_fullattn.yaml')
+
+    assert cfg.logging.backend == 'swanlab'
+    assert cfg.logging.mode == 'online'
+    assert cfg.logging.name == cfg.exp_name
+    assert cfg.logging.resume is False
+    assert cfg.logging.id is None
+    assert cfg.logging.group == cfg.exp_name
+    assert cfg.policy.causal_attn is False
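Reviewer note on Task 5 of the plan: the "parse all 9 `logs.json.txt` files and rank by max `test_mean_score`" step might look roughly like the sketch below. It assumes each `logs.json.txt` holds one JSON object per line and that the metric is stored under the key named in the plan; both are assumptions to check against the actual logger output, and the helper names `max_metric` and `rank_runs` are hypothetical.

```python
import json
from pathlib import Path


def max_metric(log_path, key="test_mean_score"):
    """Return the largest numeric value of `key` in a JSON-lines log, or None."""
    best = None
    with open(log_path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                continue  # tolerate a partially written last line
            if not isinstance(record, dict):
                continue
            value = record.get(key)
            if isinstance(value, (int, float)) and (best is None or value > best):
                best = value
    return best


def rank_runs(run_dirs, key="test_mean_score"):
    """Return (run_name, best_score) pairs sorted from best to worst."""
    scored = []
    for run_dir in map(Path, run_dirs):
        log_path = run_dir / "logs.json.txt"
        if not log_path.exists():
            continue  # run not finished or not synced yet
        score = max_metric(log_path, key)
        if score is not None:
            scored.append((run_dir.name, score))
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Calling `rank_runs` on the nine local and synced remote output directories would give the ordering from which the embedding/layer trends can be read. Runs with a missing log or no logged metric are skipped rather than ranked last, so the length of the result also shows how many runs actually reported a score.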