# PushT Image iMF AttnRes Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Add an AttnRes-backed full-attention iMF backbone for the PushT image experiment path, verify it with tests and smoke runs, then launch the 9-run, 350-epoch architecture sweep across the local 5090 and remote 5880 GPUs.

**Architecture:** Extend `IMFTransformerForDiffusion` with a selectable `attnres_full` backbone that keeps the current iMF training/inference API unchanged while replacing the transformer internals with RMSNorm + RoPE self-attention + SwiGLU + Full AttnRes depth-wise residual routing. Add one standalone Hydra config for the PushT image sweep and reuse queue-style launch scripts with unique SwanLab names.

**Tech Stack:** Python 3.9 via uv, PyTorch 2.8 CUDA, Hydra, SwanLab online logging, local shell + SSH to the trusted 5880 host.

---

### Task 1: Add regression tests for the new AttnRes path

**Files:**
- Modify: `tests/test_imf_transformer_for_diffusion.py`
- Modify: `tests/test_pusht_swanlab_config.py`

- [ ] Add a failing model test that instantiates `IMFTransformerForDiffusion(backbone_type='attnres_full', causal_attn=False, ...)`, runs a forward pass with conditional observations, and asserts the output shape plus optimizer construction (a hedged test sketch follows Task 3).
- [ ] Run the targeted pytest selection and confirm the new test fails for the expected missing-backbone reason.
- [ ] Add a failing config regression test for `image_pusht_diffusion_policy_dit_imf_attnres_full.yaml` asserting the SwanLab naming fields and `policy.causal_attn == False`.
- [ ] Re-run the targeted pytest selection and confirm the config test fails before implementation.

### Task 2: Implement the AttnRes-backed iMF backbone

**Files:**
- Create: `diffusion_policy/model/diffusion/attnres_transformer_components.py`
- Modify: `diffusion_policy/model/diffusion/imf_transformer_for_diffusion.py`

- [ ] Add focused, reusable modules for `RMSNorm`, the RoPE helpers, grouped-query self-attention, the SwiGLU FFN, and the Full AttnRes operator (see the component sketch after Task 3).
- [ ] Extend `IMFTransformerForDiffusion` with a `backbone_type` switch that preserves the existing vanilla path and adds an `attnres_full` path using concatenated `[r, t, obs, sample]` tokens.
- [ ] Ensure the AttnRes path slices the condition tokens away before the output head so the returned tensor still matches the sample/action horizon.
- [ ] Update optimizer parameter grouping to treat RMSNorm weights like LayerNorm weights (no weight decay) and to include any new positional/conditioning parameters (see the grouping sketch after Task 3).
- [ ] Run the targeted tests and get them green.

### Task 3: Add the new PushT config and smoke-test path

**Files:**
- Create: `image_pusht_diffusion_policy_dit_imf_attnres_full.yaml`
- Modify: `tests/test_pusht_swanlab_config.py`

- [ ] Add a standalone PushT image config for the AttnRes iMF variant with SwanLab online logging, `policy.backbone_type=attnres_full`, and `policy.causal_attn=false`.
- [ ] Verify `uv run python train.py --config-dir=. --config-name=image_pusht_diffusion_policy_dit_imf_attnres_full.yaml --help` succeeds.
- [ ] Run a real smoke-training command with `training.debug=true`, `training.device=cuda:0`, and safety overrides (`dataloader.num_workers=0`, `task.env_runner.n_envs=1`, no vis), and confirm it reaches the training loop and writes a run directory.
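For the Task 1 model test, a minimal shape-and-optimizer check could look like the sketch below. Everything beyond `backbone_type` and `causal_attn` is an assumption: the constructor arguments mirror the upstream `TransformerForDiffusion` signature, and the `configure_optimizers` call is assumed to exist on `IMFTransformerForDiffusion`; align both with the real API before committing.

```python
# Hedged sketch of the Task 1 model test. Constructor and forward signatures
# are assumptions modeled on diffusion_policy's TransformerForDiffusion.
import torch

from diffusion_policy.model.diffusion.imf_transformer_for_diffusion import (
    IMFTransformerForDiffusion,
)


def test_attnres_full_forward_and_optimizer():
    model = IMFTransformerForDiffusion(
        input_dim=2,        # PushT action dim (assumed)
        output_dim=2,
        horizon=16,
        n_obs_steps=2,
        cond_dim=66,        # image-feature conditioning dim (assumed)
        n_layer=4,
        n_head=4,
        n_emb=64,
        causal_attn=False,  # full attention for the AttnRes path
        backbone_type="attnres_full",
    )
    sample = torch.randn(3, 16, 2)          # (B, horizon, action_dim)
    timestep = torch.randint(0, 100, (3,))  # per-sample diffusion step
    cond = torch.randn(3, 2, 66)            # (B, n_obs_steps, cond_dim)
    # If the iMF forward also takes an interpolant ratio r (per the
    # [r, t, obs, sample] token layout), add it here.
    out = model(sample, timestep, cond)
    assert out.shape == sample.shape        # condition tokens sliced away

    # Optimizer construction must not crash on RMSNorm/RoPE parameters;
    # method name and kwargs assumed from the upstream class.
    opt = model.configure_optimizers(
        learning_rate=1e-4, weight_decay=1e-3, betas=(0.9, 0.95)
    )
    assert opt is not None
```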
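For the Task 2 components, the standard pieces are well defined even though the Full AttnRes routing operator is project-specific (and therefore deliberately not sketched here). A self-contained sketch of `RMSNorm`, a RoPE helper, and the SwiGLU FFN, as they might live in `attnres_transformer_components.py`:

```python
# Minimal sketches of the standard components named in Task 2. The Full AttnRes
# depth-wise routing operator is project-specific and intentionally omitted.
import torch
import torch.nn as nn
import torch.nn.functional as F


class RMSNorm(nn.Module):
    """LayerNorm variant without mean-centering or bias; its weight should be
    grouped with the no-decay parameters in the optimizer (see Task 2)."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return x * rms * self.weight


def rope_rotate(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Apply rotary position embedding to x of shape (B, heads, T, head_dim).
    head_dim must be even; positions run 0..T-1 over the token sequence."""
    *_, t, d = x.shape
    inv_freq = 1.0 / (base ** (torch.arange(0, d, 2, device=x.device).float() / d))
    angles = torch.arange(t, device=x.device).float()[:, None] * inv_freq[None, :]
    cos, sin = angles.cos(), angles.sin()   # each (T, head_dim/2)
    x1, x2 = x[..., 0::2], x[..., 1::2]     # even/odd channel pairs
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin    # 2-D rotation per channel pair
    out[..., 1::2] = x1 * sin + x2 * cos
    return out


class SwiGLU(nn.Module):
    """Gated FFN: down(silu(gate(x)) * up(x))."""
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.gate = nn.Linear(dim, hidden, bias=False)
        self.up = nn.Linear(dim, hidden, bias=False)
        self.down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(F.silu(self.gate(x)) * self.up(x))
```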
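For the optimizer-grouping item in Task 2, a generic minGPT-style decay/no-decay split that treats `RMSNorm` like `LayerNorm` is sketched below. This is a pattern to fold into the project's existing optimizer-group construction, not a drop-in replacement, and `build_optim_groups` is a hypothetical helper name.

```python
# minGPT-style decay/no-decay grouping extended for RMSNorm. Assumes no
# parameter sharing (each parameter registered on exactly one module).
import torch.nn as nn


def build_optim_groups(model: nn.Module, weight_decay: float):
    # RMSNorm here is the class from the component sketch above (or the
    # project's own); norms, biases, and embeddings skip weight decay.
    no_decay_types = (nn.LayerNorm, nn.Embedding, RMSNorm)
    decay, no_decay = [], []
    for module in model.modules():
        for name, param in module.named_parameters(recurse=False):
            if not param.requires_grad:
                continue
            if isinstance(module, no_decay_types) or name.endswith("bias"):
                no_decay.append(param)
            else:
                decay.append(param)  # e.g. nn.Linear weights
    # Route any raw positional/conditioning nn.Parameters into no_decay
    # explicitly if they should not be regularized.
    return [
        {"params": decay, "weight_decay": weight_decay},
        {"params": no_decay, "weight_decay": 0.0},
    ]


# Usage: torch.optim.AdamW(build_optim_groups(model, weight_decay=1e-3),
#                          lr=1e-4, betas=(0.9, 0.95))
```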
### Task 4: Prepare launch scripts and start the 9-run sweep

**Files:**
- Create or modify: `data/run_logs/imf_attnres_local_queue.sh`
- Create or modify locally before copy: `data/run_logs/imf_attnres_remote_gpu0_queue.sh`
- Create or modify locally before copy: `data/run_logs/imf_attnres_remote_gpu1_queue.sh`

- [ ] Write queue command templates for the 9 runs using the config `image_pusht_diffusion_policy_dit_imf_attnres_full.yaml`, `training.num_epochs=350`, a unique `exp_name`/`logging.name` per run, and the shared `logging.group=imf_pusht_attnres_arch_sweep` (a queue-script sketch follows this task).
- [ ] Sync the necessary config/model files plus the remote queue scripts to `droid@100.73.14.65:~/project/diffusion_policy-smoke`.
- [ ] Start the local queue under `nohup`, record its PID, and verify the first run log is advancing.
- [ ] Start the two remote queues under `nohup`, record their PIDs, and verify both first-run logs are advancing.
- [ ] Confirm all three GPUs have entered training for the new sweep.
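A queue script is just one GPU's runs executed sequentially, so each run starts when the previous one exits. Below is a hedged sketch of `data/run_logs/imf_attnres_local_queue.sh`, assuming the nine runs split three per queue (not specified in the plan) and using placeholder variant names; only `training.num_epochs`, `exp_name`, `logging.name`, and `logging.group` come from this plan, and any other override keys must match the project's config schema.

```bash
#!/usr/bin/env bash
# Sequential queue sketch: three placeholder variants on the local GPU.
set -euo pipefail
cd "$HOME/project/diffusion_policy"  # adjust to the actual local checkout

CFG=image_pusht_diffusion_policy_dit_imf_attnres_full.yaml
GROUP=imf_pusht_attnres_arch_sweep

for VARIANT in arch_a arch_b arch_c; do  # placeholder variant names
  NAME="imf_attnres_${VARIANT}"
  uv run python train.py --config-dir=. --config-name="${CFG}" \
    training.device=cuda:0 \
    training.num_epochs=350 \
    exp_name="${NAME}" \
    logging.name="${NAME}" \
    logging.group="${GROUP}"
done
```

Launched in the plan's `nohup` style, this becomes e.g. `nohup bash data/run_logs/imf_attnres_local_queue.sh > data/run_logs/imf_attnres_local_queue.log 2>&1 & echo $!` to record the PID, followed by a `tail -f` of the first run's log to confirm it is advancing.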