# PushT Image iMF AttnRes Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
**Goal:** Add an AttnRes-backed full-attention iMF backbone for the PushT image experiment path, verify it with tests/smoke runs, then launch the 9-run 350-epoch architecture sweep across the local 5090 and remote 5880 GPUs.
**Architecture:** Extend `IMFTransformerForDiffusion` with a selectable `attnres_full` backbone that keeps the current iMF training/inference API unchanged while replacing the transformer internals with RMSNorm + RoPE self-attention + SwiGLU + Full AttnRes depth-wise residual routing. Add one standalone Hydra config for the PushT image sweep and reuse queue-style launch scripts with unique SwanLab names.
**Tech Stack:** Python 3.9 via uv, PyTorch 2.8 CUDA, Hydra, SwanLab online logging, local shell + SSH to the trusted 5880 host.
## Task 1: Add regression tests for the new AttnRes path
**Files:**

- Modify: `tests/test_imf_transformer_for_diffusion.py`
- Modify: `tests/test_pusht_swanlab_config.py`

- [ ] Add a failing model test that instantiates `IMFTransformerForDiffusion(backbone_type='attnres_full', causal_attn=False, ...)`, runs a forward pass with conditional observations, and asserts the output shape plus optimizer construction.
- [ ] Run the targeted pytest selection and confirm the new test fails for the expected missing-backbone reason.
- [ ] Add a failing config regression test for `image_pusht_diffusion_policy_dit_imf_attnres_full.yaml` asserting the SwanLab naming fields and `policy.causal_attn == False`.
- [ ] Re-run the targeted pytest selection and confirm the config test fails before implementation.
## Task 2: Implement the AttnRes-backed iMF backbone
**Files:**

- Create: `diffusion_policy/model/diffusion/attnres_transformer_components.py`
- Modify: `diffusion_policy/model/diffusion/imf_transformer_for_diffusion.py`
- [ ] Add focused, reusable modules for `RMSNorm`, RoPE helpers, grouped-query self-attention, the SwiGLU FFN, and the Full AttnRes operator.
- [ ] Extend `IMFTransformerForDiffusion` with a `backbone_type` switch that preserves the existing vanilla path and adds an `attnres_full` path using concatenated `[r, t, obs, sample]` tokens.
- [ ] Ensure the AttnRes path slices the condition tokens away before the output head so the returned tensor still matches the sample/action horizon.
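Two of the smaller components can be sketched directly; this is a minimal standalone version of what `attnres_transformer_components.py` might contain (module names and constructor shapes are assumptions, not the repo's actual API):

```python
# Minimal sketches of RMSNorm and the SwiGLU FFN; names/shapes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class RMSNorm(nn.Module):
    """LayerNorm variant: scales by the root-mean-square, no mean-centering, no bias."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return x * rms * self.weight


class SwiGLU(nn.Module):
    """Gated FFN: silu(W1 x) * (W3 x), projected back down by W2."""

    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.w1 = nn.Linear(dim, hidden, bias=False)  # gate branch
        self.w3 = nn.Linear(dim, hidden, bias=False)  # value branch
        self.w2 = nn.Linear(hidden, dim, bias=False)  # output projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w2(F.silu(self.w1(x)) * self.w3(x))
```

Both modules are shape-preserving on the last dimension, which keeps them drop-in replacements for `nn.LayerNorm` and the vanilla MLP inside the transformer block.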
- [ ] Update the optimizer parameter grouping to treat RMSNorm weights like LayerNorm weights (no weight decay) and include any new positional/conditioning parameters.
- [ ] Run the targeted tests and get them green.
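The grouping change amounts to adding the new norm class to the no-decay set. A self-contained sketch of the idea (the helper name and heuristics are assumptions; the repo's actual grouping code may differ):

```python
# Sketch of decay/no-decay optimizer grouping that treats RMSNorm like
# LayerNorm. Helper name and heuristics are assumptions, not the repo's code.
import torch
import torch.nn as nn


class RMSNorm(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        return x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + 1e-6) * self.weight


# norm and embedding weights are conventionally excluded from weight decay
NO_DECAY_TYPES = (nn.LayerNorm, RMSNorm, nn.Embedding)


def get_optim_groups(model: nn.Module, weight_decay: float):
    decay, no_decay = [], []
    for module in model.modules():
        for name, param in module.named_parameters(recurse=False):
            if not param.requires_grad:
                continue
            # biases and norm/embedding weights: no weight decay
            if name.endswith("bias") or isinstance(module, NO_DECAY_TYPES):
                no_decay.append(param)
            else:
                decay.append(param)
    return [
        {"params": decay, "weight_decay": weight_decay},
        {"params": no_decay, "weight_decay": 0.0},
    ]
```

Any new positional or conditioning parameters (e.g. learned register tokens) would be routed into one of these groups the same way, so the optimizer never silently skips them.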
## Task 3: Add the new PushT config and smoke-test the path
**Files:**

- Create: `image_pusht_diffusion_policy_dit_imf_attnres_full.yaml`
- Modify: `tests/test_pusht_swanlab_config.py`
- [ ] Add a standalone PushT image config for the AttnRes iMF variant with SwanLab online logging, `policy.backbone_type=attnres_full`, and `policy.causal_attn=false`.
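The distinguishing fields of the new config might look like the fragment below; everything except `backbone_type`, `causal_attn`, and the sweep group name is an assumption modeled on the existing PushT image configs, which the real file should inherit from or copy.

```yaml
# Sketch only: key names other than backbone_type / causal_attn / logging.group
# are assumptions; base the real file on the existing PushT image DiT configs.
name: image_pusht_diffusion_policy_dit_imf_attnres_full
policy:
  backbone_type: attnres_full
  causal_attn: false
logging:
  mode: online                          # SwanLab online logging
  group: imf_pusht_attnres_arch_sweep
```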
- [ ] Verify that `uv run python train.py --config-dir=. --config-name=image_pusht_diffusion_policy_dit_imf_attnres_full.yaml --help` succeeds.
- [ ] Run a real smoke training command with `training.debug=true`, `training.device=cuda:0`, and safety overrides (`dataloader.num_workers=0`, `task.env_runner.n_envs=1`, no vis), and confirm it reaches the training loop and writes a run directory.
## Task 4: Prepare launch scripts and start the 9-run sweep
**Files:**

- Create or modify: `data/run_logs/imf_attnres_local_queue.sh`
- Create or modify locally before copy: `data/run_logs/imf_attnres_remote_gpu0_queue.sh`
- Create or modify locally before copy: `data/run_logs/imf_attnres_remote_gpu1_queue.sh`
- [ ] Write queue command templates for the 9 runs using the config `image_pusht_diffusion_policy_dit_imf_attnres_full.yaml`, `training.num_epochs=350`, a unique `exp_name`/`logging.name` per run, and a shared `logging.group=imf_pusht_attnres_arch_sweep`.
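One queue script could follow the shape below. The three run names are placeholders for whatever per-run architecture overrides the sweep actually varies, and the script is written to the current directory here only so the sketch is self-contained; the real files live under `data/run_logs/`.

```shell
# Sketch: generate one sequential queue script. Run names and per-run
# overrides are placeholders, not the real 9-run sweep grid.
cat > imf_attnres_local_queue.sh <<'EOF'
#!/usr/bin/env bash
set -euo pipefail
# Runs execute sequentially so a single GPU is never oversubscribed.
for run in run_a run_b run_c; do
  uv run python train.py --config-dir=. \
    --config-name=image_pusht_diffusion_policy_dit_imf_attnres_full.yaml \
    training.num_epochs=350 \
    training.device=cuda:0 \
    exp_name="imf_attnres_${run}" \
    logging.name="imf_attnres_${run}" \
    logging.group=imf_pusht_attnres_arch_sweep
done
EOF
chmod +x imf_attnres_local_queue.sh
```

The two remote scripts would differ only in `training.device` (or a `CUDA_VISIBLE_DEVICES` prefix) and their run-name slices of the 9-run grid.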
- [ ] Sync the necessary config/model files plus the remote queue scripts to `droid@100.73.14.65:~/project/diffusion_policy-smoke`.
- [ ] Start the local queue under `nohup`, record the PID, and verify the first run's log is advancing.
- [ ] Start the two remote queues under `nohup`, record the PIDs, and verify both first-run logs are advancing.
- [ ] Confirm all three GPUs have entered training for the new sweep.