feat: add pusht imf attnres backbone

This commit is contained in:
Logic
2026-03-29 11:15:59 +08:00
parent 78ab18e8f3
commit 185ed6596c
8 changed files with 647 additions and 61 deletions


@@ -0,0 +1,57 @@
# PushT Image iMF AttnRes Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
**Goal:** Add an AttnRes-backed full-attention iMF backbone for the PushT image experiment path, verify it with tests/smoke runs, then launch the 9-run 350-epoch architecture sweep across the local 5090 and remote 5880 GPUs.
**Architecture:** Extend `IMFTransformerForDiffusion` with a selectable `attnres_full` backbone that keeps the current iMF training/inference API unchanged while replacing the transformer internals with RMSNorm + RoPE self-attention + SwiGLU + Full AttnRes depth-wise residual routing. Add one standalone Hydra config for the PushT image sweep and reuse queue-style launch scripts with unique SwanLab names.
**Tech Stack:** Python 3.9 via uv, PyTorch 2.8 CUDA, Hydra, SwanLab online logging, local shell + SSH to trusted 5880 host.
---
### Task 1: Add regression tests for the new AttnRes path
**Files:**
- Modify: `tests/test_imf_transformer_for_diffusion.py`
- Modify: `tests/test_pusht_swanlab_config.py`
- [ ] Add a failing model test that instantiates `IMFTransformerForDiffusion(backbone_type='attnres_full', causal_attn=False, ...)`, runs a forward pass with conditional observations, and asserts output shape plus optimizer construction (see the sketch after this list).
- [ ] Run the targeted pytest selection and confirm the new test fails for the expected missing-backbone reason.
- [ ] Add a failing config regression test for `image_pusht_diffusion_policy_dit_imf_attnres_full.yaml` asserting SwanLab naming fields and `policy.causal_attn == False`.
- [ ] Re-run the targeted pytest selection and confirm the config test fails before implementation.
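A minimal sketch of the new model test, assuming the constructor keywords beyond `backbone_type` and `causal_attn` (dims, horizon, optimizer arguments) roughly match the existing fixtures:
```python
import torch
from diffusion_policy.model.diffusion.imf_transformer_for_diffusion import (
    IMFTransformerForDiffusion,
)

def test_attnres_full_forward_and_optimizer():
    # Hypothetical dimensions; align them with the real test fixtures.
    model = IMFTransformerForDiffusion(
        backbone_type="attnres_full",
        causal_attn=False,
        input_dim=2, output_dim=2, horizon=16,
        n_obs_steps=2, cond_dim=66,
    )
    sample = torch.randn(4, 16, 2)          # (B, horizon, input_dim)
    r, t = torch.rand(4), torch.rand(4)     # iMF conditioning scalars
    cond = torch.randn(4, 2, 66)            # conditional observations
    out = model(sample, r, t, cond=cond)
    assert out.shape == sample.shape
    # Optimizer construction must also work for the new path
    # (argument names here are assumptions).
    opt = model.configure_optimizers(learning_rate=1e-4, weight_decay=1e-3)
    assert opt is not None
```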
### Task 2: Implement the AttnRes-backed iMF backbone
**Files:**
- Create: `diffusion_policy/model/diffusion/attnres_transformer_components.py`
- Modify: `diffusion_policy/model/diffusion/imf_transformer_for_diffusion.py`
- [ ] Add focused reusable modules for `RMSNorm`, RoPE helpers, grouped-query self-attention, SwiGLU FFN, and the Full AttnRes operator.
- [ ] Extend `IMFTransformerForDiffusion` with a `backbone_type` switch that preserves the existing vanilla path and adds an `attnres_full` path using concatenated `[r, t, obs, sample]` tokens.
- [ ] Ensure the AttnRes path slices condition tokens away before the output head so the returned tensor still matches the sample/action horizon.
- [ ] Update optimizer parameter grouping to treat RMSNorm weights like LayerNorm weights (no decay) and include any new positional/conditioning parameters (see the grouping sketch after this list).
- [ ] Run the targeted tests and get them green.
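A minimal sketch of the no-decay grouping change, assuming the existing loop follows the usual GPT-style whitelist/blacklist pattern:
```python
import torch.nn as nn
# Assumption: RMSNorm lives in the new components module created in this task.
from diffusion_policy.model.diffusion.attnres_transformer_components import RMSNorm

def build_optim_groups(model: nn.Module, weight_decay: float):
    # Treat RMSNorm exactly like LayerNorm: its weights join the
    # no-weight-decay group, as do biases and embedding weights.
    no_decay_types = (nn.LayerNorm, nn.Embedding, RMSNorm)
    decay, no_decay = [], []
    for module in model.modules():
        for name, param in module.named_parameters(recurse=False):
            if isinstance(module, no_decay_types) or name.endswith("bias"):
                no_decay.append(param)
            else:
                decay.append(param)
    return [
        {"params": decay, "weight_decay": weight_decay},
        {"params": no_decay, "weight_decay": 0.0},
    ]
```
Any new positional/conditioning parameters introduced by the AttnRes path must also land in one of these groups; the real implementation should assert the union covers every parameter.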
### Task 3: Add the new PushT config and smoke-test path
**Files:**
- Create: `image_pusht_diffusion_policy_dit_imf_attnres_full.yaml`
- Modify: `tests/test_pusht_swanlab_config.py`
- [ ] Add a standalone PushT image config for the AttnRes iMF variant with SwanLab online logging, `policy.backbone_type=attnres_full`, and `policy.causal_attn=false` (a config-test sketch follows this list).
- [ ] Verify `uv run python train.py --config-dir=. --config-name=image_pusht_diffusion_policy_dit_imf_attnres_full.yaml --help` succeeds.
- [ ] Run a real smoke training command with `training.debug=true`, `training.device=cuda:0`, safety overrides (`dataloader.num_workers=0`, `task.env_runner.n_envs=1`, no vis), and confirm it reaches the training loop and writes a run directory.
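A sketch of the config regression test (key paths follow this plan; a Hydra config with a `defaults` list may require composition rather than a plain `OmegaConf.load`):
```python
from omegaconf import OmegaConf

def test_attnres_full_config_fields():
    cfg = OmegaConf.load("image_pusht_diffusion_policy_dit_imf_attnres_full.yaml")
    assert cfg.policy.backbone_type == "attnres_full"
    assert cfg.policy.causal_attn is False
    assert cfg.logging.backend == "swanlab"
    assert cfg.logging.mode == "online"
    # Scalars only: no train/test visualization videos.
    assert cfg.task.env_runner.n_train_vis == 0
    assert cfg.task.env_runner.n_test_vis == 0
```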
### Task 4: Prepare launch scripts and start the 9-run sweep
**Files:**
- Create or modify: `data/run_logs/imf_attnres_local_queue.sh`
- Create or modify locally, then copy to the remote: `data/run_logs/imf_attnres_remote_gpu0_queue.sh`
- Create or modify locally, then copy to the remote: `data/run_logs/imf_attnres_remote_gpu1_queue.sh`
- [ ] Write queue command templates for the 9 runs using config `image_pusht_diffusion_policy_dit_imf_attnres_full.yaml`, `training.num_epochs=350`, unique `exp_name/logging.name`, and shared `logging.group=imf_pusht_attnres_arch_sweep` (a generation sketch follows this list).
- [ ] Sync the necessary config/model files plus remote queue scripts to `droid@100.73.14.65:~/project/diffusion_policy-smoke`.
- [ ] Start the local queue under `nohup`, record PID, and verify the first run log is advancing.
- [ ] Start the two remote queues under `nohup`, record PIDs, and verify both first-run logs are advancing.
- [ ] Confirm all three GPUs have officially entered training for the new sweep.
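A sketch of how the 9 queue commands could be generated; the override keys `policy.n_emb`, `policy.n_layer`, and `training.seed` are assumptions about the config schema:
```python
GROUP = "imf_pusht_attnres_arch_sweep"
CONFIG = "image_pusht_diffusion_policy_dit_imf_attnres_full.yaml"

def queue_commands(queue_suffix: str, pairs):
    # pairs: the (n_emb, n_layer) combinations assigned to this queue.
    for n_emb, n_layer in pairs:
        name = f"imf_attnres_emb{n_emb}_layer{n_layer}_seed42_{queue_suffix}"
        yield (
            f"uv run python train.py --config-dir=. --config-name={CONFIG} "
            f"policy.n_emb={n_emb} policy.n_layer={n_layer} "
            f"training.seed=42 training.num_epochs=350 "
            f"exp_name={name} logging.name={name} logging.group={GROUP}"
        )

# Local 5090 queue, per the scheduling section of the design doc.
for cmd in queue_commands("5090", [(384, 18), (256, 6), (128, 6)]):
    print(cmd)
```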


@@ -0,0 +1,108 @@
# PushT Image iMF AttnRes Design
## Goal
On top of the existing PushT image iMF full-attention path, introduce the **Full AttnRes** residual-aggregation scheme from the `attn_res` repository, together with its matching **RMSNorm + self-attention + SwiGLU FFN** modules, while keeping the iMF training objective and one-step inference semantics unchanged and touching only this experiment path. Once implemented and verified, launch the same 9-run `n_emb × n_layer` sweep as before (350 epochs, seed=42, SwanLab online, no video recording).
## Scope
This work covers only:
1. Adding an AttnRes-backed backbone variant to `IMFTransformerForDiffusion`;
2. Keeping `forward(sample, r, t, cond=None)`, the iMF loss, and the one-step inference policy interface unchanged;
3. Adding a standalone PushT image config for this variant;
4. Reusing the local 5090 + remote dual-GPU 5880 three-queue parallel scheduling for the 9 runs.
Out of scope:
- Replacing the existing vanilla iMF/full-attn configs;
- Modifying the DiT baseline;
- Adding video logging;
- Expanding to multiple seeds.
## Recommended Approach
Add an **optional AttnRes backbone inside the current iMF model** rather than building a separate policy path.
Rationale:
- The policy / workspace / loss / sampling paths are already verified; keeping them minimizes the change surface;
- Switching only the backbone inside the model keeps the new experiment comparable with existing iMF results;
- The config only needs explicit switches such as `backbone_type=attnres_full` and `causal_attn=false`, which makes the experiment straightforward to reproduce.
## Architecture
### 1. Backbone split
`IMFTransformerForDiffusion` keeps the existing vanilla encoder/decoder implementation as the default path and adds an `attnres_full` path:
- **vanilla**: the current implementation, unchanged;
- **attnres_full**: a single-stack full-attention transformer whose input token sequence is
`[r token, t token, obs cond tokens..., action/sample tokens...]`.
The model emits `u` predictions only at the trailing action/sample token positions; the leading condition tokens participate in context modeling only (see the dispatch sketch below).
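A minimal sketch of this dispatch (the private helper names are hypothetical):
```python
# Sketch: inside IMFTransformerForDiffusion; the public iMF API is unchanged.
def forward(self, sample, r, t, cond=None):
    if self.backbone_type == "vanilla":
        return self._forward_vanilla(sample, r, t, cond)
    if self.backbone_type == "attnres_full":
        return self._forward_attnres_full(sample, r, t, cond)
    raise ValueError(f"unknown backbone_type: {self.backbone_type!r}")
```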
### 2. AttnRes stack
The new backbone uses the following modules:
- `RMSNorm`
- rotary position embedding (applied to the self-attention q/k)
- `GroupedQueryAttention` (defaulting to `n_kv_head=1` for this experiment, compatible with a single-head configuration)
- `SwiGLU` FFN
- `AttnResOperator` (one pseudo-query per sublayer, performing full depth-wise residual aggregation)
Each transformer block consists of two sublayers:
1. a self-attention sublayer
2. an FFN sublayer
The input to each sublayer is no longer the plain residual `x + f(x)`; instead, `h_l` is aggregated from the embedding and the outputs of all previous sublayers via Full AttnRes, then passed through `RMSNorm(h_l) -> sublayer_fn(...)`, as sketched below.
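A minimal sketch of the aggregation, assuming one learned pseudo-query per sublayer that attends over the per-token depth history (the exact parameterization in the `attn_res` repository may differ):
```python
import torch
import torch.nn as nn

class AttnResOperator(nn.Module):
    """Full AttnRes aggregation for one sublayer (sketch).

    Instead of the plain residual x + f(x), the sublayer input h_l is a
    learned per-token weighted combination of the embedding and all
    previous sublayer outputs.
    """

    def __init__(self, n_emb: int):
        super().__init__()
        self.pseudo_query = nn.Parameter(torch.zeros(n_emb))
        self.key_proj = nn.Linear(n_emb, n_emb, bias=False)

    def forward(self, history: list[torch.Tensor]) -> torch.Tensor:
        # history: [embedding, y_1, ..., y_{l-1}], each (B, T, n_emb).
        h = torch.stack(history, dim=2)                # (B, T, L, n_emb)
        scores = self.key_proj(h) @ self.pseudo_query  # (B, T, L)
        weights = torch.softmax(scores / h.shape[-1] ** 0.5, dim=-1)
        return (weights.unsqueeze(-1) * h).sum(dim=2)  # (B, T, n_emb) = h_l
```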
### 3. Conditioning and token flow
- `sample` is first mapped to action tokens by `input_emb`;
- `r` and `t` are each mapped to one condition token via `SinusoidalPosEmb + linear`;
- the encoded image observation `cond` is mapped to obs tokens by `cond_obs_emb`;
- the concatenated token sequence enters the AttnRes stack;
- at the output, the leading condition tokens are sliced off, keeping only the action/sample token segment, which then passes through `RMSNorm + head` to produce the final `u` (see the sketch below).
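A sketch of that flow as it might appear inside `IMFTransformerForDiffusion` (the embedding-module names `r_emb`/`t_emb` are assumptions; `input_emb` and `cond_obs_emb` follow the design above):
```python
import torch

# Sketch, not the verified implementation:
def _forward_attnres_full(self, sample, r, t, cond):
    x = self.input_emb(sample)           # (B, T_a, n_emb) action tokens
    r_tok = self.r_emb(r).unsqueeze(1)   # (B, 1, n_emb) SinusoidalPosEmb + linear
    t_tok = self.t_emb(t).unsqueeze(1)   # (B, 1, n_emb)
    obs = self.cond_obs_emb(cond)        # (B, T_o, n_emb) obs tokens
    tokens = torch.cat([r_tok, t_tok, obs, x], dim=1)
    h = self.attnres_stack(tokens)       # non-causal full attention
    h = h[:, -x.shape[1]:, :]            # slice off the condition tokens
    return self.head(self.final_norm(h)) # u prediction, matches sample horizon
```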
### 4. Attention mode
This experiment path is fixed to **non-causal full attention**:
- `causal_attn=false`
- no causal mask is constructed
- all tokens are bidirectionally visible to one another
This matches the user's requirement that training still use full attention (no causal masking).
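In PyTorch 2.x terms, the attention call inside the stack reduces to scaled dot-product attention with no mask:
```python
import torch
import torch.nn.functional as F

def full_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    # Non-causal full attention: no mask is built, so every token
    # (condition and action alike) attends bidirectionally.
    return F.scaled_dot_product_attention(q, k, v, attn_mask=None, is_causal=False)
```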
## Config and Logging
Add a standalone config file, e.g.:
- `image_pusht_diffusion_policy_dit_imf_attnres_full.yaml`
This config must:
- point to the existing `IMFTransformerHybridImagePolicy`;
- explicitly enable the AttnRes backbone parameters;
- set `policy.causal_attn=false`;
- keep `logging.backend=swanlab` and `logging.mode=online`;
- guarantee, via launch-time overrides:
  - `logging.name=<unique_run_name>`
  - `logging.group=imf_pusht_attnres_arch_sweep`
  - `exp_name=<unique_run_name>`
- keep `task.env_runner.n_test_vis=0` and `n_train_vis=0`, logging scalars only.
## Experiment Matrix
A fixed 9-run grid:
- `n_emb ∈ {128, 256, 384}`
- `n_layer ∈ {6, 12, 18}`
- `seed=42`
- `training.num_epochs=350`
## Scheduling
Reuse the previously verified three-queue assignment:
- local 5090: `384x18`, `256x6`, `128x6`
- 5880 GPU0: `384x12`, `256x12`, `128x12`
- 5880 GPU1: `384x6`, `256x18`, `128x18`
Each run name encodes the backbone and architecture, e.g.:
`imf_attnres_emb256_layer12_seed42_5880gpu0`
## Verification
During implementation, verify at least that:
1. the new config's SwanLab naming and `causal_attn=false` are correct;
2. the new backbone's forward shapes and `configure_optimizers()` work;
3. the existing vanilla-path tests do not regress;
4. a `training.debug=true` smoke run passes end to end.
## Success Criteria
1. The new AttnRes iMF variant trains and supports one-step inference on this branch;
2. The existing vanilla iMF/full-attn path is unaffected;
3. All 9 runs are officially launched across the three GPUs;
4. SwanLab run names are unique, with no collisions;
5. No video is recorded; scalars only.