# IMF-AttnRes Policy Migration Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Migrate the IMF-AttnRes model, training objective, and one-step inference mechanism from external `diffusion_policy@185ed659` into RoboIMI, then launch training with the same parameters while keeping the three-camera visual conditioning input and the existing training/rollout workflow.

**Architecture:** Keep RoboIMI's existing ResNet three-camera observation encoder, normalization, queue-based online rollout, and training script. Add the AttnRes components and the IMF transformer head, plus a dedicated IMF agent that overrides the DDPM loss / DDIM inference semantics. The training script gets only minimal wiring changes so that the new head/agent can use the existing optimizer, checkpointing, SwanLab, and headless rollout.

**Tech Stack:** PyTorch, Hydra, diffusers schedulers (kept only for compatible initialization), MuJoCo rollout, unittest, SwanLab

---

## File Map

### New files

- `roboimi/vla/models/heads/attnres_transformer_components.py`: local IMF AttnRes base components
- `roboimi/vla/models/heads/imf_transformer1d.py`: IMF transformer head, exposing `forward(sample, r, t, cond=None)`
- `roboimi/vla/agent_imf.py`: dedicated IMF VLA agent; reuses the existing observation/queue/normalization logic and overrides loss / inference
- `roboimi/vla/conf/head/imf_transformer1d.yaml`: IMF head config
- `roboimi/vla/conf/agent/resnet_imf_attnres.yaml`: combined IMF agent + backbone/head config
- `tests/test_imf_transformer1d_external_alignment.py`: alignment tests against external `185ed659`
- `tests/test_imf_vla_agent.py`: tests for the IMF agent's loss / inference / queue semantics

### Modified files

- `roboimi/demos/vla_scripts/train_vla.py`: wire up optimizer parameter groups; make sure the new agent trains without further changes
- `roboimi/vla/conf/config.yaml`: keep the default config unchanged; the IMF agent is enabled only through an override
- `tests/test_train_vla_transformer_optimizer.py`: cover the IMF head's optimizer-group behavior
- (if needed) `roboimi/vla/models/heads/__init__.py` or a similar export file: expose the new head

---

### Task 1: Write the IMF transformer alignment tests

**Files:**

- Create: `tests/test_imf_transformer1d_external_alignment.py`
- Reference: `/home/droid/project/diffusion_policy/diffusion_policy/model/diffusion/attnres_transformer_components.py`
- Reference: `/home/droid/project/diffusion_policy/diffusion_policy/model/diffusion/imf_transformer_for_diffusion.py`

- [ ] **Step 1: Write a failing test that verifies the local IMF head matches external `185ed659` on state-dict keys, forward output shape, forward numerics, and optim groups**

```python
import torch

# Both models receive identical inputs; gradients are not needed for the comparison.
with torch.no_grad():
    external_out = external_model(sample=sample, r=r, t=t, cond=cond)
    local_out = local_model(sample=sample, r=r, t=t, cond=cond)

assert torch.allclose(local_out, external_out, atol=1e-6, rtol=1e-5)
```
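
The state-dict key check from Step 1 could be backed by a small helper that reports every mismatch at once, which tends to make a failing alignment test easier to read. The sketch below is only an illustration: `diff_state_dicts` and `shapes_of` are hypothetical names, and it works on plain name-to-shape mappings so it does not depend on either model.

```python
from typing import Mapping, Tuple

Shape = Tuple[int, ...]


def diff_state_dicts(local: Mapping[str, Shape], external: Mapping[str, Shape]) -> dict:
    """Compare two name -> shape mappings and report every mismatch."""
    local_keys, external_keys = set(local), set(external)
    shared = local_keys & external_keys
    return {
        "missing_in_local": sorted(external_keys - local_keys),
        "unexpected_in_local": sorted(local_keys - external_keys),
        "shape_mismatch": sorted(
            k for k in shared if tuple(local[k]) != tuple(external[k])
        ),
    }


def shapes_of(state_dict) -> dict:
    """Reduce a PyTorch-style state dict to name -> shape (values need `.shape`)."""
    return {name: tuple(value.shape) for name, value in state_dict.items()}
```

In a test this might be used as `diff_state_dicts(shapes_of(local_model.state_dict()), shapes_of(external_model.state_dict()))`, asserting that all three lists are empty.
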

- [ ] **Step 2: Run the unit test and confirm that it currently fails**

Run: `python -m unittest tests.test_imf_transformer1d_external_alignment -v`

Expected: FAIL, reporting that the `imf_transformer1d` / `attnres` modules do not exist

- [ ] **Step 3: If the test needs to reuse the existing external-loader logic, copy the minimal necessary helpers from `tests/test_transformer1d_external_alignment.py` so the test does not depend on session context again**

- [ ] **Step 4: Commit the test skeleton**

```bash
git add tests/test_imf_transformer1d_external_alignment.py
git commit -m "test: add IMF transformer external alignment coverage"
```

### Task 2: Implement the AttnRes components and the IMF transformer head

**Files:**

- Create: `roboimi/vla/models/heads/attnres_transformer_components.py`
- Create: `roboimi/vla/models/heads/imf_transformer1d.py`
- Modify: `tests/test_imf_transformer1d_external_alignment.py`

- [ ] **Step 1: Port the AttnRes base components from external `185ed659`, keeping names and parameter semantics identical**

Must include:

- `RMSNorm`
- `RMSNormNoWeight`
- `precompute_rope_freqs`
- `apply_rope`
- `GroupedQuerySelfAttention`
- `SwiGLUFFN`
- `AttnResOperator`
- `AttnResSubLayer`
- `AttnResTransformerBackbone`

- [ ] **Step 2: Implement the local IMF head in `imf_transformer1d.py`**

Must satisfy:

- Signature is `forward(sample, r, t, cond=None)`
- Supports `backbone_type='attnres_full'` by default
- Token sequence is `[r_token, t_token, cond_tokens..., sample_tokens...]`
- Output is sliced back to the sample-token segment only
- Keeps `get_optim_groups()` for AdamW parameter grouping

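
The token layout and output slicing listed above can be illustrated with plain index arithmetic. This is a minimal sketch, not the head's actual code: `token_layout` is a hypothetical helper, and plain Python lists stand in for tensors.

```python
def token_layout(n_cond: int, n_sample: int) -> dict:
    """Index ranges for a sequence laid out as [r, t, cond..., sample...]."""
    if n_cond < 0 or n_sample <= 0:
        raise ValueError("n_cond must be >= 0 and n_sample must be > 0")
    cond_start = 2  # positions 0 and 1 hold r_token and t_token
    sample_start = cond_start + n_cond
    return {
        "r": 0,
        "t": 1,
        "cond": slice(cond_start, sample_start),
        "sample": slice(sample_start, sample_start + n_sample),
    }


# Example: 2 observation tokens and a 16-step action chunk.
layout = token_layout(n_cond=2, n_sample=16)
tokens = ["r", "t"] + [f"c{i}" for i in range(2)] + [f"s{i}" for i in range(16)]
sample_tokens = tokens[layout["sample"]]  # the only segment the head returns
```

With a `(B, T, n_emb)` tensor the same slice would presumably be applied along the token axis, for example `x[:, layout["sample"]]`.
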

- [ ] **Step 3: Run the alignment tests and fix any mismatch in state-dict keys, init, or no-decay parameter grouping**

Run: `python -m unittest tests.test_imf_transformer1d_external_alignment -v`

Expected: PASS

- [ ] **Step 4: Commit the model component implementation**

```bash
git add roboimi/vla/models/heads/attnres_transformer_components.py \
  roboimi/vla/models/heads/imf_transformer1d.py \
  tests/test_imf_transformer1d_external_alignment.py
git commit -m "feat: add IMF AttnRes transformer head"
```

### Task 3: Write the IMF agent behavior tests

**Files:**

- Create: `tests/test_imf_vla_agent.py`
- Reference: `roboimi/vla/agent.py`
- Reference: `tests/test_resnet_transformer_agent_wiring.py`

- [ ] **Step 1: Write failing tests that cover the IMF agent's core contract**

Must cover:

1. `compute_loss()` accepts the current batch structure and returns a scalar loss
2. `predict_action()` returns shape `(B, pred_horizon, action_dim)`
3. `select_action()` still follows the queue/chunk semantics
4. `predict_action()` does not run the multi-step DDIM loop; it triggers exactly one IMF sample step
5. When `action_is_pad` is present, the loss is computed on valid actions only

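
Item 5 amounts to a masked mean. The sketch below shows the idea on plain Python lists; it assumes a squared-error loss averaged over valid elements, which may not match the reduction the agent actually uses, and `masked_mse` is a hypothetical name.

```python
from typing import Sequence


def masked_mse(
    pred: Sequence[Sequence[float]],
    target: Sequence[Sequence[float]],
    is_pad: Sequence[bool],
) -> float:
    """Mean squared error over the horizon steps where is_pad is False.

    pred/target hold one action vector per step; is_pad holds one flag per step.
    """
    total, count = 0.0, 0
    for p_step, t_step, pad in zip(pred, target, is_pad):
        if pad:
            continue  # padded steps contribute neither error nor weight
        total += sum((p - t) ** 2 for p, t in zip(p_step, t_step))
        count += len(p_step)
    # Guard against an all-padded sequence instead of dividing by zero.
    return total / count if count else 0.0
```

A test for item 5 could then check that changing the prediction on a padded step leaves the loss unchanged.
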

- [ ] **Step 2: Use a stub backbone / stub head that records call arguments, and verify that `r`, `t`, and `cond` are passed through and that the observation-conditioning dimensions are correct**

```python
# `recorded` holds the keyword arguments captured by the stub head.
self.assertEqual(recorded['cond'].shape, (B, obs_horizon, expected_cond_dim))
self.assertTrue(torch.allclose(recorded['r'], torch.zeros(B)))
self.assertTrue(torch.allclose(recorded['t'], torch.ones(B)))
```
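
One way to obtain `recorded` is a stub head that stores its keyword arguments on every call. The sketch below is dependency-free and only illustrative: `RecordingHead` and `one_step_predict` are hypothetical names, and lists stand in for the `r` / `t` tensors. It also shows how the call count can back the "exactly one IMF sample step" requirement.

```python
class RecordingHead:
    """Stub head: stores each call's keyword arguments and echoes `sample`."""

    def __init__(self):
        self.calls = []

    def __call__(self, sample, r, t, cond=None):
        self.calls.append({"sample": sample, "r": r, "t": t, "cond": cond})
        return sample  # shape-preserving placeholder output


def one_step_predict(head, noise, batch_size, cond):
    """Hypothetical stand-in for predict_action(): one head call at r=0, t=1."""
    r = [0.0] * batch_size
    t = [1.0] * batch_size
    return head(sample=noise, r=r, t=t, cond=cond)


head = RecordingHead()
one_step_predict(head, noise=[[0.0]] * 4, batch_size=4, cond="obs-features")
recorded = head.calls[-1]
```

In the real test the stub would replace the agent's head, and `len(head.calls) == 1` would be asserted after a single `predict_action()` call.
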

- [ ] **Step 3: Run the tests and confirm that they currently fail**

Run: `python -m unittest tests.test_imf_vla_agent -v`

Expected: FAIL, reporting that `roboimi.vla.agent_imf` does not exist

- [ ] **Step 4: Commit the test skeleton**

```bash
git add tests/test_imf_vla_agent.py
git commit -m "test: add IMF VLA agent behavior coverage"
```

### Task 4: Implement the IMF agent and Hydra wiring

**Files:**

- Create: `roboimi/vla/agent_imf.py`
- Create: `roboimi/vla/conf/head/imf_transformer1d.yaml`
- Create: `roboimi/vla/conf/agent/resnet_imf_attnres.yaml`
- Modify: `roboimi/demos/vla_scripts/train_vla.py`
- Modify: `tests/test_train_vla_transformer_optimizer.py`
- Modify: `tests/test_imf_vla_agent.py`

- [ ] **Step 1: Implement `IMFVLAAgent` on top of `VLAAgent`**

Implementation strategy:

- Reuse `VLAAgent.__init__`, `_build_cond()`, `reset()`, `_populate_queues()`, `_prepare_observation_batch()`, `select_action()`, `get_normalization_stats()`
- Override:
  - `compute_loss()` -> IMF objective
  - `predict_action()` -> one-step sample
- Provide internal helpers:
  - `_broadcast_batch_time`
  - `_apply_conditioning` (if needed)
  - `_compute_u_and_du_dt`
  - `_compound_velocity`
  - `_sample_one_step`

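
For orientation, the helper names above map naturally onto a MeanFlow-style objective. The relations below are a sketch under that assumption only: the interpolation path, the choice of tangent in the JVP, and the stop-gradient placement (written $\operatorname{sg}$) are assumed here, and external `185ed659` remains the reference for the exact conventions. Only the inference setting $r = 0$, $t = 1$ is taken from the tests in this plan.

$$
\begin{aligned}
z_t &= (1 - t)\,x + t\,\epsilon, \qquad v = \epsilon - x, \\
\frac{d}{dt}\,u_\theta(z_t, r, t) &= (\partial_z u_\theta)\,v + \partial_t u_\theta
  && \text{(\_compute\_u\_and\_du\_dt, one JVP)} \\
V_\theta &= u_\theta(z_t, r, t) + (t - r)\,\operatorname{sg}\!\left[\frac{d}{dt}\,u_\theta(z_t, r, t)\right]
  && \text{(\_compound\_velocity)} \\
\mathcal{L} &= \big\lVert V_\theta - v \big\rVert^2
  && \text{(compute\_loss)} \\
\hat{x} &= \epsilon - u_\theta(\epsilon,\ r = 0,\ t = 1)
  && \text{(\_sample\_one\_step)}
\end{aligned}
$$

Under this reading, `_broadcast_batch_time` would expand scalar `r` and `t` to shape `(B,)` before they reach the head.
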

- [ ] **Step 2: Add the CUDA math-SDPA fallback to the JVP path, keeping the external repo's stability strategy**

- [ ] **Step 3: Add the Hydra configs so that `agent=resnet_imf_attnres` can be instantiated**

Key defaults:

- `_target_: roboimi.vla.agent_imf.IMFVLAAgent`
- `head._target_: roboimi.vla.models.heads.imf_transformer1d.IMFTransformer1D`
- `head.backbone_type: attnres_full`
- `head.causal_attn: false`
- `head.time_as_cond: true`
- `head.n_cond_layers: 0`
- `inference_steps: 1`
- `camera_names: ${data.camera_names}`
- `vision_backbone.camera_names: ${agent.camera_names}`

- [ ] **Step 4: Have the training script reuse parameter grouping for any head that provides `get_optim_groups()`, instead of hard-coding the old transformer head_type**

Recommended minimal change:

```python
# Duck-typed check: any head exposing a callable get_optim_groups() opts in.
use_head_groups = callable(getattr(noise_pred_net, 'get_optim_groups', None))
```
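
Building on that check, the grouping logic might be factored as below. This is a sketch under stated assumptions: `build_param_groups` is a hypothetical helper, and `get_optim_groups(weight_decay=...)` is an assumed signature that should be confirmed against the head before use.

```python
def build_param_groups(head, other_params, weight_decay):
    """Return optimizer param groups, preferring the head's own grouping.

    Assumes `head.get_optim_groups(weight_decay=...)` returns a list of
    dicts in the format accepted by torch.optim optimizers.
    """
    get_groups = getattr(head, "get_optim_groups", None)
    if callable(get_groups):
        groups = list(get_groups(weight_decay=weight_decay))
    else:
        # Fallback: one group holding every head parameter.
        groups = [{"params": list(head.parameters()), "weight_decay": weight_decay}]
    if other_params:
        # Non-head parameters (e.g. the vision backbone) get their own group.
        groups.append({"params": list(other_params), "weight_decay": weight_decay})
    return groups
```

The returned list can be passed directly to an optimizer constructor such as `torch.optim.AdamW(groups, lr=...)`.
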

- [ ] **Step 5: Run the tests and fix any wiring problems**

Run:

- `python -m unittest tests.test_imf_vla_agent -v`
- `python -m unittest tests.test_train_vla_transformer_optimizer -v`

Expected: PASS

- [ ] **Step 6: Commit the agent / config / train-script wiring**

```bash
git add roboimi/vla/agent_imf.py \
  roboimi/vla/conf/head/imf_transformer1d.yaml \
  roboimi/vla/conf/agent/resnet_imf_attnres.yaml \
  roboimi/demos/vla_scripts/train_vla.py \
  tests/test_imf_vla_agent.py \
  tests/test_train_vla_transformer_optimizer.py
git commit -m "feat: add IMF VLA agent and training wiring"
```

### Task 5: Integration verification and training launch

**Files:**

- Modify: none required unless verification exposes a real problem
- Use run artifacts under: `runs/`

- [ ] **Step 1: Run the focused test suite**

Run:

```bash
python -m unittest \
  tests.test_imf_transformer1d_external_alignment \
  tests.test_imf_vla_agent \
  tests.test_resnet_transformer_agent_wiring \
  tests.test_train_vla_transformer_optimizer -v
```

Expected: PASS

- [ ] **Step 2: Run a minimal GPU training smoke job (no need for a long run)**

Run:

```bash
# The list override is quoted so the shell does not treat [...] as a glob pattern.
/home/droid/.conda/envs/roboimi/bin/python roboimi/demos/vla_scripts/train_vla.py \
  agent=resnet_imf_attnres \
  data.dataset_dir=/home/droid/project/diana_sim/sim_transfer \
  'data.camera_names=[r_vis,top,front]' \
  train.device=cuda train.max_steps=2 train.batch_size=4 train.num_workers=2 \
  train.use_swanlab=false train.rollout_val_freq_epochs=0
```

Expected: completes 2 steps, produces a checkpoint / log, and raises no shape or JVP errors


- [ ] **Step 3: Launch IMF training with the production parameters**

Run:

```bash
# The list override is quoted so the shell does not treat [...] as a glob pattern.
/home/droid/.conda/envs/roboimi/bin/python roboimi/demos/vla_scripts/train_vla.py \
  agent=resnet_imf_attnres \
  data.dataset_dir=/home/droid/project/diana_sim/sim_transfer \
  'data.camera_names=[r_vis,top,front]' \
  train.device=cuda train.val_split=0.0 train.seed=42 \
  train.batch_size=80 train.lr=5e-4 train.num_workers=12 train.max_steps=150000 \
  train.log_freq=100 train.save_freq=10000 train.use_swanlab=true \
  train.swanlab_project=roboimi-vla \
  train.rollout_val_freq_epochs=5 train.rollout_validate_on_checkpoint=false \
  train.rollout_num_episodes=5 train.warmup_steps=2000 \
  train.scheduler_type=cosine train.min_lr=1e-6 train.weight_decay=1e-5 train.grad_clip=1.0 \
  agent.pred_horizon=16 agent.inference_steps=1 \
  agent.head.n_emb=384 agent.head.n_layer=18 agent.head.n_head=1 agent.head.n_kv_head=1 \
  agent.vision_backbone.pretrained_backbone_weights=null \
  agent.vision_backbone.freeze_backbone=false \
  agent.vision_backbone.use_separate_rgb_encoder_per_camera=true
```

Expected: training starts successfully, SwanLab records the full config, and a headless rollout runs every 5 epochs

- [ ] **Step 4: Record the run path, training PID, and SwanLab run name, and report them to the user**

- [ ] **Step 5: Commit any final cleanup changes (if the smoke-test fixes require an extra patch)**

```bash
git add <changed files>
git commit -m "chore: verify IMF AttnRes training launch"
```