roboimi/docs/superpowers/plans/2026-04-01-imf-attnres-policy-migration.md

IMF-AttnRes Policy Migration Implementation Plan

For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.

Goal: Migrate the IMF-AttnRes model, training objective, and one-step inference mechanism from the external diffusion_policy@185ed659 into RoboIMI, and launch training with the same hyperparameters while preserving the three-camera visual conditioning inputs and the existing training/rollout workflows.

Architecture: Keep RoboIMI's existing ResNet three-camera observation encoding, normalization, queue-based online rollout, and training scripts. Add the AttnRes components and the IMF transformer head, plus a dedicated IMF agent that overrides the DDPM loss / DDIM inference semantics. The training script receives only minimal wiring changes so the new head/agent can reuse the existing optimizer, checkpointing, SwanLab logging, and headless rollout.

Tech Stack: PyTorch, Hydra, diffusers schedulers (kept only for compatible initialization), MuJoCo rollout, unittest, SwanLab


File Map

New files

  • roboimi/vla/models/heads/attnres_transformer_components.py — local IMF AttnRes building blocks
  • roboimi/vla/models/heads/imf_transformer1d.py — IMF transformer head exposing forward(sample, r, t, cond=None)
  • roboimi/vla/agent_imf.py — dedicated IMF VLA agent; reuses the existing observation/queue/normalization logic and overrides loss / inference
  • roboimi/vla/conf/head/imf_transformer1d.yaml — IMF head config
  • roboimi/vla/conf/agent/resnet_imf_attnres.yaml — combined IMF agent + backbone/head config
  • tests/test_imf_transformer1d_external_alignment.py — alignment tests against external 185ed659
  • tests/test_imf_vla_agent.py — tests for the IMF agent's loss / inference / queue semantics

Modified files

  • roboimi/demos/vla_scripts/train_vla.py — optimizer parameter-group wiring; ensure the new agent trains seamlessly
  • roboimi/vla/conf/config.yaml — keep the defaults unchanged; only support enabling the IMF agent via override
  • tests/test_train_vla_transformer_optimizer.py — cover the optimizer-group behavior for the IMF head
  • (if needed) roboimi/vla/models/heads/__init__.py or a nearby export file — expose the new head

Task 1: Write the IMF transformer alignment tests

Files:

  • Create: tests/test_imf_transformer1d_external_alignment.py

  • Reference: /home/droid/project/diffusion_policy/diffusion_policy/model/diffusion/attnres_transformer_components.py

  • Reference: /home/droid/project/diffusion_policy/diffusion_policy/model/diffusion/imf_transformer_for_diffusion.py

  • Step 1: Write failing tests verifying that the local IMF head matches external 185ed659 on state-dict keys, forward shapes, forward values, and optim groups

with torch.no_grad():
    external_out = external_model(sample=sample, r=r, t=t, cond=cond)
    local_out = local_model(sample=sample, r=r, t=t, cond=cond)
assert torch.allclose(local_out, external_out, atol=1e-6, rtol=1e-5)
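The forward-value check above is one of the four alignment axes; the state-dict key comparison can be sketched as below (the helper name and error format are illustrative, not from the external repo):

```python
import torch


def assert_state_dict_keys_match(local_model: torch.nn.Module,
                                 external_model: torch.nn.Module) -> None:
    """Fail with a readable diff when the two modules' parameter/buffer keys differ."""
    local_keys = set(local_model.state_dict().keys())
    external_keys = set(external_model.state_dict().keys())
    assert local_keys == external_keys, (
        f"missing from local: {sorted(external_keys - local_keys)}; "
        f"extra in local: {sorted(local_keys - external_keys)}"
    )
```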
  • Step 2: Run the test and confirm it currently fails

Run: python -m unittest tests.test_imf_transformer1d_external_alignment -v
Expected: FAIL, reporting that the imf_transformer1d / attnres modules do not exist

  • Step 3: If the tests need the existing external-loader logic, copy the minimal necessary helpers from tests/test_transformer1d_external_alignment.py to avoid duplicating the dependency on session context

  • Step 4: Commit the test skeleton

git add tests/test_imf_transformer1d_external_alignment.py
git commit -m "test: add IMF transformer external alignment coverage"

Task 2: Implement the AttnRes components and the IMF transformer head

Files:

  • Create: roboimi/vla/models/heads/attnres_transformer_components.py

  • Create: roboimi/vla/models/heads/imf_transformer1d.py

  • Modify: tests/test_imf_transformer1d_external_alignment.py

  • Step 1: Port the AttnRes building blocks per external 185ed659, keeping naming and parameter semantics identical

Must include:

  • RMSNorm

  • RMSNormNoWeight

  • precompute_rope_freqs

  • apply_rope

  • GroupedQuerySelfAttention

  • SwiGLUFFN

  • AttnResOperator

  • AttnResSubLayer

  • AttnResTransformerBackbone
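As a shape reference only, here are minimal sketches of the first few items, assuming the common RMSNorm / rotary-embedding formulations; the external 185ed659 definitions remain authoritative for exact signatures and initialization:

```python
import torch


class RMSNorm(torch.nn.Module):
    """Root-mean-square layer norm (no mean subtraction), per the AttnRes naming."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = torch.nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return x * rms * self.weight


def precompute_rope_freqs(dim: int, max_len: int, theta: float = 10000.0):
    # Per-pair frequencies, then per-position rotation angles.
    freqs = 1.0 / (theta ** (torch.arange(0, dim, 2).float() / dim))
    angles = torch.outer(torch.arange(max_len).float(), freqs)
    return torch.cos(angles), torch.sin(angles)  # each (max_len, dim // 2)


def apply_rope(x: torch.Tensor, cos: torch.Tensor, sin: torch.Tensor) -> torch.Tensor:
    # x: (..., seq, dim); rotate even/odd feature pairs by position-dependent angles.
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out
```

Rotation preserves per-token vector norms, which gives the alignment test an easy invariant to assert against.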

  • Step 2: Implement the local IMF head in imf_transformer1d.py

Must satisfy:

  • forward(sample, r, t, cond=None)

  • Supports backbone_type='attnres_full' by default

  • Token sequence is [r_token, t_token, cond_tokens..., sample_tokens...]

  • Output slices back only the sample-token span

  • Keeps get_optim_groups() for AdamW parameter grouping
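The token layout and output slicing can be illustrated with a stripped-down forward; the projection layers here are stand-ins for the real learned embeddings, and shapes are the only claim being made:

```python
import torch


def imf_token_layout_demo(sample, r, t, cond, n_emb=32):
    """Demonstrates only the order [r_token, t_token, cond..., sample...]
    and the final slice back to the sample-token span; no real backbone runs."""
    B, Ta, Da = sample.shape
    Tc, Dc = cond.shape[1], cond.shape[2]
    in_s = torch.nn.Linear(Da, n_emb)   # stand-in for the sample embedding
    in_c = torch.nn.Linear(Dc, n_emb)   # stand-in for the cond embedding
    in_rt = torch.nn.Linear(1, n_emb)   # stand-in for the r/t time embedding
    r_tok = in_rt(r.view(B, 1, 1))      # (B, 1, n_emb)
    t_tok = in_rt(t.view(B, 1, 1))      # (B, 1, n_emb)
    tokens = torch.cat([r_tok, t_tok, in_c(cond), in_s(sample)], dim=1)
    assert tokens.shape == (B, 2 + Tc + Ta, n_emb)
    # ...the AttnRes backbone would process `tokens` here...
    out = tokens[:, 2 + Tc:, :]          # keep only the sample-token span
    return torch.nn.Linear(n_emb, Da)(out)  # (B, Ta, Da)
```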

  • Step 3: Run the alignment tests; fix any mismatches in state-dict keys / init / no-decay parameter grouping

Run: python -m unittest tests.test_imf_transformer1d_external_alignment -v
Expected: PASS

  • Step 4: Commit the model component implementation
git add roboimi/vla/models/heads/attnres_transformer_components.py \
        roboimi/vla/models/heads/imf_transformer1d.py \
        tests/test_imf_transformer1d_external_alignment.py
git commit -m "feat: add IMF AttnRes transformer head"

Task 3: Write the IMF agent behavior tests

Files:

  • Create: tests/test_imf_vla_agent.py

  • Reference: roboimi/vla/agent.py

  • Reference: tests/test_resnet_transformer_agent_wiring.py

  • Step 1: Write failing tests covering the IMF agent's core contract

Must cover:

  1. compute_loss() accepts the current batch structure and returns a scalar loss
  2. predict_action() outputs (B, pred_horizon, action_dim)
  3. select_action() still follows the queue/chunk semantics
  4. predict_action() does not run the multi-step DDIM loop; it triggers exactly one IMF sampling step
  5. when action_is_pad is present, the loss counts only the valid actions
  • Step 2: Use a stub backbone / stub head to record call arguments, verifying the propagation of r, t, cond and the observation-conditioning dimensions
self.assertEqual(recorded['cond'].shape, (B, obs_horizon, expected_cond_dim))
self.assertTrue(torch.allclose(recorded['r'], torch.zeros(B)))
self.assertTrue(torch.allclose(recorded['t'], torch.ones(B)))
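One way to implement the recording stub behind those assertions (the class name is hypothetical; the real test should mirror whatever interface agent_imf ends up calling):

```python
import torch


class RecordingStubHead(torch.nn.Module):
    """Stands in for the IMF head inside agent tests: records every call's
    kwargs so assertions can inspect sample, r, t, and cond."""

    def __init__(self):
        super().__init__()
        self.recorded = {}

    def forward(self, sample, r, t, cond=None):
        self.recorded.update(sample=sample, r=r, t=t, cond=cond)
        # Echo a zero tensor of the sample's shape so the agent pipeline keeps running.
        return torch.zeros_like(sample)
```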
  • Step 3: Run the tests and confirm they currently fail

Run: python -m unittest tests.test_imf_vla_agent -v
Expected: FAIL, reporting that roboimi.vla.agent_imf does not exist

  • Step 4: Commit the test skeleton
git add tests/test_imf_vla_agent.py
git commit -m "test: add IMF VLA agent behavior coverage"

Task 4: Implement the IMF agent and the Hydra wiring

Files:

  • Create: roboimi/vla/agent_imf.py

  • Create: roboimi/vla/conf/head/imf_transformer1d.yaml

  • Create: roboimi/vla/conf/agent/resnet_imf_attnres.yaml

  • Modify: roboimi/demos/vla_scripts/train_vla.py

  • Modify: tests/test_train_vla_transformer_optimizer.py

  • Modify: tests/test_imf_vla_agent.py

  • Step 1: Implement IMFVLAAgent on top of VLAAgent

Implementation strategy:

  • Reuse VLAAgent.__init__, _build_cond(), reset(), _populate_queues(), _prepare_observation_batch(), select_action(), get_normalization_stats()

  • Override:

    • compute_loss() -> IMF objective
    • predict_action() -> one-step sample
  • Provide internal helpers:

    • _broadcast_batch_time
    • _apply_conditioning (if needed)
    • _compute_u_and_du_dt
    • _compound_velocity
    • _sample_one_step
  • Step 2: Add a CUDA math-SDPA fallback on the JVP path, keeping the external repo's stability strategy
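One common way to express such a fallback (the helper name is an assumption; torch.backends.cuda.sdp_kernel is the older context-manager API, and newer torch versions prefer torch.nn.attention.sdpa_kernel). The motivation is that fused flash / memory-efficient SDPA kernels typically reject forward-mode autodiff (JVP):

```python
import contextlib
import torch


def math_sdpa_on_cuda():
    """Context manager forcing the math SDPA backend on CUDA; no-op on CPU.
    Sketch under the assumption that the JVP-based u/du_dt computation
    needs the math path for stability, as in the external repo."""
    if torch.cuda.is_available():
        return torch.backends.cuda.sdp_kernel(
            enable_flash=False, enable_mem_efficient=False, enable_math=True
        )
    return contextlib.nullcontext()
```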

  • Step 3: Add Hydra configs so that agent=resnet_imf_attnres is instantiable

Key defaults:

  • _target_: roboimi.vla.agent_imf.IMFVLAAgent

  • head._target_: roboimi.vla.models.heads.imf_transformer1d.IMFTransformer1D

  • head.backbone_type: attnres_full

  • head.causal_attn: false

  • head.time_as_cond: true

  • head.n_cond_layers: 0

  • inference_steps: 1

  • camera_names: ${data.camera_names}

  • vision_backbone.camera_names: ${agent.camera_names}
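Pulled together, the agent config could look like this sketch; only the keys listed above come from this plan, and everything else must be inherited from the existing resnet agent configs:

```yaml
# roboimi/vla/conf/agent/resnet_imf_attnres.yaml -- sketch only
_target_: roboimi.vla.agent_imf.IMFVLAAgent
camera_names: ${data.camera_names}
inference_steps: 1
head:
  _target_: roboimi.vla.models.heads.imf_transformer1d.IMFTransformer1D
  backbone_type: attnres_full
  causal_attn: false
  time_as_cond: true
  n_cond_layers: 0
vision_backbone:
  camera_names: ${agent.camera_names}
```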

  • Step 4: Make the training script reuse parameter grouping for any head that exposes get_optim_groups(), instead of hard-coding the old transformer head_type

Recommended minimal change:

use_head_groups = callable(getattr(noise_pred_net, 'get_optim_groups', None))
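That duck-typing check can be folded into the optimizer construction roughly as follows; the function and variable names are assumptions about train_vla.py, and the call assumes the diffusion_policy-style get_optim_groups(weight_decay=...) signature:

```python
import torch


def build_optimizer(noise_pred_net, lr, weight_decay, betas=(0.9, 0.95)):
    """Prefer the head's own AdamW groups when it exposes get_optim_groups()."""
    get_groups = getattr(noise_pred_net, "get_optim_groups", None)
    if callable(get_groups):
        # The head decides which params get weight decay (e.g. none on norms).
        param_groups = get_groups(weight_decay=weight_decay)
    else:
        # Fallback: a single group with uniform weight decay.
        param_groups = [{"params": list(noise_pred_net.parameters()),
                         "weight_decay": weight_decay}]
    return torch.optim.AdamW(param_groups, lr=lr, betas=betas)
```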
  • Step 5: Run the tests and fix wiring issues

Run:

  • python -m unittest tests.test_imf_vla_agent -v
  • python -m unittest tests.test_train_vla_transformer_optimizer -v

Expected: PASS

  • Step 6: Commit the agent / config / train-script wiring
git add roboimi/vla/agent_imf.py \
        roboimi/vla/conf/head/imf_transformer1d.yaml \
        roboimi/vla/conf/agent/resnet_imf_attnres.yaml \
        roboimi/demos/vla_scripts/train_vla.py \
        tests/test_imf_vla_agent.py \
        tests/test_train_vla_transformer_optimizer.py
git commit -m "feat: add IMF VLA agent and training wiring"

Task 5: Integration verification and training launch

Files:

  • Modify: none required unless verification exposes real problems

  • Use run artifacts under: runs/

  • Step 1: Run the focused test set

Run:

python -m unittest \
  tests.test_imf_transformer1d_external_alignment \
  tests.test_imf_vla_agent \
  tests.test_resnet_transformer_agent_wiring \
  tests.test_train_vla_transformer_optimizer -v

Expected: PASS

  • Step 2: Run a minimal GPU training smoke test (no long run needed)

Run:

/home/droid/.conda/envs/roboimi/bin/python roboimi/demos/vla_scripts/train_vla.py \
  agent=resnet_imf_attnres \
  data.dataset_dir=/home/droid/project/diana_sim/sim_transfer \
  data.camera_names=[r_vis,top,front] \
  train.device=cuda train.max_steps=2 train.batch_size=4 train.num_workers=2 \
  train.use_swanlab=false train.rollout_val_freq_epochs=0

Expected: completes 2 steps successfully, producing a checkpoint / logs with no shape or JVP errors

  • Step 3: Launch the IMF training with the production parameters

Run:

/home/droid/.conda/envs/roboimi/bin/python roboimi/demos/vla_scripts/train_vla.py \
  agent=resnet_imf_attnres \
  data.dataset_dir=/home/droid/project/diana_sim/sim_transfer \
  data.camera_names=[r_vis,top,front] \
  train.device=cuda train.val_split=0.0 train.seed=42 \
  train.batch_size=80 train.lr=5e-4 train.num_workers=12 train.max_steps=150000 \
  train.log_freq=100 train.save_freq=10000 train.use_swanlab=true \
  train.swanlab_project=roboimi-vla \
  train.rollout_val_freq_epochs=5 train.rollout_validate_on_checkpoint=false \
  train.rollout_num_episodes=5 train.warmup_steps=2000 \
  train.scheduler_type=cosine train.min_lr=1e-6 train.weight_decay=1e-5 train.grad_clip=1.0 \
  agent.pred_horizon=16 agent.inference_steps=1 \
  agent.head.n_emb=384 agent.head.n_layer=18 agent.head.n_head=1 agent.head.n_kv_head=1 \
  agent.vision_backbone.pretrained_backbone_weights=null \
  agent.vision_backbone.freeze_backbone=false \
  agent.vision_backbone.use_separate_rgb_encoder_per_camera=true

Expected: training launches successfully, SwanLab records the full config, and a headless rollout runs every 5 epochs

  • Step 4: Record the run path, the training PID, and the SwanLab run name, and report them to the user

  • Step 5: Commit any final cleanup (if the smoke fixes require extra patches)

git add <changed files>
git commit -m "chore: verify IMF AttnRes training launch"