# Compare commits

`feat-align` — 4 commits (latest: `8d6060224a`)

| SHA1 |
|---|
| `8d6060224a` |
| `8a8193fe7e` |
| `1a92c5e8a6` |
| `b76bcd8b37` |
## `.gitignore` (vendored, +3 lines)

@@ -126,3 +126,6 @@ GEMINI.md
.github/copilot-instructions.md
.hydra/

# Local git worktrees
.worktrees/
# IMF-AttnRes Policy Migration Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Migrate the IMF-AttnRes model, training objective, and one-step inference mechanism from the external `diffusion_policy@185ed659` into RoboIMI, and launch training with the same hyperparameters while keeping the three-camera visual conditioning input and the existing training/rollout workflows.

**Architecture:** Keep RoboIMI's existing ResNet three-camera observation encoding, normalization, queue-based online rollout, and training script. Add the AttnRes components and the IMF transformer head, plus a dedicated IMF agent that overrides the DDPM-loss / DDIM-inference semantics. The training script gets only minimal wiring changes so the new head/agent can reuse the existing optimizer, checkpointing, SwanLab, and headless rollout.

**Tech Stack:** PyTorch, Hydra, diffusers schedulers (kept only for compatible initialization), MuJoCo rollout, unittest, SwanLab

---

## File Map

### New files
- `roboimi/vla/models/heads/attnres_transformer_components.py` — local IMF AttnRes base components
- `roboimi/vla/models/heads/imf_transformer1d.py` — IMF transformer head exposing `forward(sample, r, t, cond=None)`
- `roboimi/vla/agent_imf.py` — dedicated IMF VLA agent; reuses the existing observation/queue/normalization logic and overrides loss / inference
- `roboimi/vla/conf/head/imf_transformer1d.yaml` — IMF head config
- `roboimi/vla/conf/agent/resnet_imf_attnres.yaml` — IMF agent + backbone/head composite config
- `tests/test_imf_transformer1d_external_alignment.py` — alignment tests against external `185ed659`
- `tests/test_imf_vla_agent.py` — loss / inference / queue-semantics tests for the IMF agent

### Modified files
- `roboimi/demos/vla_scripts/train_vla.py` — optimizer parameter-group wiring so the new agent trains seamlessly
- `roboimi/vla/conf/config.yaml` — keep defaults unchanged; enable the IMF agent only via override
- `tests/test_train_vla_transformer_optimizer.py` — cover optimizer-group behavior for the IMF head
- (if needed) `roboimi/vla/models/heads/__init__.py` or a nearby export file — expose the new head

---
||||
### Task 1: Write IMF transformer alignment tests

**Files:**
- Create: `tests/test_imf_transformer1d_external_alignment.py`
- Reference: `/home/droid/project/diffusion_policy/diffusion_policy/model/diffusion/attnres_transformer_components.py`
- Reference: `/home/droid/project/diffusion_policy/diffusion_policy/model/diffusion/imf_transformer_for_diffusion.py`

- [ ] **Step 1: Write failing tests verifying that the local IMF head matches external `185ed659` on state-dict keys, forward shapes, forward values, and optim groups**

```python
with torch.no_grad():
    external_out = external_model(sample=sample, r=r, t=t, cond=cond)
    local_out = local_model(sample=sample, r=r, t=t, cond=cond)
assert torch.allclose(local_out, external_out, atol=1e-6, rtol=1e-5)
```

- [ ] **Step 2: Run the unit test and confirm it currently fails**

Run: `python -m unittest tests.test_imf_transformer1d_external_alignment -v`
Expected: FAIL, reporting that the `imf_transformer1d` / `attnres` modules do not exist

- [ ] **Step 3: If the tests need to reuse the existing external-loader logic, copy the minimal necessary helpers from `tests/test_transformer1d_external_alignment.py` instead of depending on session context**

- [ ] **Step 4: Commit the test skeleton**

```bash
git add tests/test_imf_transformer1d_external_alignment.py
git commit -m "test: add IMF transformer external alignment coverage"
```
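Step 3 above mentions reusing external-loader helpers. One common pattern for such a helper — loading a module straight from a file path so the external repo need not be installed — is sketched below; the function name is illustrative, not necessarily what the existing test file uses:

```python
import importlib.util
import sys

def load_module_from_path(name: str, path: str):
    """Load a Python module directly from a file path. `name` is an
    arbitrary registry key for sys.modules."""
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module  # register so the module's own imports resolve
    spec.loader.exec_module(module)
    return module
```

The external model classes can then be pulled off the returned module object and instantiated next to the local implementation for comparison.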
### Task 2: Implement the AttnRes components and the IMF transformer head

**Files:**
- Create: `roboimi/vla/models/heads/attnres_transformer_components.py`
- Create: `roboimi/vla/models/heads/imf_transformer1d.py`
- Modify: `tests/test_imf_transformer1d_external_alignment.py`

- [ ] **Step 1: Port the AttnRes base components from external `185ed659`, keeping names and parameter semantics identical**

Must include:
- `RMSNorm`
- `RMSNormNoWeight`
- `precompute_rope_freqs`
- `apply_rope`
- `GroupedQuerySelfAttention`
- `SwiGLUFFN`
- `AttnResOperator`
- `AttnResSubLayer`
- `AttnResTransformerBackbone`

- [ ] **Step 2: Implement the local IMF head in `imf_transformer1d.py`**

Must satisfy:
- `forward(sample, r, t, cond=None)`
- supports `backbone_type='attnres_full'` by default
- token sequence is `[r_token, t_token, cond_tokens..., sample_tokens...]`
- the output slices back only the sample-token segment
- keeps `get_optim_groups()` for AdamW grouping

- [ ] **Step 3: Run the alignment tests; fix any state-dict key / init / no-decay parameter-group mismatches**

Run: `python -m unittest tests.test_imf_transformer1d_external_alignment -v`
Expected: PASS

- [ ] **Step 4: Commit the model components**

```bash
git add roboimi/vla/models/heads/attnres_transformer_components.py \
        roboimi/vla/models/heads/imf_transformer1d.py \
        tests/test_imf_transformer1d_external_alignment.py
git commit -m "feat: add IMF AttnRes transformer head"
```
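Of the components listed above, `RMSNorm` is the smallest; a minimal sketch of the standard formulation follows. The external implementation is the source of truth and may differ in eps placement or dtype handling, so treat this as illustrative only:

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Root-mean-square norm: x / rms(x) * weight, with no mean subtraction."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # normalize by the RMS of the last dimension, then apply a learned gain
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return x * rms * self.weight
```

`RMSNormNoWeight` would be the same without the learned `weight` parameter.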
### Task 3: Write IMF agent behavior tests

**Files:**
- Create: `tests/test_imf_vla_agent.py`
- Reference: `roboimi/vla/agent.py`
- Reference: `tests/test_resnet_transformer_agent_wiring.py`

- [ ] **Step 1: Write failing tests covering the IMF agent's core contract**

Must cover:
1. `compute_loss()` accepts the current batch structure and returns a scalar loss
2. `predict_action()` outputs `(B, pred_horizon, action_dim)`
3. `select_action()` still follows the queue/chunk semantics
4. `predict_action()` does not run the multi-step DDIM loop; it triggers exactly one IMF sampling step
5. when `action_is_pad` is present, the loss is computed only over valid actions

- [ ] **Step 2: Use a stub backbone / stub head that records call arguments, verifying that `r, t, cond` are passed through and the observation-conditioning dimensions are correct**

```python
self.assertEqual(recorded['cond'].shape, (B, obs_horizon, expected_cond_dim))
self.assertTrue(torch.allclose(recorded['r'], torch.zeros(B)))
self.assertTrue(torch.allclose(recorded['t'], torch.ones(B)))
```

- [ ] **Step 3: Run the tests and confirm they currently fail**

Run: `python -m unittest tests.test_imf_vla_agent -v`
Expected: FAIL, reporting that `roboimi.vla.agent_imf` does not exist

- [ ] **Step 4: Commit the test skeleton**

```bash
git add tests/test_imf_vla_agent.py
git commit -m "test: add IMF VLA agent behavior coverage"
```
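One way to build the recording stub from Step 2 — a module that stores its most recent forward kwargs and returns a shape-preserving dummy prediction. The class name and the `recorded` attribute are illustrative choices, not existing repo names:

```python
import torch
import torch.nn as nn

class RecordingHead(nn.Module):
    """Stub head that records forward kwargs so tests can assert on r/t/cond."""
    def __init__(self):
        super().__init__()
        self.recorded = {}

    def forward(self, sample, r, t, cond=None):
        self.recorded = {'sample': sample, 'r': r, 't': t, 'cond': cond}
        return torch.zeros_like(sample)  # shape-preserving dummy output
```

In the test, the stub is swapped in for the real head, the agent is run once, and the assertions from Step 2 read back `head.recorded`.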
### Task 4: Implement the IMF agent and the Hydra wiring

**Files:**
- Create: `roboimi/vla/agent_imf.py`
- Create: `roboimi/vla/conf/head/imf_transformer1d.yaml`
- Create: `roboimi/vla/conf/agent/resnet_imf_attnres.yaml`
- Modify: `roboimi/demos/vla_scripts/train_vla.py`
- Modify: `tests/test_train_vla_transformer_optimizer.py`
- Modify: `tests/test_imf_vla_agent.py`

- [ ] **Step 1: Implement `IMFVLAAgent` on top of `VLAAgent`**

Implementation strategy:
- Reuse `VLAAgent.__init__`, `_build_cond()`, `reset()`, `_populate_queues()`, `_prepare_observation_batch()`, `select_action()`, `get_normalization_stats()`
- Override:
  - `compute_loss()` -> IMF objective
  - `predict_action()` -> one-step sample
- Provide internal helpers:
  - `_broadcast_batch_time`
  - `_apply_conditioning` (if needed)
  - `_compute_u_and_du_dt`
  - `_compound_velocity`
  - `_sample_one_step`

- [ ] **Step 2: Add a CUDA math-SDPA fallback on the JVP path, matching the external repo's stability strategy**

- [ ] **Step 3: Add the Hydra configs so `agent=resnet_imf_attnres` is instantiable**

Key defaults:
- `_target_: roboimi.vla.agent_imf.IMFVLAAgent`
- `head._target_: roboimi.vla.models.heads.imf_transformer1d.IMFTransformer1D`
- `head.backbone_type: attnres_full`
- `head.causal_attn: false`
- `head.time_as_cond: true`
- `head.n_cond_layers: 0`
- `inference_steps: 1`
- `camera_names: ${data.camera_names}`
- `vision_backbone.camera_names: ${agent.camera_names}`

- [ ] **Step 4: Make the training script reuse parameter grouping for any head that provides `get_optim_groups()`, instead of hard-coding the old transformer `head_type`**

Recommended minimal change:

```python
use_head_groups = callable(getattr(noise_pred_net, 'get_optim_groups', None))
```
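The duck-typing check above extends naturally into the full group-selection logic. A hedged sketch follows; the real script's variable names (e.g. `noise_pred_net`) and the keyword signature of `get_optim_groups` may differ and should be taken from the actual code:

```python
import torch

def build_param_groups(head: torch.nn.Module, weight_decay: float):
    """Prefer the head's own AdamW grouping when it exposes
    get_optim_groups(); otherwise fall back to one default group."""
    get_groups = getattr(head, 'get_optim_groups', None)
    if callable(get_groups):
        return get_groups(weight_decay=weight_decay)
    return [{'params': list(head.parameters()), 'weight_decay': weight_decay}]
```

This keeps the old heads working unchanged while routing any `get_optim_groups()`-capable head (including the IMF head) through its own decay/no-decay split.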
- [ ] **Step 5: Run the tests and fix wiring issues**

Run:
- `python -m unittest tests.test_imf_vla_agent -v`
- `python -m unittest tests.test_train_vla_transformer_optimizer -v`

Expected: PASS

- [ ] **Step 6: Commit the agent / config / train-script wiring**

```bash
git add roboimi/vla/agent_imf.py \
        roboimi/vla/conf/head/imf_transformer1d.yaml \
        roboimi/vla/conf/agent/resnet_imf_attnres.yaml \
        roboimi/demos/vla_scripts/train_vla.py \
        tests/test_imf_vla_agent.py \
        tests/test_train_vla_transformer_optimizer.py
git commit -m "feat: add IMF VLA agent and training wiring"
```
### Task 5: Integration verification and training launch

**Files:**
- Modify: none required unless verification exposes a real problem
- Use run artifacts under: `runs/`

- [ ] **Step 1: Run the focused test set**

Run:

```bash
python -m unittest \
    tests.test_imf_transformer1d_external_alignment \
    tests.test_imf_vla_agent \
    tests.test_resnet_transformer_agent_wiring \
    tests.test_train_vla_transformer_optimizer -v
```

Expected: PASS

- [ ] **Step 2: Run a minimal GPU training smoke test (no long run needed)**

Run:

```bash
/home/droid/.conda/envs/roboimi/bin/python roboimi/demos/vla_scripts/train_vla.py \
    agent=resnet_imf_attnres \
    data.dataset_dir=/home/droid/project/diana_sim/sim_transfer \
    data.camera_names=[r_vis,top,front] \
    train.device=cuda train.max_steps=2 train.batch_size=4 train.num_workers=2 \
    train.use_swanlab=false train.rollout_val_freq_epochs=0
```

Expected: completes 2 steps and produces a checkpoint / log, with no shape or JVP errors

- [ ] **Step 3: Launch the real IMF training run**

Run:

```bash
/home/droid/.conda/envs/roboimi/bin/python roboimi/demos/vla_scripts/train_vla.py \
    agent=resnet_imf_attnres \
    data.dataset_dir=/home/droid/project/diana_sim/sim_transfer \
    data.camera_names=[r_vis,top,front] \
    train.device=cuda train.val_split=0.0 train.seed=42 \
    train.batch_size=80 train.lr=5e-4 train.num_workers=12 train.max_steps=150000 \
    train.log_freq=100 train.save_freq=10000 train.use_swanlab=true \
    train.swanlab_project=roboimi-vla \
    train.rollout_val_freq_epochs=5 train.rollout_validate_on_checkpoint=false \
    train.rollout_num_episodes=5 train.warmup_steps=2000 \
    train.scheduler_type=cosine train.min_lr=1e-6 train.weight_decay=1e-5 train.grad_clip=1.0 \
    agent.pred_horizon=16 agent.inference_steps=1 \
    agent.head.n_emb=384 agent.head.n_layer=18 agent.head.n_head=1 agent.head.n_kv_head=1 \
    agent.vision_backbone.pretrained_backbone_weights=null \
    agent.vision_backbone.freeze_backbone=false \
    agent.vision_backbone.use_separate_rgb_encoder_per_camera=true
```

Expected: training launches successfully, SwanLab records the full config, and a headless rollout runs every 5 epochs

- [ ] **Step 4: Record the run path, training PID, and SwanLab run name, then report back to the user**

- [ ] **Step 5: Commit any final cleanup (if the smoke test required extra patches)**

```bash
git add <changed files>
git commit -m "chore: verify IMF AttnRes training launch"
```
---

`docs/superpowers/specs/2026-04-01-imf-attnres-policy-design.md` (new file, +272 lines)
# IMF-AttnRes Policy Migration Design

**Date:** 2026-04-01
**Status:** Approved in chat, written spec pending review

## Goal

Migrate the IMF-AttnRes diffusion policy from commit `185ed659` of `/home/droid/project/diffusion_policy` into the current `roboimi` repository, as an alternative training option to the existing DiT / Transformer diffusion policies. Also migrate its training objective and one-step inference mechanism, while keeping RoboIMI's existing simulation environment, three-camera visual inputs, dataset format, training scripts, and rollout-validation workflow usable.

## Non-Goals

- Do not migrate obs encoders, datasets, env wrappers, or PushT-specific logic from the external repo that are irrelevant to the current task.
- Do not mirror the external repo's full directory structure; migrate only the model, loss, and inference semantics that current RoboIMI training needs.
- Do not keep the old DiT as the default training target in this work; the old configs remain usable, but the new model gets its own config entry point.

## User-Confirmed Requirements

1. The migration target is the **IMF-AttnRes model code** in `185ed659`.
2. Migrate more than the bare skeleton; also migrate:
   - the **training objective**
   - the **one-step inference mechanism**
3. Visual input matches the current RoboIMI diffusion policy:
   - use three camera images as conditioning input
   - image observations must act as conditions, not be concatenated into the prediction target
4. In this task, the IMF policy replaces the existing DiT/Transformer diffusion-policy training.
5. Training hyperparameters roughly follow the most recent run (to be explicitly overridden by the training command later), but inference switches to IMF's one-step mechanism.
6. The user accepts IMF's constraint of full (non-causal) attention.
## External Source of Truth

Migration semantics follow these files in the external repo:

- `diffusion_policy/model/diffusion/attnres_transformer_components.py`
- `diffusion_policy/model/diffusion/imf_transformer_for_diffusion.py`
- `diffusion_policy/policy/imf_transformer_hybrid_image_policy.py`
- Reference config: `image_pusht_diffusion_policy_dit_imf_attnres_full.yaml`

The most important difference: this policy is not multi-step DDPM/DDIM denoising; it uses the IMF training objective plus one-step inference.

## Current RoboIMI Baseline

The parts of current RoboIMI directly relevant to this task:

- Visual encoding: `ResNetDiffusionBackbone`
  - three cameras: `r_vis`, `top`, `front`
  - at each timestep, the camera features are concatenated with `qpos` into a per-step condition
- Policy body: `VLAAgent`
  - `compute_loss()` uses the DDPM noise-prediction loss
  - `predict_action()` uses multi-step DDIM sampling
  - online control triggers chunk-wise prediction through the action queue in `select_action()`
- Training script: `roboimi/demos/vla_scripts/train_vla.py`
  - supports GPU training, SwanLab logging, and headless rollout validation

So the core of this migration is not swapping the visual backbone but replacing the **head + loss + inference semantics**.
## Recommended Integration Approach

Use a **minimally invasive integration**:

1. **Keep RoboIMI's current visual encoding, data loading, rollout/eval, and training-script skeleton.**
2. **Add a dedicated IMF head module**, implemented locally in RoboIMI:
   - the AttnRes components
   - the IMF transformer body
3. **Add a dedicated IMF agent** that reuses the current `VLAAgent`'s:
   - normalization logic
   - camera-order management
   - observation cache / action-chunk cache
   - rollout interface
   but overrides:
   - `compute_loss()`
   - `predict_action()`
4. **Add a standalone Hydra config** so the IMF policy becomes a new agent option without breaking the existing resnet_transformer / gr00t_dit configs.

Rationale:

- migrating the IMF semantics does not have to disturb the current DDPM agent;
- the rollout / eval / checkpoint logic stays reusable;
- it makes direct A/B training comparisons with the existing Transformer / DiT straightforward.
## Architecture

### 1. Observation / Conditioning Path

Keep RoboIMI's current visual path:

- Input observations: `images={r_vis, top, front}` + `qpos`
- `ResNetDiffusionBackbone` encodes each camera into a per-camera feature
- `state_encoder` encodes `qpos`
- The three camera features and the state feature are concatenated per timestep into `per_step_cond`

The external repo's obs_encoder implementation is not migrated; we only align on the semantic that **images enter the transformer as condition tokens**.
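In tensor terms, the concatenation above looks like the following shape sketch (the feature dimensions here are placeholders, not RoboIMI's actual sizes):

```python
import torch

B, obs_horizon = 2, 2
cam_dim, state_dim = 512, 64  # placeholder feature sizes

# one (B, obs_horizon, cam_dim) feature per camera, plus an encoded state
cam_feats = {name: torch.randn(B, obs_horizon, cam_dim)
             for name in ('r_vis', 'top', 'front')}
state_feat = torch.randn(B, obs_horizon, state_dim)

# a fixed camera order matters: inference must match training-time ordering
per_step_cond = torch.cat(
    [cam_feats['r_vis'], cam_feats['top'], cam_feats['front'], state_feat],
    dim=-1)  # (B, obs_horizon, 3 * cam_dim + state_dim)
```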
### 2. Condition Tokenization

Align with the external IMF transformer's token usage:

- action-trajectory tokens: `(B, pred_horizon, action_dim)` mapped to `n_emb` by a linear layer
- time tokens: the two scalars `r` and `t`, each turned into a token via sinusoidal embedding + linear projection
- observation tokens: `per_step_cond` mapped to `n_emb` by a linear layer
- the final token sequence is:
  - `[r_token, t_token, obs_cond_tokens..., action_tokens...]`

In the current task, the number of obs tokens equals `obs_horizon`, and image observations are always conditioning input.
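The token assembly above can be sketched as follows. This is a shape-level illustration only: the external head may use separate projections for `r` and `t` and different embedding details, and the alignment tests in the plan are what pins down the exact semantics:

```python
import math
import torch
import torch.nn as nn

def sinusoidal_embedding(x: torch.Tensor, dim: int) -> torch.Tensor:
    """Standard sinusoidal embedding of a (B,) scalar into (B, dim)."""
    half = dim // 2
    freqs = torch.exp(-math.log(10000.0) * torch.arange(half) / half)
    angles = x[:, None] * freqs[None, :]
    return torch.cat([angles.sin(), angles.cos()], dim=-1)

B, pred_horizon, obs_horizon = 2, 16, 2
action_dim, cond_dim, n_emb = 7, 100, 32  # placeholder sizes

action_proj = nn.Linear(action_dim, n_emb)
cond_proj = nn.Linear(cond_dim, n_emb)
time_proj = nn.Linear(n_emb, n_emb)

sample = torch.randn(B, pred_horizon, action_dim)
per_step_cond = torch.randn(B, obs_horizon, cond_dim)
r, t = torch.zeros(B), torch.ones(B)

r_tok = time_proj(sinusoidal_embedding(r, n_emb))[:, None]  # (B, 1, n_emb)
t_tok = time_proj(sinusoidal_embedding(t, n_emb))[:, None]
tokens = torch.cat([r_tok, t_tok, cond_proj(per_step_cond),
                    action_proj(sample)], dim=1)
# tokens: (B, 2 + obs_horizon + pred_horizon, n_emb)
```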
### 3. IMF-AttnRes Backbone

Add a local AttnRes backbone implementation in RoboIMI, keeping the external commit's key semantics:

- `RMSNorm` / `RMSNormNoWeight`
- RoPE
- grouped-query self-attention
- SwiGLU FFN
- AttnRes operator / residual-source aggregation
- `AttnResTransformerBackbone`

And preserve:

- **full attention** (no causal masking)
- `backbone_type='attnres_full'`
- the output slices back only the action-token segment, which then passes through a final norm + head to produce a velocity-like output
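For the RoPE item above, a hedged sketch of the usual `precompute_rope_freqs` / `apply_rope` pair; the external versions may differ in pairing layout and dtype handling, and those details must come from the source commit:

```python
import torch

def precompute_rope_freqs(seq_len: int, head_dim: int, base: float = 10000.0):
    """Complex rotation factors of shape (seq_len, head_dim // 2)."""
    inv_freq = 1.0 / base ** (torch.arange(0, head_dim, 2).float() / head_dim)
    angles = torch.outer(torch.arange(seq_len).float(), inv_freq)
    return torch.polar(torch.ones_like(angles), angles)  # e^{i * angle}

def apply_rope(x: torch.Tensor, freqs: torch.Tensor) -> torch.Tensor:
    """Rotate (B, n_head, seq, head_dim) queries/keys by position."""
    x_c = torch.view_as_complex(x.float().reshape(*x.shape[:-1], -1, 2))
    x_rot = x_c * freqs[None, None, :x.shape[2]]
    return torch.view_as_real(x_rot).flatten(-2).type_as(x)
```

Position 0 is the identity rotation, which gives a cheap sanity check for alignment tests.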
### 4. Training Objective

The training objective changes from the current DDPM epsilon prediction to the external IMF objective.

Given a ground-truth trajectory `x` and random noise `e`:

1. Sample `t ~ U(0,1)`, `r ~ U(0,1)`, then order them so `t >= r`
2. Build the interpolated state:
   - `z_t = (1 - t) x + t e`
3. Evaluate the model:
   - `v = f(z_t, t, t, cond)`
4. Take the JVP of `g(z, r, t) = f(z, r, t, cond)` to obtain:
   - `u, du_dt`
5. Build the compound velocity:
   - `V = u + (t - r) * du_dt`
6. The target is:
   - `target = e - x`
7. The final loss is the MSE over the action dimensions

RoboIMI's existing `action_is_pad` batch field must stay supported; when padding is present, the loss is computed only over valid actions.
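The recipe above can be sketched with `torch.func.jvp`. This is a shape-level illustration that follows the listed steps, assuming the tangent is taken along `t` only; the external source's exact tangent construction and gradient-flow choices (e.g. whether `du_dt` is detached) are authoritative, as the plan's alignment tests enforce:

```python
import torch
from torch.func import jvp

def imf_loss(model, x, cond):
    """Sketch of the IMF objective: compound velocity vs. (e - x)."""
    B = x.shape[0]
    e = torch.randn_like(x)
    ts = torch.rand(B, 2).sort(dim=1).values
    r, t = ts[:, 0], ts[:, 1]                      # ensures t >= r
    t_b = t.view(B, 1, 1)
    z_t = (1 - t_b) * x + t_b * e                  # interpolated state

    # forward-mode JVP of g(r, t) = model(z_t, r, t, cond) along t
    def g(r_in, t_in):
        return model(z_t, r_in, t_in, cond)
    u, du_dt = jvp(g, (r, t), (torch.zeros_like(r), torch.ones_like(t)))

    v_compound = u + (t - r).view(B, 1, 1) * du_dt
    target = e - x
    return torch.mean((v_compound - target) ** 2)
```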
### 5. One-Step Inference

Inference switches to the external IMF one-step sampling semantics:

1. Initialize the action trajectory `z_t` from a standard Gaussian
2. Compute `u = f(z_t, r=0, t=1, cond)`
3. One-step update:
   - `x_hat = z_t - (t - r) * u = z_t - u`
4. Unnormalize to get the action sequence

This implies:

- `num_inference_steps` is fixed to `1` for the IMF policy
- the DDIM scheduler's multi-step `step()` is no longer called
- online control keeps the current chunk mechanism:
  - when the action queue is empty, trigger one `predict_action_chunk()`
  - enqueue the slice `[obs_horizon-1 : obs_horizon-1+num_action_steps]` of the predicted sequence

In other words, **the rule for when a model forward is triggered stays the same; what changes is how each trigger generates the action sequence**.
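The one-step update above is small enough to sketch in full (a minimal sketch, assuming the `model(z, r, t, cond)` call signature the plan specifies for the head):

```python
import torch

@torch.no_grad()
def sample_one_step(model, B, pred_horizon, action_dim, cond):
    """One-step IMF sampling: x_hat = z - (t - r) * u with r=0, t=1."""
    z = torch.randn(B, pred_horizon, action_dim)   # Gaussian init
    r = torch.zeros(B)
    t = torch.ones(B)
    u = model(z, r, t, cond)                       # average velocity over [r, t]
    return z - (t - r).view(B, 1, 1) * u           # reduces to z - u here
```

The unnormalization in step 4 happens after this, in the agent's existing normalization layer.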
## API / Code Structure

The main planned code boundaries:

- `roboimi/vla/models/heads/attnres_transformer_components.py`
  - IMF AttnRes base components
- `roboimi/vla/models/heads/imf_transformer1d.py`
  - the RoboIMI version of the IMF transformer head
  - exposes `forward(sample, r, t, cond=None)`
  - exposes `get_optim_groups()` for AdamW grouping
- `roboimi/vla/agent_imf.py`
  - reuses `VLAAgent`'s observation handling / normalization / queue infrastructure
  - overrides the IMF training loss and one-step prediction logic
- Hydra config
  - `roboimi/vla/conf/head/imf_transformer1d.yaml`
  - `roboimi/vla/conf/agent/resnet_imf_attnres.yaml`

The training-script main flow changes as little as possible; it only needs to instantiate the new agent and keep using the current rollout / checkpoint / SwanLab logic.
## Initial Config Defaults To Preserve

To avoid semantic drift during migration, the first IMF config pins these defaults:

- `backbone_type: attnres_full`
- `n_head: 1`
- `n_kv_head: 1`
- `n_cond_layers: 0`
- `time_as_cond: true`
- `causal_attn: false`
- `num_inference_steps: 1`

These defaults match how external `185ed659` uses IMF-AttnRes; later tuning may override them, but the first migration must run with exactly these semantics.

## Compatibility Decisions

### Reuse From RoboIMI

Keep:

- the three-camera data-loading scheme
- the ResNet visual backbone
- qpos / action normalization
- the training loop, optimizer, scheduler, SwanLab, and headless rollout
- the online chunked execution in `select_action()`
### Replace With External IMF Semantics

Replace:

- the transformer head implementation
- the diffusion training objective
- the inference sampling semantics

### Intentionally Not Mirrored 1:1

Parts deliberately not kept identical to the external repo:

- the external repo's policy base-class hierarchy
- the external repo's obs-encoder module tree
- the external repo's normalizer / mask-generator framework

Reason: RoboIMI already has stable data interfaces and a stable rollout flow, so grafting the new semantics into them is the more robust path.
## Testing / Verification Strategy

After the migration, verify at least the following:

1. **Unit / smoke verification**
   - the IMF head forward produces correct shapes
   - the IMF agent's `compute_loss()` runs forward and backward on a real batch
   - the IMF agent's `predict_action()` outputs `(B, pred_horizon, action_dim)`
2. **Training-pipeline verification**
   - run a short GPU training job and confirm:
     - the dataloader works
     - the optimizer / lr scheduler work
     - SwanLab records the config and training metrics
3. **Rollout verification**
   - the periodic headless rollout during training completes
   - the environment still receives actions through the EE-style `step()`
4. **Final delivery**
   - launch the real training run with the user-specified hyperparameters
## Risks and Mitigations

### Risk 1: JVP is unstable on CUDA attention kernels

Mitigation: follow the external repo's strategy of switching to the math SDP kernel on the JVP path, falling back to `torch.autograd.functional.jvp` if necessary. The JVP tangent construction and the `u, du_dt` computation must strictly match the external source; this migration does not rewrite their mathematical semantics.
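One way to express the math-kernel switch — a hedged sketch, assuming a PyTorch build that exposes `torch.nn.attention.sdpa_kernel` (roughly 2.3+); older builds fall through to the default backend here, whereas the real implementation should mirror whatever mechanism the external repo uses:

```python
import torch
import torch.nn.functional as F

def sdpa_math_safe(q, k, v):
    """Scaled dot-product attention, forcing the math kernel when the
    backend-selection API is available (safer under forward-mode JVP)."""
    try:
        from torch.nn.attention import sdpa_kernel, SDPBackend
        with sdpa_kernel(SDPBackend.MATH):
            return F.scaled_dot_product_attention(q, k, v)
    except ImportError:
        # older PyTorch: no backend-selection context manager available
        return F.scaled_dot_product_attention(q, k, v)
```

The math backend is a plain composition of matmul and softmax, so forward-mode differentiation through it is well defined, unlike some fused CUDA kernels.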
### Risk 2: Optimizer parameter grouping misses the new modules

Mitigation: the IMF head provides `get_optim_groups()`, and the training script uses it for any head that exposes that interface, instead of binding to the old `head_type`.

### Risk 3: The existing rollout logic assumes multi-step DDIM sampling

Mitigation: keep the `select_action()` / `predict_action_chunk()` interfaces unchanged and replace only the internals of `predict_action()`, so the eval code never needs to understand IMF details.

### Risk 4: Training-command arguments drift from the new config

Mitigation: add a standalone agent config and keep the previous training arguments as an explicit CLI-override template.
## Success Criteria

The migration is successful when all of the following hold:

1. RoboIMI gains an IMF-AttnRes policy that can be enabled on its own via Hydra config.
2. Training uses the external IMF loss, not the current DDPM epsilon loss.
3. Inference uses one-step IMF sampling, not multi-step DDIM sampling.
4. The three camera images always enter the model forward as conditioning input.
5. Online rollout completes in the headless simulation environment.
6. Training launches successfully with the most recent experiment's parameter template.