merge: imf attnres policy

Conflicts:
- roboimi/demos/vla_scripts/eval_vla.py
- roboimi/envs/double_base.py
# IMF-AttnRes Policy Migration Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Migrate the IMF-AttnRes model, training objective, and one-step inference mechanism from the external `diffusion_policy@185ed659` into RoboIMI, and launch training with matching hyperparameters while preserving the three-camera visual conditioning and the existing training/rollout workflows.

**Architecture:** Keep RoboIMI's existing ResNet three-camera observation encoding, normalization, queue-based online rollout, and training scripts. Add the AttnRes components and the IMF transformer head, plus a dedicated IMF agent that overrides the DDPM-loss / DDIM-inference semantics. The training script receives only minimal wiring changes so the new head/agent can use the existing optimizer, checkpointing, SwanLab, and headless rollout.

**Tech Stack:** PyTorch, Hydra, diffusers schedulers (kept only for compatible initialization), MuJoCo rollout, unittest, SwanLab

---
## File Map

### New files

- `roboimi/vla/models/heads/attnres_transformer_components.py` — local IMF AttnRes building blocks
- `roboimi/vla/models/heads/imf_transformer1d.py` — IMF transformer head exposing `forward(sample, r, t, cond=None)`
- `roboimi/vla/agent_imf.py` — dedicated IMF VLA agent that reuses the existing observation/queue/normalization logic and overrides loss / inference
- `roboimi/vla/conf/head/imf_transformer1d.yaml` — IMF head config
- `roboimi/vla/conf/agent/resnet_imf_attnres.yaml` — IMF agent + backbone/head composition config
- `tests/test_imf_transformer1d_external_alignment.py` — alignment tests against external `185ed659`
- `tests/test_imf_vla_agent.py` — loss / inference / queue semantics tests for the IMF agent

### Modified files

- `roboimi/demos/vla_scripts/train_vla.py` — optimizer parameter-group wiring so the new agent trains without further changes
- `roboimi/vla/conf/config.yaml` — defaults stay unchanged; the IMF agent is enabled only via override
- `tests/test_train_vla_transformer_optimizer.py` — cover optimizer-group behavior for the IMF head
- (if needed) `roboimi/vla/models/heads/__init__.py` or a nearby export file — export the new head

---
### Task 1: Write the IMF transformer alignment test

**Files:**
- Create: `tests/test_imf_transformer1d_external_alignment.py`
- Reference: `/home/droid/project/diffusion_policy/diffusion_policy/model/diffusion/attnres_transformer_components.py`
- Reference: `/home/droid/project/diffusion_policy/diffusion_policy/model/diffusion/imf_transformer_for_diffusion.py`

- [ ] **Step 1: Write a failing test verifying the local IMF head matches external `185ed659` on state-dict keys, forward shapes, forward values, and optim groups**

```python
with torch.no_grad():
    external_out = external_model(sample=sample, r=r, t=t, cond=cond)
    local_out = local_model(sample=sample, r=r, t=t, cond=cond)
assert torch.allclose(local_out, external_out, atol=1e-6, rtol=1e-5)
```

- [ ] **Step 2: Run the unit test and confirm it currently fails**

Run: `python -m unittest tests.test_imf_transformer1d_external_alignment -v`
Expected: FAIL, reporting that the `imf_transformer1d` / `attnres` modules do not exist

- [ ] **Step 3: If the test needs to reuse the existing external-loader logic, copy the minimal necessary helpers from `tests/test_transformer1d_external_alignment.py` to avoid depending on session context**

- [ ] **Step 4: Commit the test skeleton**

```bash
git add tests/test_imf_transformer1d_external_alignment.py
git commit -m "test: add IMF transformer external alignment coverage"
```
### Task 2: Implement the AttnRes components and the IMF transformer head

**Files:**
- Create: `roboimi/vla/models/heads/attnres_transformer_components.py`
- Create: `roboimi/vla/models/heads/imf_transformer1d.py`
- Modify: `tests/test_imf_transformer1d_external_alignment.py`

- [ ] **Step 1: Port the AttnRes building blocks from external `185ed659`, keeping names and parameter semantics identical**

Must include:
- `RMSNorm`
- `RMSNormNoWeight`
- `precompute_rope_freqs`
- `apply_rope`
- `GroupedQuerySelfAttention`
- `SwiGLUFFN`
- `AttnResOperator`
- `AttnResSubLayer`
- `AttnResTransformerBackbone`
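As a reference for the expected semantics of the first few components (not the external implementation itself, whose exact epsilon placement and frequency layout must be copied from `185ed659`), `RMSNorm` and the RoPE helpers can be sketched as:

```python
import torch


class RMSNorm(torch.nn.Module):
    """Root-mean-square norm with a learned per-channel scale (sketch)."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = torch.nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return x * rms * self.weight


def precompute_rope_freqs(head_dim: int, max_len: int, theta: float = 10000.0):
    # Complex rotation factors, one per (position, frequency) pair.
    freqs = 1.0 / (theta ** (torch.arange(0, head_dim, 2).float() / head_dim))
    angles = torch.outer(torch.arange(max_len).float(), freqs)
    return torch.polar(torch.ones_like(angles), angles)  # (max_len, head_dim // 2)


def apply_rope(x: torch.Tensor, freqs: torch.Tensor) -> torch.Tensor:
    # x: (B, T, n_head, head_dim) -> rotate adjacent channel pairs by position.
    x_c = torch.view_as_complex(x.float().reshape(*x.shape[:-1], -1, 2))
    out = torch.view_as_real(x_c * freqs[: x.shape[1]].unsqueeze(0).unsqueeze(2))
    return out.flatten(-2).type_as(x)
```

The alignment test in Task 1 is what actually pins these down; the sketch only illustrates the intended tensor shapes.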
- [ ] **Step 2: Implement the local IMF head in `imf_transformer1d.py`**

Must satisfy:
- `forward(sample, r, t, cond=None)`
- Support `backbone_type='attnres_full'` by default
- The token sequence is `[r_token, t_token, cond_tokens..., sample_tokens...]`
- The output slices back only the sample-token segment
- Keep `get_optim_groups()` for AdamW parameter grouping
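The contract above can be illustrated with a hypothetical minimal head (the class name, projection layers, and the plain `TransformerEncoder` stand-in are illustrative only; the real layer stack comes from the AttnRes backbone):

```python
import torch


class IMFTransformer1DSketch(torch.nn.Module):
    """Sketch of the IMF head contract: r/t embeddings, cond tokens, and sample
    tokens share one sequence; only the sample segment is returned."""

    def __init__(self, input_dim: int, n_emb: int, cond_dim: int):
        super().__init__()
        self.sample_proj = torch.nn.Linear(input_dim, n_emb)
        self.cond_proj = torch.nn.Linear(cond_dim, n_emb)
        self.r_emb = torch.nn.Linear(1, n_emb)
        self.t_emb = torch.nn.Linear(1, n_emb)
        self.backbone = torch.nn.TransformerEncoder(
            torch.nn.TransformerEncoderLayer(n_emb, nhead=1, batch_first=True), 2)
        self.head = torch.nn.Linear(n_emb, input_dim)

    def forward(self, sample, r, t, cond=None):
        B = sample.shape[0]
        # Token order: [r_token, t_token, cond_tokens..., sample_tokens...]
        tokens = [self.r_emb(r.view(B, 1, 1)), self.t_emb(t.view(B, 1, 1))]
        if cond is not None:
            tokens.append(self.cond_proj(cond))
        tokens.append(self.sample_proj(sample))
        x = self.backbone(torch.cat(tokens, dim=1))
        return self.head(x[:, -sample.shape[1]:])  # slice back only sample tokens
```

Only the interface and token ordering are meaningful here; numerics are fixed by the Task 1 alignment test.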
- [ ] **Step 3: Run the alignment tests and fix any state-dict key / init / no-decay parameter-group mismatches**

Run: `python -m unittest tests.test_imf_transformer1d_external_alignment -v`
Expected: PASS

- [ ] **Step 4: Commit the model components**

```bash
git add roboimi/vla/models/heads/attnres_transformer_components.py \
  roboimi/vla/models/heads/imf_transformer1d.py \
  tests/test_imf_transformer1d_external_alignment.py
git commit -m "feat: add IMF AttnRes transformer head"
```
### Task 3: Write IMF agent behavior tests

**Files:**
- Create: `tests/test_imf_vla_agent.py`
- Reference: `roboimi/vla/agent.py`
- Reference: `tests/test_resnet_transformer_agent_wiring.py`

- [ ] **Step 1: Write failing tests covering the IMF agent's core contract**

Must cover:
1. `compute_loss()` accepts the current batch structure and returns a scalar loss
2. `predict_action()` outputs `(B, pred_horizon, action_dim)`
3. `select_action()` still follows the queue/chunk semantics
4. `predict_action()` does not run a multi-step DDIM loop; it triggers exactly one IMF sampling step
5. When `action_is_pad` is present, the loss is computed only over valid actions
- [ ] **Step 2: Use a stub backbone / stub head to record call arguments, verifying that `r, t, cond` are passed through and the observation-conditioning dimensions are correct**

```python
self.assertEqual(recorded['cond'].shape, (B, obs_horizon, expected_cond_dim))
self.assertTrue(torch.allclose(recorded['r'], torch.zeros(B)))
self.assertTrue(torch.allclose(recorded['t'], torch.ones(B)))
```

- [ ] **Step 3: Run the tests and confirm they currently fail**

Run: `python -m unittest tests.test_imf_vla_agent -v`
Expected: FAIL, reporting that `roboimi.vla.agent_imf` does not exist

- [ ] **Step 4: Commit the test skeleton**

```bash
git add tests/test_imf_vla_agent.py
git commit -m "test: add IMF VLA agent behavior coverage"
```
### Task 4: Implement the IMF agent and Hydra wiring

**Files:**
- Create: `roboimi/vla/agent_imf.py`
- Create: `roboimi/vla/conf/head/imf_transformer1d.yaml`
- Create: `roboimi/vla/conf/agent/resnet_imf_attnres.yaml`
- Modify: `roboimi/demos/vla_scripts/train_vla.py`
- Modify: `tests/test_train_vla_transformer_optimizer.py`
- Modify: `tests/test_imf_vla_agent.py`

- [ ] **Step 1: Implement `IMFVLAAgent` on top of `VLAAgent`**

Implementation strategy:
- Reuse `VLAAgent.__init__`, `_build_cond()`, `reset()`, `_populate_queues()`, `_prepare_observation_batch()`, `select_action()`, `get_normalization_stats()`
- Override:
  - `compute_loss()` -> IMF objective
  - `predict_action()` -> one-step sample
- Provide internal helpers:
  - `_broadcast_batch_time`
  - `_apply_conditioning` (if needed)
  - `_compute_u_and_du_dt`
  - `_compound_velocity`
  - `_sample_one_step`
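For orientation only, a one-step sampler in the spirit of `_sample_one_step` might look like the following; the `r=0` / `t=1` endpoints, the sign of the update, and the meaning of the head output are assumptions that must be taken from the external `185ed659` objective, not from this sketch:

```python
import torch


@torch.no_grad()
def sample_one_step(head, cond, pred_horizon, action_dim, device):
    """One-step IMF sampling sketch: start from pure noise and apply a single
    Euler step of the learned field across the whole interval.
    Endpoint and sign conventions here are assumptions."""
    B = cond.shape[0]
    noise = torch.randn(B, pred_horizon, action_dim, device=device)
    r = torch.zeros(B, device=device)  # assumed: start of the integration interval
    t = torch.ones(B, device=device)   # assumed: noise end of the interval
    velocity = head(sample=noise, r=r, t=t, cond=cond)
    return noise - velocity            # single step; direction is an assumption
```

The behavior test in Task 3 (one head call per `predict_action()`) is the real contract; this sketch only shows why no scheduler loop is needed.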
- [ ] **Step 2: Add a CUDA math-SDPA fallback on the JVP path, preserving the external repo's stability strategy**
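Fused CUDA SDPA kernels generally do not support forward-mode autodiff, so forcing the math backend around the JVP computation is the usual workaround. One plausible shape, using the `torch.nn.attention.sdpa_kernel` API from recent PyTorch releases (verify availability against the pinned torch version and the external repo's actual strategy):

```python
import torch


def jvp_safe_attention(q, k, v):
    """Run scaled_dot_product_attention; on CUDA, restrict SDPA to the math
    backend, which supports forward-mode autodiff (JVP), unlike fused kernels."""
    if q.is_cuda:
        from torch.nn.attention import SDPBackend, sdpa_kernel
        with sdpa_kernel(SDPBackend.MATH):
            return torch.nn.functional.scaled_dot_product_attention(q, k, v)
    return torch.nn.functional.scaled_dot_product_attention(q, k, v)
```

On CPU the fallback is unnecessary, so the context manager is applied only on CUDA tensors.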
- [ ] **Step 3: Add Hydra configs so that `agent=resnet_imf_attnres` is instantiable**

Key defaults:
- `_target_: roboimi.vla.agent_imf.IMFVLAAgent`
- `head._target_: roboimi.vla.models.heads.imf_transformer1d.IMFTransformer1D`
- `head.backbone_type: attnres_full`
- `head.causal_attn: false`
- `head.time_as_cond: true`
- `head.n_cond_layers: 0`
- `inference_steps: 1`
- `camera_names: ${data.camera_names}`
- `vision_backbone.camera_names: ${agent.camera_names}`
- [ ] **Step 4: Make the training script reuse parameter grouping for any head exposing `get_optim_groups()`, instead of hard-coding the old transformer head_type**

Recommended minimal change:

```python
use_head_groups = callable(getattr(noise_pred_net, 'get_optim_groups', None))
```
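How that capability check could feed optimizer construction (variable names are illustrative; the `get_optim_groups(weight_decay)` signature follows the diffusion_policy convention):

```python
import torch


def build_optimizer(noise_pred_net, other_params, lr, weight_decay):
    """Prefer the head's own AdamW parameter groups when it exposes them,
    otherwise fall back to a single flat group (sketch)."""
    use_head_groups = callable(getattr(noise_pred_net, "get_optim_groups", None))
    if use_head_groups:
        # Head decides decay / no-decay grouping; backbone params join separately.
        groups = noise_pred_net.get_optim_groups(weight_decay=weight_decay)
        groups.append({"params": list(other_params), "weight_decay": weight_decay})
    else:
        groups = [{"params": list(noise_pred_net.parameters()) + list(other_params),
                   "weight_decay": weight_decay}]
    return torch.optim.AdamW(groups, lr=lr)
```

This keeps the old single-group path intact for heads without `get_optim_groups()`.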
- [ ] **Step 5: Run the tests and fix wiring issues**

Run:
- `python -m unittest tests.test_imf_vla_agent -v`
- `python -m unittest tests.test_train_vla_transformer_optimizer -v`

Expected: PASS

- [ ] **Step 6: Commit the agent / config / train-script wiring**

```bash
git add roboimi/vla/agent_imf.py \
  roboimi/vla/conf/head/imf_transformer1d.yaml \
  roboimi/vla/conf/agent/resnet_imf_attnres.yaml \
  roboimi/demos/vla_scripts/train_vla.py \
  tests/test_imf_vla_agent.py \
  tests/test_train_vla_transformer_optimizer.py
git commit -m "feat: add IMF VLA agent and training wiring"
```
### Task 5: Integration verification and training launch

**Files:**
- Modify: none required unless verification exposes real issues
- Use run artifacts under: `runs/`

- [ ] **Step 1: Run the focused test set**

Run:

```bash
python -m unittest \
  tests.test_imf_transformer1d_external_alignment \
  tests.test_imf_vla_agent \
  tests.test_resnet_transformer_agent_wiring \
  tests.test_train_vla_transformer_optimizer -v
```

Expected: PASS

- [ ] **Step 2: Run a minimal GPU training smoke job (no long run needed)**

Run:

```bash
/home/droid/.conda/envs/roboimi/bin/python roboimi/demos/vla_scripts/train_vla.py \
  agent=resnet_imf_attnres \
  data.dataset_dir=/home/droid/project/diana_sim/sim_transfer \
  data.camera_names=[r_vis,top,front] \
  train.device=cuda train.max_steps=2 train.batch_size=4 train.num_workers=2 \
  train.use_swanlab=false train.rollout_val_freq_epochs=0
```

Expected: completes 2 steps successfully, produces checkpoint / log, with no shape or JVP errors
- [ ] **Step 3: Launch IMF training with the production parameters**

Run:

```bash
/home/droid/.conda/envs/roboimi/bin/python roboimi/demos/vla_scripts/train_vla.py \
  agent=resnet_imf_attnres \
  data.dataset_dir=/home/droid/project/diana_sim/sim_transfer \
  data.camera_names=[r_vis,top,front] \
  train.device=cuda train.val_split=0.0 train.seed=42 \
  train.batch_size=80 train.lr=5e-4 train.num_workers=12 train.max_steps=150000 \
  train.log_freq=100 train.save_freq=10000 train.use_swanlab=true \
  train.swanlab_project=roboimi-vla \
  train.rollout_val_freq_epochs=5 train.rollout_validate_on_checkpoint=false \
  train.rollout_num_episodes=5 train.warmup_steps=2000 \
  train.scheduler_type=cosine train.min_lr=1e-6 train.weight_decay=1e-5 train.grad_clip=1.0 \
  agent.pred_horizon=16 agent.inference_steps=1 \
  agent.head.n_emb=384 agent.head.n_layer=18 agent.head.n_head=1 agent.head.n_kv_head=1 \
  agent.vision_backbone.pretrained_backbone_weights=null \
  agent.vision_backbone.freeze_backbone=false \
  agent.vision_backbone.use_separate_rgb_encoder_per_camera=true
```

Expected: training launches successfully, SwanLab records the full config, and a headless rollout runs every 5 epochs

- [ ] **Step 4: Record the run path, training PID, and SwanLab run name, and report them to the user**

- [ ] **Step 5: Commit final cleanup changes (if the smoke run required extra patches)**

```bash
git add <changed files>
git commit -m "chore: verify IMF AttnRes training launch"
```
# IMF Rollout Trajectory Images and Short-Horizon Training Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Add training-time rollout front trajectory image export plus SwanLab image logging, then start a new local IMF training run with `emb=384`, `layer=12`, `pred_horizon=8`, `num_action_steps=4`, `max_steps=50000`.

**Architecture:** Extend `eval_vla.py` so a rollout can emit one per-episode static front-view image with red EE trajectory overlay. Extend `train_vla.py` so rollout validation forces image export, forces video off, and uploads those per-episode images to SwanLab. Launch the requested new run through explicit command-line overrides rather than branch-default config changes.

**Tech Stack:** Python, PyTorch, Hydra/OmegaConf, MuJoCo, OpenCV, SwanLab.

---
### Task 1: Add and validate rollout image tests

**Files:**
- Modify: `tests/test_eval_vla_rollout_artifacts.py`
- Modify: `tests/test_train_vla_swanlab_logging.py`
- Modify: `tests/test_train_vla_rollout_validation.py`

- [ ] Add/adjust eval tests so they assert per-episode trajectory image paths are produced without requiring video export.
- [ ] Add/adjust training tests so they assert training-time rollout validation forces `record_video=false`.
- [ ] Add/adjust training tests so they assert trajectory image paths flow from eval summary into SwanLab media logging.
- [ ] Add/adjust training tests so they assert image media is logged, not only scalar reward metrics.
### Task 2: Implement per-episode front trajectory image export in eval

**Files:**
- Modify: `roboimi/demos/vla_scripts/eval_vla.py`
- Reuse/Read: `roboimi/utils/raw_action_trajectory_viewer.py`
- Modify: `roboimi/vla/conf/eval/eval.yaml`

- [ ] Add config plumbing for `save_trajectory_image` and `trajectory_image_camera_name`.
- [ ] Ensure the default training-time camera resolution path is pinned to `front`.
- [ ] Implement distinct per-episode image naming so 5 rollout episodes create 5 distinct PNGs.
- [ ] Reuse the existing red trajectory representation logic when composing the PNG.
- [ ] Ensure headless eval works under EGL even on machines with `DISPLAY` set.
### Task 3: Implement SwanLab rollout image logging in training

**Files:**
- Modify: `roboimi/demos/vla_scripts/train_vla.py`
- Modify: `tests/test_train_vla_swanlab_logging.py`
- Modify: `tests/test_train_vla_rollout_validation.py`

- [ ] Make `run_rollout_validation()` force `record_video=false`.
- [ ] Make `run_rollout_validation()` force `save_trajectory_image=true` and `trajectory_image_camera_name=front`.
- [ ] Ensure rollout validation still uses 5 episodes per validation event for the requested run.
- [ ] Add a best-effort helper that converts per-episode image paths into SwanLab image media payloads.
- [ ] Keep image-upload failures non-fatal and warning-only.
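The best-effort helper might look like this; `swanlab.Image` accepting a file path mirrors the wandb-style API and is an assumption to verify against the installed SwanLab version:

```python
import warnings


def log_rollout_images(swanlab_run, image_paths, step):
    """Convert per-episode trajectory PNGs into SwanLab media payloads.
    Any failure (missing package, bad path, offline run) only warns."""
    try:
        import swanlab  # imported lazily so training never hard-depends on it
        media = [swanlab.Image(p, caption=f"episode_{i}")
                 for i, p in enumerate(image_paths)]
        swanlab_run.log({"rollout/trajectory_images": media}, step=step)
    except Exception as exc:  # non-fatal by design
        warnings.warn(f"rollout image upload failed: {exc}")
```

Wrapping the whole conversion-plus-log in one `try` keeps the upload strictly best-effort, per the last bullet above.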
### Task 4: Verify action-chunk semantics for the new run

**Files:**
- Verify: `roboimi/vla/agent.py`
- Verify: `roboimi/vla/agent_imf.py`
- Test: `tests/test_imf_vla_agent.py`

- [ ] Confirm the existing queue logic still means “predict 8, execute first 4”.
- [ ] Do not change branch defaults unless strictly necessary; prefer launch-time overrides.
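The “predict 8, execute first 4” behavior is the usual action-chunk queue, which can be sketched as follows (illustrative of the `select_action()` semantics being verified, not the actual implementation):

```python
from collections import deque

import torch


class ActionChunkQueue:
    """Refill from predict_fn only when empty, keeping the first
    num_action_steps actions of each predicted pred_horizon chunk (sketch)."""

    def __init__(self, predict_fn, num_action_steps: int):
        self.predict_fn = predict_fn
        self.num_action_steps = num_action_steps
        self.queue = deque()

    def select_action(self, obs):
        if not self.queue:
            chunk = self.predict_fn(obs)  # (B, pred_horizon, action_dim)
            for i in range(self.num_action_steps):  # enqueue only the first K steps
                self.queue.append(chunk[:, i])
        return self.queue.popleft()
```

With `pred_horizon=8` and `num_action_steps=4`, the model is queried once every 4 control steps.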
### Task 5: Verify and launch the requested local training run

**Files:**
- Use: `roboimi/demos/vla_scripts/train_vla.py`
- Use: `roboimi/demos/vla_scripts/eval_vla.py`

- [ ] Run the targeted verification suite.
- [ ] Run one real headless smoke eval and confirm a front trajectory PNG is produced while `video_mp4` stays null.
- [ ] Launch the new local training run with explicit overrides including:
  - `agent=resnet_imf_attnres`
  - `agent.head.n_emb=384`
  - `agent.head.n_layer=12`
  - `agent.pred_horizon=8`
  - `agent.num_action_steps=4`
  - `train.max_steps=50000`
  - `train.rollout_num_episodes=5`
  - `train.use_swanlab=true`
  - current local baseline dataset/camera/CUDA/batch/lr/num_workers/backbone settings
- [ ] Verify PID, GPU allocation, log tail, and SwanLab run URL.
# IMF Horizon Grid and AttnRes Ablation Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Run a 6-run Phase-1 IMF horizon/action-step experiment grid across available GPUs, monitor progress and collect best rollout metrics, then use the best horizon setting for a Phase-2 visual-attnres ablation.

**Architecture:** Use the current IMF training code as-is for Phase-1 by sweeping explicit `(pred_horizon, num_action_steps)` overrides while keeping emb=384, layer=12, and max_steps=50k fixed. Maintain a local experiment suite directory with a manifest and machine-readable status snapshots so progress can be resumed and summarized across turns. After Phase-1 completes, compare the current head-only attnres setup against a variant that also adds attnres into the visual ResNet path.

**Tech Stack:** Python, Hydra/OmegaConf, PyTorch, SSH/Tailscale, JSON/CSV status files, SwanLab.

---
### Task 1: Prepare the experiment suite manifest and state tracking

**Files:**
- Create: `experiment_suites/2026-04-04-imf-horizon-grid/manifest.json`
- Create: `experiment_suites/2026-04-04-imf-horizon-grid/status.json`
- Create: `experiment_suites/2026-04-04-imf-horizon-grid/notes.md`

- [ ] Define the 6 legal Phase-1 combinations: `(8,8)`, `(16,8)`, `(16,16)`, `(32,8)`, `(32,16)`, `(32,32)`.
- [ ] Record for each run: name, host, GPU slot, command, log path, SwanLab run name, and completion criteria.
- [ ] Define the comparison metric as the maximum rollout average reward seen during training (`max avg_reward`), preferably read from the best-checkpoint metadata and cross-checked against logs.
- [ ] Keep `status.json` updated with per-run state: queued / running / finished / failed plus latest parsed progress.
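A minimal shape for the resumable `status.json` update (field names are suggestions consistent with the bullets above):

```python
import json
from pathlib import Path


def update_run_status(status_path, run_name, **fields):
    """Merge per-run progress into status.json so monitoring is resumable
    across turns without rediscovering run state."""
    path = Path(status_path)
    status = json.loads(path.read_text()) if path.exists() else {"runs": {}}
    status["runs"].setdefault(run_name, {"state": "queued"}).update(fields)
    path.write_text(json.dumps(status, indent=2))
    return status


# e.g. update_run_status("status.json", "ph16_as8", state="running", latest_step=1200)
```

Merging rather than overwriting means partial updates (say, only `latest_step`) never lose previously recorded fields like the SwanLab URL.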
### Task 2: Prepare the remote 8-GPU execution target

**Files:**
- Remote working directory under `/home/droid/`
- Reuse or create a synced code directory for this suite

- [ ] Verify the remote dataset path and environment path.
- [ ] Verify GPU availability and reserve 6 GPUs for Phase-1 launches.
- [ ] Sync the required code to a dedicated remote suite directory.
- [ ] Record exact remote paths back into the local suite manifest.
### Task 3: Launch the 6 Phase-1 experiments in parallel

**Files:**
- Reuse: `roboimi/demos/vla_scripts/train_vla.py`
- Modify only local suite tracking files unless a launch bug is discovered

- [ ] Launch 6 runs concurrently with fixed settings: IMF, emb=384, layer=12, max_steps=50k.
- [ ] Keep all other relevant training hyperparameters aligned to the current strong baseline unless a concrete blocker appears.
- [ ] Assign one GPU per run on the 8xL20 host.
- [ ] Capture PID, log path, and SwanLab URL for each run in `status.json`.
### Task 4: Monitor and summarize Phase-1 until all 6 finish

**Files:**
- Update: `experiment_suites/2026-04-04-imf-horizon-grid/status.json`
- Update: `experiment_suites/2026-04-04-imf-horizon-grid/notes.md`

- [ ] Periodically parse each run’s log/checkpoints to extract latest step, latest rollout reward, and best rollout reward so far.
- [ ] Keep a resumable local summary so progress can be continued in later turns without rediscovery.
- [ ] After all 6 runs finish, rank them by `max avg_reward` and write a compact Phase-1 summary.
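Log parsing for the comparison metric could be sketched as follows; the `avg_reward=` log-line format is an assumption about the training logs, which is why the plan also cross-checks against best-checkpoint metadata:

```python
import re


def best_avg_reward(log_text: str):
    """Extract the max rollout avg_reward from training log text (assumed
    lines like 'rollout avg_reward=0.83'); None when no rollout lines exist."""
    rewards = [float(m) for m in re.findall(r"avg_reward[=:]\s*([0-9.]+)", log_text)]
    return max(rewards) if rewards else None
```

Returning `None` (rather than 0.0) distinguishes "no rollout yet" from "rollouts all failed" when ranking runs.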
### Task 5: Prepare the Phase-2 visual-attnres ablation

**Files:**
- Likely modify: vision backbone implementation and config files (to be confirmed after code inspection)
- Add/update targeted tests for the visual backbone path if code changes are needed

- [ ] Use the best Phase-1 `(pred_horizon, num_action_steps)` combination as the fixed rollout setting for Phase-2.
- [ ] Compare:
  1. current setup: attnres only in the IMF head
  2. ablation setup: attnres in both IMF head and visual encoder path
- [ ] Keep the rest of the training settings fixed.
- [ ] Launch and monitor the Phase-2 pair after the Phase-1 summary is complete.
# LEWM ViT Backbone Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Replace the current ResNet visual encoder in roboimi VLA training with a frozen LEWM ViT visual backbone (encoder + projector) that consumes the three camera views jointly and outputs one 192-d CLS embedding per timestep, then launch two 50k runs on the 5880 machine.

**Architecture:** Add a new joint-multiview LEWM backbone that fuses `front/top/r_vis` into one LEWM-style image, reproduces LEWM preprocessing, loads frozen weights from the trained checkpoint, and exposes a `joint_output_dim=192`. Add a minimal `VLAAgent` compatibility branch so conditions can be sized from the joint visual dim instead of `output_dim * num_cams`, while leaving the rest of the diffusion pipeline unchanged.

**Tech Stack:** PyTorch, transformers `ViTModel`, Hydra configs, existing roboimi VLA training/eval scripts, remote SSH/rsync to 100.73.14.65.

---
### Task 1: Add failing tests for LEWM joint-vision backbone contract

**Files:**
- Create: `tests/test_lewm_vit_backbone.py`
- Modify: `tests/test_imf_vla_agent.py`

- [ ] **Step 1: Write the failing backbone shape/load test**
- [ ] **Step 2: Run `pytest tests/test_lewm_vit_backbone.py -q` and verify it fails**
- [ ] **Step 3: Extend `tests/test_imf_vla_agent.py` with a failing joint-output backbone case**
- [ ] **Step 4: Run `pytest tests/test_imf_vla_agent.py -q` and verify it fails**
### Task 2: Implement LEWM joint-multiview frozen backbone

**Files:**
- Create: `roboimi/vla/models/backbones/lewm_vit_backbone.py`
- Modify: `roboimi/vla/models/backbones/__init__.py` only if exports are needed

- [ ] **Step 1: Create `LEWMViTBackbone` with public attrs `camera_names`, `num_cameras`, `joint_output_dim=192`**
- [ ] **Step 2: Reproduce LEWM preprocessing and joint multiview fusion**
- [ ] **Step 3: Load checkpoint weights from `model.encoder.*` and `model.projector.*`**
- [ ] **Step 4: Freeze encoder/projector and keep them in eval mode via `train()` override**
- [ ] **Step 5: Run `pytest tests/test_lewm_vit_backbone.py -q` and verify green**
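Step 4's freeze-plus-`train()` override pattern (so the training loop's `model.train()` cannot re-enable dropout or batch-norm updates inside the frozen part) can be sketched as:

```python
import torch


class FrozenEncoderWrapper(torch.nn.Module):
    """Sketch: freeze a wrapped encoder and pin it to eval mode even when the
    parent model is switched to train()."""

    def __init__(self, encoder: torch.nn.Module):
        super().__init__()
        self.encoder = encoder
        for p in self.encoder.parameters():
            p.requires_grad_(False)
        self.encoder.eval()

    def train(self, mode: bool = True):
        super().train(mode)       # flips every submodule, including encoder...
        self.encoder.eval()       # ...so force the frozen part back to eval
        return self

    def forward(self, x):
        with torch.no_grad():     # no graph through the frozen trunk
            return self.encoder(x)
```

The same pattern applies to both the LEWM encoder and the projector.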
### Task 3: Add minimal agent support for joint visual dim

**Files:**
- Modify: `roboimi/vla/agent.py`
- Test: `tests/test_imf_vla_agent.py`

- [ ] **Step 1: Add a `joint_output_dim` branch in `VLAAgent.__init__` for `per_step_cond_dim` / `global_cond_dim`**
- [ ] **Step 2: Keep `_build_cond()` semantics unchanged except for matching the new dim contract**
- [ ] **Step 3: Run `pytest tests/test_imf_vla_agent.py -q` and verify green**
### Task 4: Add Hydra configs for LEWM backbone training

**Files:**
- Create: `roboimi/vla/conf/backbone/lewm_vit_diffusion.yaml`
- Create: `roboimi/vla/conf/agent/lewm_imf_attnres.yaml`

- [ ] **Step 1: Add backbone config pointing to the new LEWM backbone**
- [ ] **Step 2: Add `agent=lewm_imf_attnres` config with 3 cameras and `head.cond_dim=208`**
- [ ] **Step 3: Verify Hydra instantiation with a one-shot compose smoke**
### Task 5: Verify focused local tests

**Files:**
- Reuse the above

- [ ] **Step 1: Run `pytest tests/test_lewm_vit_backbone.py tests/test_imf_vla_agent.py tests/test_eval_vla_headless_import.py -q`**
- [ ] **Step 2: If needed, run one tiny local import/forward smoke**
### Task 6: Sync to 5880 and remote smoke with real checkpoint

**Files:**
- Remote target: `/home/droid/roboimi_suite_20260404`

- [ ] **Step 1: Rsync modified source/config files to `100.73.14.65:/home/droid/roboimi_suite_20260404`**
- [ ] **Step 2: Run a 2-step smoke on GPU0 with `agent.head.n_emb=384`, `train.rollout_num_episodes=10`, real LEWM checkpoint**
- [ ] **Step 3: Run a 2-step smoke on GPU1 with `agent.head.n_emb=256`, same checkpoint**
### Task 7: Launch two real 50k runs on the 5880 machine

**Files:**
- Remote logs under `/home/droid/roboimi_suite_20260404/experiment_suite_launch_logs/`

- [ ] **Step 1: Launch embed384/layer12 on GPU0**
- [ ] **Step 2: Launch embed256/layer12 on GPU1**
- [ ] **Step 3: Ensure both use `data.camera_names=[r_vis,top,front]`, `pred_horizon=16`, `num_action_steps=8`, `train.rollout_num_episodes=10`, `max_steps=50000`**
- [ ] **Step 4: Record run names, pids, log paths, SwanLab URLs**
### Task 8: Update experiment tracking docs and commit

**Files:**
- Create: `experiment_suites/2026-04-05-lewm-vit-transfer/manifest.json`
- Create: `experiment_suites/2026-04-05-lewm-vit-transfer/status.json`
- Create: `experiment_suites/2026-04-05-lewm-vit-transfer/notes.md`

- [ ] **Step 1: Record checkpoint path, frozen LEWM design, rollout=10, and both run configs**
- [ ] **Step 2: Record running status after launch**
- [ ] **Step 3: Commit implementation + docs with a focused message**
# Phase-2 Full-AttnRes Vision Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Replace all ResNet residual units in the vision backbone with AttnRes-based image blocks while preserving the current IMF agent interfaces, and launch a Phase-2 experiment anchored on the best Phase-1 horizon setting.

**Architecture:** Keep the current multi-camera encoder shell and per-camera output contract, but introduce a new ResNet-like 2D AttnRes backbone that preserves stage-wise downsampling and final SpatialSoftmax conditioning. Wire it into the existing `ResNetDiffusionBackbone` via an opt-in mode and keep the agent/head/data interfaces unchanged.

**Tech Stack:** PyTorch, Hydra/OmegaConf, existing IMF AttnRes transformer components, pytest.

---
### Task 1: Add failing tests for the new full-AttnRes visual backbone

**Files:**
- Create: `tests/test_attnres_resnet2d_backbone.py`
- Update: `tests/test_imf_vla_agent.py`

- [ ] **Step 1: Write a failing backbone shape test**
- [ ] **Step 2: Run it to confirm the new backbone/config does not exist yet**
- [ ] **Step 3: Add a failing IMF agent wiring test for unchanged cond_dim=208**
- [ ] **Step 4: Run the targeted tests and capture the failure**
### Task 2: Implement a ResNet-like 2D AttnRes backbone

**Files:**
- Create: `roboimi/vla/models/backbones/attnres_resnet2d.py`
- Modify: `roboimi/vla/models/backbones/resnet_diffusion.py`

- [ ] **Step 1: Add minimal 2D tokenization helpers and positional encoding / bias handling**
- [ ] **Step 2: Implement `AttnResImageBlock2D` for feature maps**
- [ ] **Step 3: Implement `AttnResResNetLikeBackbone2D` with stage-wise downsampling**
- [ ] **Step 4: Wire `_SingleRgbEncoder` to choose between the original ResNet trunk and the new full-AttnRes trunk**
- [ ] **Step 5: Run the new backbone tests**
### Task 3: Expose config switches and agent wiring

**Files:**
- Modify: `roboimi/vla/conf/backbone/resnet_diffusion.yaml`
- Modify: `roboimi/vla/conf/agent/resnet_imf_attnres.yaml`

- [ ] **Step 1: Add a backbone mode/config flag for the full-AttnRes vision trunk**
- [ ] **Step 2: Add defaults for attnres image depth/heads/etc. if needed**
- [ ] **Step 3: Add a Phase-2 launch override path that enables the new visual trunk**
- [ ] **Step 4: Run agent wiring tests again**
### Task 4: Smoke-verify training path

**Files:**
- Reuse existing training scripts and configs

- [ ] **Step 1: Run a short CPU or tiny-step smoke instantiation / `compute_loss` test**
- [ ] **Step 2: If needed, run a very short training smoke launch**
- [ ] **Step 3: Verify no cond-dim or rollout-loading regressions**
### Task 5: Launch the Phase-2 experiment

**Files:**
- Update experiment tracking under `experiment_suites/`

- [ ] **Step 1: Use Phase-1 best setting (`pred_horizon=16`, `num_action_steps=8`)**
- [ ] **Step 2: Launch baseline reference or reuse existing result**
- [ ] **Step 3: Launch full-AttnRes vision experiment**
- [ ] **Step 4: Track rollout metrics and compare max avg_reward**
`docs/superpowers/plans/2026-04-06-resnet-multitoken-imf.md` (new file, 81 lines)
# ResNet Multitoken IMF Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Implement a standard-ResNet-18 multiview IMF variant that emits three condition tokens per obs step, and launch four L20 experiments for `n_emb in {256,384}` and `n_layer in {12,16}`.

**Architecture:** The ResNet backbone will optionally return one token per camera instead of concatenating all cameras into one token. `VLAAgent` will pair each camera token with the current state, project each pair into a condition token, flatten the per-step camera tokens into one cond sequence, and feed that sequence into the existing IMF/AttnRes head.

**Tech Stack:** PyTorch, torchvision ResNet-18, Hydra, pytest, SwanLab, SSH/Tailscale.

---
### Task 1: Add failing tests for multi-token conditioning

**Files:**
- Modify: `tests/test_imf_vla_agent.py`
- Modify: `tests/test_resnet_transformer_agent_wiring.py`

- [ ] **Step 1: Add a direct agent test**
  - Stub a vision backbone returning `(B,T,3,D)` and assert `_build_cond()` yields `(B, T*3, D_cond)`.
  - Assert state is paired with each camera token, not concatenated across cameras first.
- [ ] **Step 2: Add Hydra wiring test**
  - Instantiate a new `agent=resnet_imf_attnres_multitoken` config with small dims.
  - Assert `condition_tokens_per_step == 3`, `condition_sequence_length == obs_horizon * 3`, and head `n_obs_steps` receives that sequence length.
- [ ] **Step 3: Run focused tests and verify RED**
  - `python -m pytest tests/test_imf_vla_agent.py tests/test_resnet_transformer_agent_wiring.py -q`
### Task 2: Implement multi-token ResNet conditioning path

**Files:**

- Modify: `roboimi/vla/models/backbones/resnet_diffusion.py`
- Modify: `roboimi/vla/agent.py`
- Create: `roboimi/vla/conf/agent/resnet_imf_attnres_multitoken.yaml`

- [ ] **Step 1: Extend the ResNet backbone**
  - Add an opt-in flag to return `(B, T, num_cams, D)` camera tokens instead of one concatenated `(B, T, num_cams*D)` token.
  - Keep the standard ResNet-18 vision mode; do not switch to AttnRes vision.
- [ ] **Step 2: Extend `VLAAgent` condition building**
  - Support rank-4 visual features `(B, T, K, D)`.
  - Broadcast state to `(B, T, K, D_state)`, concatenate per camera, apply the projector per token, then flatten to `(B, T*K, D_cond)`.
  - Track `condition_tokens_per_step` and `condition_sequence_length`.
- [ ] **Step 3: Update transformer-head instantiation**
  - Pass `n_obs_steps=condition_sequence_length` when building transformer heads.
- [ ] **Step 4: Add the Hydra config**
  - The new agent config uses:
    - a separate ResNet-18 per camera
    - the standard residual vision trunk (`vision_backbone_mode=resnet`)
    - a condition projector output dim tied to `${agent.head.n_emb}`
    - rollout episodes `10`, `pred_horizon=16`, `num_action_steps=8`
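The broadcast → concat → project → flatten path in Step 2 can be sketched as a standalone module. The name `MultiTokenCondBuilder` and the exact shapes are illustrative assumptions, not the actual `VLAAgent` API:

```python
import torch
import torch.nn as nn

class MultiTokenCondBuilder(nn.Module):
    """Sketch of the proposed multi-token `_build_cond()` path for rank-4
    visual features. All names here are illustrative, not RoboIMI's API."""

    def __init__(self, d_vis: int, d_state: int, d_cond: int, num_cams: int = 3):
        super().__init__()
        self.projector = nn.Linear(d_vis + d_state, d_cond)
        self.condition_tokens_per_step = num_cams
        self.condition_sequence_length = None  # set once T is known

    def forward(self, vis: torch.Tensor, state: torch.Tensor) -> torch.Tensor:
        # vis: (B, T, K, D_vis); state: (B, T, D_state)
        B, T, K, _ = vis.shape
        state_per_cam = state.unsqueeze(2).expand(-1, -1, K, -1)  # (B, T, K, D_state)
        paired = torch.cat([vis, state_per_cam], dim=-1)          # (B, T, K, D_vis+D_state)
        cond = self.projector(paired)                             # (B, T, K, D_cond)
        self.condition_sequence_length = T * K
        return cond.flatten(1, 2)                                 # (B, T*K, D_cond)
```

A 2-step, 3-camera batch then yields a 6-token condition sequence, which is the `n_obs_steps` value Step 3 passes to the head.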
### Task 3: Verify locally

**Files:**

- Modify only if verification reveals issues

- [ ] **Step 1: Run focused tests and make them pass**
  - `python -m pytest tests/test_imf_vla_agent.py tests/test_resnet_transformer_agent_wiring.py -q`
- [ ] **Step 2: Run the regression subset**
  - `python -m pytest tests/test_eval_vla_headless.py tests/test_train_vla_rollout_validation.py tests/test_simple_robot_dataset_image_loading.py -q`
- [ ] **Step 3: Run a local smoke instantiation**
  - Instantiate the new Hydra config and verify the condition shape / sequence length.
### Task 4: Launch 4 L20 experiments

**Files:**

- Remote repo copy under `/home/droid/roboimi_suite_20260404`

- [ ] **Step 1: Sync code to `100.119.99.14`**
- [ ] **Step 2: Smoke-test the new config on the remote host**
- [ ] **Step 3: Launch runs**
  - `(n_emb=256, n_layer=12)`
  - `(n_emb=256, n_layer=16)`
  - `(n_emb=384, n_layer=12)`
  - `(n_emb=384, n_layer=16)`
- [ ] **Step 4: Keep fixed across runs**
  - rollout episodes `10`
  - `pred_horizon=16`
  - `num_action_steps=8`
  - standard ResNet-18 vision trunk
  - three separate per-camera weights
- [ ] **Step 5: Record PIDs, GPUs, log paths, and SwanLab run URLs**
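The 2×2 run matrix can be driven by a small launcher loop. This sketch only echoes each command for review before running; the entry-point path and the Hydra override names (`agent=...`, `agent.head.n_emb`, `agent.head.n_layer`) are assumptions based on this plan's config layout, not verified CLI flags:

```shell
# Echo the four launch commands (n_emb x n_layer grid) without executing them.
# NOTE: override names and script path are assumptions from this plan.
launch_matrix() {
  for n_emb in 256 384; do
    for n_layer in 12 16; do
      echo "python roboimi/demos/vla_scripts/train_vla.py agent=resnet_imf_attnres_multitoken agent.head.n_emb=${n_emb} agent.head.n_layer=${n_layer}"
    done
  done
}
launch_matrix
```

Piping each echoed command into `nohup ... &` (with per-run log paths) would then produce the PIDs and logs Step 5 records.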
New file: `docs/superpowers/plans/2026-04-06-siglip2-multiview-vla.md` (78 lines)
# SigLIP2 Multiview VLA Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Integrate a frozen shared SigLIP2 multiview encoder into the IMF/AttnRes policy, preserve raw-256 image handling, and launch two 50k-step experiments on the 5880 host with per-view projection dims 96 and 192.

**Architecture:** A new backbone will independently encode each camera view with SigLIP2 and project each 768-d pooled feature to a configurable per-view dimension. `VLAAgent` will concatenate visual features with robot state, then optionally project the combined per-step condition to the head's required 384-d interface before diffusion training/inference.

**Tech Stack:** PyTorch, transformers SigLIP2, Hydra, pytest, SSH/Tailscale, SwanLab.

---
### Task 1: Add failing tests for SigLIP2 backbone and projected conditioning

**Files:**

- Create: `tests/test_siglip2_diffusion_backbone.py`
- Modify: `tests/test_imf_vla_agent.py`

- [ ] **Step 1: Write failing backbone tests**
  - Instantiate the new backbone with a stub SigLIP2 vision model.
  - Assert the raw dataset resize is `None`, the eval resize is `(256, 256)`, and the output shape is `(B, T, 3 * per_view_output_dim)`.
  - Assert the three views are encoded independently and projected.
- [ ] **Step 2: Run focused tests and verify RED**
  - Run `pytest tests/test_siglip2_diffusion_backbone.py tests/test_imf_vla_agent.py -q`.
  - Expect failure because the backbone/config/projector do not exist yet.
- [ ] **Step 3: Extend agent wiring tests**
  - Add a Hydra/instantiate test for the new SigLIP2 IMF config.
  - Assert a raw condition dim of `3 * per_view_output_dim + obs_dim`, a projected cond dim of `384`, and head `cond_dim == 384`.
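The Step 3 dimension assertions, worked through for one assumed setting — `obs_dim` here is a placeholder value, not the real robot state dim:

```python
# Worked example of the wiring-test dimension arithmetic.
# obs_dim is an assumed placeholder; the real value comes from the dataset.
per_view_output_dim = 96
obs_dim = 14
raw_cond_dim = 3 * per_view_output_dim + obs_dim   # 3*96 + 14 = 302
projected_cond_dim = 384                           # must equal head.cond_dim
assert raw_cond_dim == 302
assert projected_cond_dim == 384
```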
### Task 2: Implement SigLIP2 backbone and optional condition projector

**Files:**

- Create: `roboimi/vla/models/backbones/siglip2_diffusion_backbone.py`
- Create: `roboimi/vla/conf/backbone/siglip2_diffusion.yaml`
- Create: `roboimi/vla/conf/agent/siglip2_imf_attnres.yaml`
- Create: `roboimi/vla/conf/modules/linear_condition_projector.yaml`
- Modify: `roboimi/vla/models/backbones/__init__.py`
- Modify: `roboimi/vla/agent.py`

- [ ] **Step 1: Implement the backbone**
  - Load `SiglipVisionModel.from_pretrained("google/siglip2-base-patch16-256")`.
  - Normalize `[0, 1]` pixels with mean/std `0.5` and encode each view independently.
  - Project each 768-d pooled feature to the configurable per-view dim and concatenate across cameras.
- [ ] **Step 2: Implement the optional condition projector**
  - Allow `VLAAgent` to accept a `cond_projector`.
  - Track `raw_per_step_cond_dim` and the projected `per_step_cond_dim` / `global_cond_dim`.
  - Apply the projector in `_build_cond()` after the visual+state concatenation.
- [ ] **Step 3: Add Hydra configs**
  - The new agent config should default to `n_emb=384`, `n_layer=12`, `pred_horizon=16`, `num_action_steps=8`, `head.cond_dim=384`.
  - The backbone config should set `dataset_image_resize_shape: null` and `eval_image_resize_shape: [256, 256]`.
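Step 1's encode-independently-then-project flow can be sketched as follows. The `encoder` argument stands in for the frozen SigLIP2 vision tower (in the real backbone, a `SiglipVisionModel` whose pooled output is 768-d); whether the per-view projections share weights is not specified in this plan, so a per-view `ModuleList` is an assumption. All names are illustrative:

```python
import torch
import torch.nn as nn

class MultiviewSiglipBackbone(nn.Module):
    """Sketch of the proposed backbone. `encoder` is any module mapping
    (N, C, H, W) -> (N, 768) pooled features; it is frozen and shared
    across views. Names and the per-view projections are assumptions."""

    def __init__(self, encoder: nn.Module, per_view_output_dim: int, num_cams: int = 3):
        super().__init__()
        self.encoder = encoder
        for p in self.encoder.parameters():
            p.requires_grad_(False)            # frozen, shared across views
        self.proj = nn.ModuleList(
            nn.Linear(768, per_view_output_dim) for _ in range(num_cams)
        )

    def forward(self, views: torch.Tensor) -> torch.Tensor:
        # views: (B, T, K, C, H, W) with pixel values in [0, 1]
        B, T, K = views.shape[:3]
        x = (views - 0.5) / 0.5                # SigLIP-style mean/std 0.5 normalization
        feats = []
        for k in range(K):                     # encode each camera view independently
            pooled = self.encoder(x[:, :, k].flatten(0, 1))    # (B*T, 768)
            feats.append(self.proj[k](pooled).view(B, T, -1))  # (B, T, per_view)
        return torch.cat(feats, dim=-1)        # (B, T, K * per_view_output_dim)
```

With a stub encoder, a `(2, 4, 3, 3, 8, 8)` input and `per_view_output_dim=96` yields a `(2, 4, 288)` feature, matching the `(B, T, 3 * per_view_output_dim)` shape asserted in Task 1.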
### Task 3: Verify locally and prepare remote execution

**Files:**

- Modify as needed only if tests/smoke reveal issues

- [ ] **Step 1: Run focused tests and make them pass**
  - `pytest tests/test_siglip2_diffusion_backbone.py tests/test_imf_vla_agent.py tests/test_eval_vla_headless.py tests/test_train_vla_rollout_validation.py tests/test_simple_robot_dataset_image_loading.py -q`
- [ ] **Step 2: Run a local smoke instantiation**
  - Instantiate the new Hydra config with stubbed optional modules or offline-safe monkeypatching.
- [ ] **Step 3: Review diffs for unintended LEWM/raw256 regressions**
### Task 4: Sync to 5880 and launch experiments

**Files:**

- Remote repo copy under `/home/droid/roboimi_suite_20260404`

- [ ] **Step 1: Stop superseded remote jobs**
- [ ] **Step 2: Sync updated code to the remote host**
  - Prefer `rsync` or `git push/pull` without overwriting unrelated files.
- [ ] **Step 3: Remote smoke test**
  - Confirm the SigLIP2 model download/import works in `/home/droid/miniforge3/envs/roboimi/bin/python`.
  - Confirm the headless rollout path still uses the `256x256` eval resize.
- [ ] **Step 4: Launch experiment A**
  - `per_view_output_dim=96`, `n_emb=384`, `n_layer=12`, `pred_horizon=16`, `num_action_steps=8`, `steps=50000`.
- [ ] **Step 5: Launch experiment B**
  - `per_view_output_dim=192`, with all other hyperparameters identical.
- [ ] **Step 6: Record PIDs, GPUs, log paths, and SwanLab run URLs**
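The Step 2 sync can be sketched with `rsync`. This plan does not give the 5880 host's address, so `<5880-host>` is a placeholder, and the exclude list is an assumption about what should not be overwritten; the function only echoes the command for review rather than executing it:

```shell
# Echo (not run) an rsync command for Step 2. <5880-host> is a placeholder;
# the excludes are assumptions. --dry-run previews the transfer first.
sync_cmd() {
  echo rsync -avz --dry-run \
    --exclude '.git' --exclude 'outputs/' --exclude '__pycache__' \
    ./ "droid@<5880-host>:/home/droid/roboimi_suite_20260404/"
}
sync_cmd
```

Once the dry-run output looks right, drop `--dry-run` and run the printed command against the real host.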