# Phase-1 Final Report and Phase-2 Handoff

- Finalized: 2026-04-05 00:34:20 CST
- Scope: IMF AttnRes policy horizon/action-step grid on `sim_transfer`
- Fixed setup: `n_emb=384`, `n_layer=12`, batch size `80`, learning rate `2.5e-4`, `max_steps=50k`, rollout every 5 epochs with 5 episodes, 3 cameras `[r_vis, top, front]`.
- Main metric: the maximum training-time `rollout_avg_reward` recorded in `checkpoints/vla_model_best.pt`.

## Final leaderboard

| Rank | Run ID | `pred_horizon` | Executed action steps | Best avg_reward | Best step | Final loss |
|---:|---|---:|---:|---:|---:|---:|
| 1 | `ph16_ex8` | 16 | 8 | **610.8** | 21874 | 0.0034 |
| 2 | `ph16_ex16` | 16 | 16 | 561.2 | 48124 | 0.0045 |
| 3 | `ph32_ex32` | 32 | 32 | 513.2 | 43749 | 0.0040 |
| 4 | `ph8_ex8` | 8 | 8 | 415.6 | 48124 | 0.0070 |
| 5 | `ph32_ex8` | 32 | 8 | 361.6 | 43749 | 0.0048 |
| 6 | `ph32_ex16` | 32 | 16 | 239.6 | 48124 | 0.0038 |

## Final conclusions

1. **The best combination is `pred_horizon=16` + `num_action_steps=8`**, reaching a best average reward of **610.8** at **step 21874**.
2. At `pred_horizon=16`, executing 8 steps beats executing all 16, by roughly **+8.8%** (610.8 vs 561.2).
3. At `pred_horizon=32`, results are highly sensitive to the executed step count: `32/32` clearly beats `32/8` and `32/16`, with `32/16` degrading the most.
4. A longer prediction window does not automatically yield higher reward; **the match between the prediction window and the actually executed window** is what matters.
5. The best checkpoint did not appear at the end of training but relatively early, at **step 21.9k** of the 50k run — evidence that rollout validation matters more than watching train loss alone.
6. Accordingly, the Phase-2 comparison baseline is fixed to **`ph16_ex8`**.

## Recommended baseline for follow-up experiments

- Baseline run: `ph16_ex8`
- Baseline best checkpoint: `step 21874`
- Baseline best avg_reward: `610.8`
- Baseline run dir: `/home/droid/roboimi_suite_20260404/runs/imf-p1-ph16-ex08-emb384-l12-ms50k-5880g1-20260404-131223`

## Phase-2 target: full-AttnRes vision backbone

As requested, this phase no longer uses AttnRes only inside the IMF head; instead, **all residual units in the previous vision ResNet trunk are replaced with AttnRes residual units**. The current implementation keeps the ResNet-style stage / downsample macro-structure, but the vision residual trunk has been switched to AttnRes:

- Implementation: `roboimi/vla/models/backbones/attnres_resnet2d.py`
- Wiring: `roboimi/vla/models/backbones/resnet_diffusion.py`
- Config: `roboimi/vla/conf/backbone/resnet_diffusion.yaml`

The relevant code has been committed:

- `a780068` — headless rollout fix + Phase-1 summary
- `2033169` — full-AttnRes vision backbone

## Phase-2 launch status (observed 2026-04-05 00:36 CST)

- Run: `imf-p2-full-attnres-vision-ph16-ex08-emb384-l12-b40-lr1p25e4-ms50k-l20g3-20260405-002424`
- Host: `100.119.99.14`, GPU `3`
- Config anchor: `pred_horizon=16`, `num_action_steps=8`
- Vision backbone: `attnres_resnet`
- Because batch size `80` OOMed on both the local 5090 and the remote L20, Phase-2 currently uses half of each Phase-1 value:
  - Batch size: `40`
  - Learning rate: `1.25e-4`
- Latest confirmed progress: **step 1300**
- The first rollout has **not** happened yet as of this observation.
- SwanLab: https://swanlab.cn/@game-loader/roboimi-vla/runs/xy7fjdmn0stdr19eu3gub

## Next action

Continue monitoring the Phase-2 full-AttnRes training; once it completes, compare it directly against the Phase-1 baseline of `610.8` to determine whether "replacing the entire vision trunk with AttnRes" outperforms "AttnRes in the IMF head only".
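For reference, the `pred_horizon` / executed-action-steps split studied in the grid follows the usual receding-horizon (action-chunking) rollout pattern: the policy predicts a chunk of `pred_horizon` future actions, only the first `num_action_steps` are executed, and then the policy replans. A minimal sketch, with `predict_chunk` and `env_step` as hypothetical stand-ins for the actual policy and environment interfaces (not the project's real API):

```python
def rollout(predict_chunk, env_step, obs, pred_horizon=16,
            num_action_steps=8, max_steps=64):
    """Run one episode with chunked execution.

    predict_chunk(obs, h) -> list of h predicted actions (hypothetical API)
    env_step(action)      -> next observation            (hypothetical API)
    Only the first `num_action_steps` of each predicted chunk are executed
    before the policy is queried again.
    """
    executed = []
    while len(executed) < max_steps:
        chunk = predict_chunk(obs, pred_horizon)   # len(chunk) == pred_horizon
        for action in chunk[:num_action_steps]:    # execute only a prefix
            obs = env_step(action)
            executed.append(action)
            if len(executed) >= max_steps:
                break
    return executed
```

Under this scheme, `ph16_ex8` replans twice as often as `ph16_ex16` for the same episode length, which is one plausible reading of why the two runs differ despite sharing a prediction window.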
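The headline numbers above can be double-checked with one-line arithmetic. A quick sanity script (values taken from the leaderboard and launch-status sections; the linear-scaling framing of the learning-rate halving is my inference — the report itself only states the two values):

```python
# Relative gain of ph16_ex8 over ph16_ex16, as quoted in conclusion 2.
best, runner_up = 610.8, 561.2
gain_pct = (best - runner_up) / runner_up * 100
print(f"ph16_ex8 over ph16_ex16: +{gain_pct:.1f}%")

# Phase-2 halved the batch size after the OOM at 80; the learning rate
# was halved alongside it, consistent with linear LR scaling.
phase1_bs, phase1_lr = 80, 2.5e-4
phase2_bs = phase1_bs // 2
phase2_lr = phase1_lr * phase2_bs / phase1_bs
print(phase2_bs, phase2_lr)
```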