Phase-1 Final Report and Phase-2 Handoff

  • Finalized: 2026-04-05 00:34:20 CST
  • Scope: IMF AttnRes policy horizon/action-step grid on sim_transfer
  • Fixed setup: n_emb=384, n_layer=12, batch size 80, learning rate 2.5e-4, max_steps=50k; rollouts every 5 epochs (5 episodes each); 3 cameras [r_vis, top, front].
  • Main metric: the maximum rollout_avg_reward observed during training, as recorded in checkpoints/vla_model_best.pt.
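For reference, the fixed setup above could be written as a config fragment along the following lines (the key names and grouping are illustrative assumptions, not the repo's actual schema):

```yaml
# Illustrative only -- key names are assumptions, not roboimi's actual schema.
model:
  n_emb: 384
  n_layer: 12
train:
  batch_size: 80
  learning_rate: 2.5e-4
  max_steps: 50000
rollout:
  every_n_epochs: 5
  episodes: 5
cameras: [r_vis, top, front]
```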

Final leaderboard

| Rank | Run ID    | pred_horizon | Executed action steps | Best avg_reward | Best step | Final loss |
|------|-----------|--------------|-----------------------|-----------------|-----------|------------|
| 1    | ph16_ex8  | 16           | 8                     | 610.8           | 21874     | 0.0034     |
| 2    | ph16_ex16 | 16           | 16                    | 561.2           | 48124     | 0.0045     |
| 3    | ph32_ex32 | 32           | 32                    | 513.2           | 43749     | 0.0040     |
| 4    | ph8_ex8   | 8            | 8                     | 415.6           | 48124     | 0.0070     |
| 5    | ph32_ex8  | 32           | 8                     | 361.6           | 43749     | 0.0048     |
| 6    | ph32_ex16 | 32           | 16                    | 239.6           | 48124     | 0.0038     |

Final conclusions

  1. The best combination is pred_horizon=16 + num_action_steps=8, with a best average reward of 610.8, reached at step 21874.
  2. At pred_horizon=16, executing 8 steps beats executing 16 by about +8.8% (610.8 vs 561.2).
  3. At pred_horizon=32, results are highly sensitive to the executed step count: 32/32 clearly beats 32/8 and 32/16, with 32/16 degrading the most.
  4. A longer prediction window does not automatically yield higher reward; what matters is how the prediction window is matched to the window actually executed.
  5. The best checkpoint did not appear at the end of training but relatively early, at step 21.9k of the 50k run, which shows that rollout validation matters more than tracking train loss alone.
  6. The Phase-2 comparison baseline is therefore fixed to ph16_ex8:
  • Baseline run: ph16_ex8
  • Baseline best checkpoint: step 21874
  • Baseline best avg_reward: 610.8
  • Baseline run dir: /home/droid/roboimi_suite_20260404/runs/imf-p1-ph16-ex08-emb384-l12-ms50k-5880g1-20260404-131223
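Conclusions 2-4 reflect receding-horizon execution: each inference predicts pred_horizon actions, but only the first num_action_steps are executed before the policy replans, so the two knobs jointly control how stale the executed plan gets. A minimal sketch of that loop (function names and the policy interface are illustrative, not roboimi's API):

```python
# Receding-horizon execution: predict `pred_horizon` actions per policy
# call, execute only the first `num_action_steps`, then replan.
def run_episode(policy, episode_len, pred_horizon, num_action_steps):
    """Return the executed action sequence and the number of policy calls."""
    executed, calls, t = [], 0, 0
    while t < episode_len:
        plan = policy(t, pred_horizon)        # list of pred_horizon actions
        calls += 1
        chunk = plan[:num_action_steps]       # only the prefix is executed
        executed.extend(chunk[:episode_len - t])
        t += len(chunk)
    return executed, calls

# Dummy policy for illustration: action i of a plan made at time t is t + i.
dummy_policy = lambda t, horizon: [t + i for i in range(horizon)]

acts, n_calls = run_episode(dummy_policy, 32, pred_horizon=16, num_action_steps=8)
```

With these settings a 32-step episode needs 4 policy calls; ph16_ex16 would need only 2, at the cost of acting on an older plan for longer.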

Phase-2 target: full-AttnRes vision backbone

As requested, this phase no longer uses AttnRes only inside the IMF head; instead, every residual unit in the previous vision ResNet trunk has been replaced with an AttnRes residual unit. The current implementation keeps the ResNet-style stage/downsample macro-structure, but the visual residual trunk now runs on AttnRes.

  • implementation: roboimi/vla/models/backbones/attnres_resnet2d.py
  • wiring: roboimi/vla/models/backbones/resnet_diffusion.py
  • config: roboimi/vla/conf/backbone/resnet_diffusion.yaml
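To make the switch concrete, the sketch below shows what a single attention-based residual unit computes: single-head self-attention over the flattened spatial tokens of a feature map, plus a skip connection, in place of a conv residual branch. All names, shapes, and the single-head simplification are assumptions for illustration, not the actual attnres_resnet2d.py implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attnres_block(x, Wq, Wk, Wv):
    """Single-head self-attention residual unit over spatial tokens.

    x: (n_tokens, d) -- an H*W feature map flattened to tokens.
    The output keeps the same shape, so the unit is a drop-in for a
    resolution-preserving ResNet block (downsampling stages would wrap it).
    """
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    attn = softmax(q @ k.T / np.sqrt(x.shape[-1]))   # (n_tokens, n_tokens)
    return x + attn @ v                              # residual skip connection

rng = np.random.default_rng(0)
n, d = 49, 16                      # e.g. a 7x7 feature map with 16 channels
x = rng.standard_normal((n, d))
Wq, Wk, Wv = (0.1 * rng.standard_normal((d, d)) for _ in range(3))
y = attnres_block(x, Wq, Wk, Wv)
```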

The related code has been committed:

  • a780068 — headless rollout fix + Phase-1 summary
  • 2033169 — full-AttnRes vision backbone

Phase-2 launch status (observed on 2026-04-05 00:36 CST)

  • Run: imf-p2-full-attnres-vision-ph16-ex08-emb384-l12-b40-lr1p25e4-ms50k-l20g3-20260405-002424
  • Host: 100.119.99.14, GPU 3
  • Config anchor: pred_horizon=16, num_action_steps=8
  • Vision backbone: attnres_resnet
  • Because batch size 80 OOMed on both the local 5090 and the remote L20, Phase-2 currently uses:
    • batch size: 40
    • learning rate: 1.25e-4
  • Latest confirmed progress: step 1300
  • First rollout has not happened yet at this observation point.
  • SwanLab: https://swanlab.cn/@game-loader/roboimi-vla/runs/xy7fjdmn0stdr19eu3gub
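The halved learning rate tracks the halved batch size, which matches the common linear-scaling heuristic (learning rate proportional to batch size); whether that heuristic was the actual motivation is an assumption here:

```python
# Linear-scaling heuristic: lr is scaled proportionally with batch size.
# Phase-1 ran batch 80 at lr 2.5e-4; OOM forced Phase-2 down to batch 40.
base_batch, base_lr = 80, 2.5e-4
new_batch = 40
new_lr = base_lr * new_batch / base_batch   # 1.25e-4, the Phase-2 setting
```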

Next action

Continue monitoring the Phase-2 full-AttnRes training run; once it completes, compare it directly against the Phase-1 baseline of 610.8 to determine whether "replace the entire vision trunk with AttnRes" beats "use AttnRes only in the IMF head".