Phase-1 Final Report and Phase-2 Handoff
- Finalized: 2026-04-05 00:34:20 CST
- Scope: IMF AttnRes policy horizon/action-step grid on sim_transfer
- Fixed setup: n_emb=384, n_layer=12, batch size 80, learning rate 2.5e-4, max_steps=50k, rollout every 5 epochs with 5 episodes, 3 cameras [r_vis, top, front]
- Main metric: the maximum rollout_avg_reward seen during training, as recorded in checkpoints/vla_model_best.pt
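The keep-the-best checkpointing rule implied by this metric can be sketched as follows; a minimal illustration where `track_best` and the `(step, avg_reward)` tuples are hypothetical stand-ins for the training loop's actual hooks, not the repo's API:

```python
def track_best(rollouts):
    """rollouts: iterable of (step, avg_reward) pairs from periodic
    evaluation; returns the (step, reward) pair that would end up saved
    as vla_model_best.pt under a keep-the-best rule."""
    best_step, best_reward = None, float("-inf")
    for step, avg_reward in rollouts:
        if avg_reward > best_reward:
            best_step, best_reward = step, avg_reward
            # the real loop would write checkpoints/vla_model_best.pt here
    return best_step, best_reward

# A run that peaks mid-training keeps the mid-training checkpoint:
assert track_best([(10000, 400.0), (21874, 610.8), (48124, 550.0)]) == (21874, 610.8)
```

This is why "Best step" in the leaderboard can differ from the final training step.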
Final leaderboard
| Rank | Run ID | pred_horizon | Executed action steps | Best avg_reward | Best step | Final loss |
|---|---|---|---|---|---|---|
| 1 | ph16_ex8 | 16 | 8 | 610.8 | 21874 | 0.0034 |
| 2 | ph16_ex16 | 16 | 16 | 561.2 | 48124 | 0.0045 |
| 3 | ph32_ex32 | 32 | 32 | 513.2 | 43749 | 0.0040 |
| 4 | ph8_ex8 | 8 | 8 | 415.6 | 48124 | 0.0070 |
| 5 | ph32_ex8 | 32 | 8 | 361.6 | 43749 | 0.0048 |
| 6 | ph32_ex16 | 32 | 16 | 239.6 | 48124 | 0.0038 |
Final conclusions
- The best combination is pred_horizon=16 + num_action_steps=8, with a best average reward of 610.8 at step 21874.
- Under pred_horizon=16, executing 8 steps beats executing 16, by about +8.8% (610.8 vs 561.2).
- Under pred_horizon=32, results are very sensitive to the executed step count: 32/32 clearly beats 32/8 and 32/16, and 32/16 degrades the most.
- A longer prediction window does not automatically yield higher reward; what matters is how well the prediction window matches the actually executed window.
- The best checkpoint did not appear at the end of training but relatively early, at step 21.9k of the 50k run, which shows that rollout validation matters more than watching train loss alone.
- The Phase-2 comparison baseline is therefore fixed as ph16_ex8.
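The horizon/execution interplay behind these conclusions can be sketched as a receding-horizon loop: the policy predicts pred_horizon actions per call, but only the first num_action_steps are executed before replanning. `policy` and `env_step` below are placeholders, not the repo's actual API:

```python
def rollout(policy, env_step, total_steps, pred_horizon, num_action_steps):
    """Execute total_steps environment steps, replanning every
    num_action_steps out of each pred_horizon-length prediction."""
    executed, replans = 0, 0
    while executed < total_steps:
        actions = policy(pred_horizon)        # predict a full action chunk
        for a in actions[:num_action_steps]:  # execute only a prefix of it
            env_step(a)
            executed += 1
            if executed >= total_steps:
                break
        replans += 1
    return replans

# Over the same episode length, ph16_ex8 replans twice as often as
# ph16_ex16 while still planning 16 steps ahead:
dummy_policy = lambda h: list(range(h))
assert rollout(dummy_policy, lambda a: None, 64, 16, 8) == 8
assert rollout(dummy_policy, lambda a: None, 64, 16, 16) == 4
```

The leaderboard suggests this extra replanning (with the same lookahead) is what ph16_ex8's advantage comes from, though that interpretation is ours, not a measured ablation.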
Recommended baseline for follow-up experiments
- Baseline run: ph16_ex8
- Baseline best checkpoint: step 21874
- Baseline best avg_reward: 610.8
- Baseline run dir: /home/droid/roboimi_suite_20260404/runs/imf-p1-ph16-ex08-emb384-l12-ms50k-5880g1-20260404-131223
Phase-2 target: full-AttnRes vision backbone
As requested, this phase no longer uses AttnRes only inside the IMF head: all residual units in the previous vision ResNet backbone have been replaced with AttnRes residual units. The current implementation keeps the ResNet-style stage/downsample macro structure, but the visual residual trunk is now AttnRes:
- Implementation: roboimi/vla/models/backbones/attnres_resnet2d.py
- Wiring: roboimi/vla/models/backbones/resnet_diffusion.py
- Config: roboimi/vla/conf/backbone/resnet_diffusion.yaml
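The report does not spell out the AttnRes unit itself. As a rough, hypothetical illustration of an attention-based residual unit of the kind described (self-attention over flattened spatial positions with a skip connection in place of a conv branch), using NumPy; the names and single-head design are assumptions, not the code in attnres_resnet2d.py:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attnres_unit(x, wq, wk, wv):
    """x: (N, C) tokens (flattened spatial positions of a feature map).
    Returns x + attention(x): like a ResNet BasicBlock, the input is
    added back to the transformed branch, so the shape is preserved."""
    q, k, v = x @ wq, x @ wk, x @ wv
    attn = softmax(q @ k.T / np.sqrt(x.shape[1]))
    return x + attn @ v  # residual connection keeps (N, C)

rng = np.random.default_rng(0)
x = rng.normal(size=(49, 64))  # e.g. a 7x7 feature map with C=64
wq, wk, wv = (rng.normal(size=(64, 64)) * 0.02 for _ in range(3))
y = attnres_unit(x, wq, wk, wv)
assert y.shape == x.shape
```

Because the unit is shape-preserving, it can slot into the existing stage/downsample macro structure without changing the backbone's output dimensions.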
Relevant commits:
- a780068: headless rollout fix + Phase-1 summary
- 2033169: full-AttnRes vision backbone
Phase-2 launch status (observed on 2026-04-05 00:36 CST)
- Run: imf-p2-full-attnres-vision-ph16-ex08-emb384-l12-b40-lr1p25e4-ms50k-l20g3-20260405-002424
- Host: 100.119.99.14, GPU 3
- Config anchor: pred_horizon=16, num_action_steps=8
- Vision backbone: attnres_resnet
- Because batch size 80 OOMed on both the local 5090 and the remote L20, Phase-2 currently uses:
  - batch size: 40
  - learning rate: 1.25e-4
- Latest confirmed progress: step 1300
- No rollout had occurred yet as of this observation.
- SwanLab: https://swanlab.cn/@game-loader/roboimi-vla/runs/xy7fjdmn0stdr19eu3gub
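The halved batch size and halved learning rate above are consistent with the linear scaling rule (learning rate proportional to batch size). A minimal sketch, assuming that rule was the rationale for the Phase-2 values:

```python
def scale_lr(base_lr: float, base_batch: int, new_batch: int) -> float:
    """Linear scaling rule: scale the learning rate with batch size."""
    return base_lr * new_batch / base_batch

# Phase-1 used lr 2.5e-4 at batch 80; halving the batch to 40 gives
# exactly the Phase-2 value of 1.25e-4.
assert abs(scale_lr(2.5e-4, 80, 40) - 1.25e-4) < 1e-12
```

Keeping the lr/batch ratio fixed makes the Phase-2 run more comparable to the Phase-1 baseline despite the OOM-forced batch change.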
Next action
Continue monitoring the Phase-2 full-AttnRes training run; once it completes, compare it directly against the Phase-1 baseline of 610.8 to judge whether replacing the entire vision backbone with AttnRes beats using AttnRes only inside the IMF head.