Phase-1 Final Report and Phase-2 Handoff

  • Finalized: 2026-04-05 00:34:20 CST
  • Scope: IMF AttnRes policy horizon/action-step grid on sim_transfer
  • Fixed setup: n_emb=384, n_layer=12, batch size 80, learning rate 2.5e-4, max_steps=50k; rollouts every 5 epochs (5 episodes each); 3 cameras [r_vis, top, front].
  • Main metric: the maximum rollout_avg_reward observed during training, as recorded in checkpoints/vla_model_best.pt.
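For reference, the fixed setup above could be written as a config fragment along the following lines (the key names and grouping are illustrative assumptions, not the repo's actual schema):

```yaml
# Illustrative only -- key names are assumptions, not roboimi's actual schema.
model:
  n_emb: 384
  n_layer: 12
train:
  batch_size: 80
  learning_rate: 2.5e-4
  max_steps: 50000
rollout:
  every_n_epochs: 5
  episodes: 5
cameras: [r_vis, top, front]
```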

Final leaderboard

| Rank | Run ID    | pred_horizon | Executed action steps | Best avg_reward | Best step | Final loss |
|------|-----------|--------------|-----------------------|-----------------|-----------|------------|
| 1    | ph16_ex8  | 16           | 8                     | 610.8           | 21874     | 0.0034     |
| 2    | ph16_ex16 | 16           | 16                    | 561.2           | 48124     | 0.0045     |
| 3    | ph32_ex32 | 32           | 32                    | 513.2           | 43749     | 0.0040     |
| 4    | ph8_ex8   | 8            | 8                     | 415.6           | 48124     | 0.0070     |
| 5    | ph32_ex8  | 32           | 8                     | 361.6           | 43749     | 0.0048     |
| 6    | ph32_ex16 | 32           | 16                    | 239.6           | 48124     | 0.0038     |

Final conclusions

  1. The best combination is pred_horizon=16 + num_action_steps=8, with a best average reward of 610.8, reached at step 21874.
  2. At pred_horizon=16, executing 8 steps beats executing 16 by about +8.8% (610.8 vs 561.2).
  3. At pred_horizon=32, results are highly sensitive to the executed step count: 32/32 clearly beats 32/8 and 32/16, with 32/16 degrading the most.
  4. A longer prediction window does not automatically yield higher reward; what matters is how the prediction window is matched to the window actually executed.
  5. The best checkpoint did not appear at the end of training but relatively early, at step 21.9k of the 50k run, which shows that rollout validation matters more than tracking train loss alone.
  6. The Phase-2 comparison baseline is therefore fixed to ph16_ex8:
  • Baseline run: ph16_ex8
  • Baseline best checkpoint: step 21874
  • Baseline best avg_reward: 610.8
  • Baseline run dir: /home/droid/roboimi_suite_20260404/runs/imf-p1-ph16-ex08-emb384-l12-ms50k-5880g1-20260404-131223
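Conclusions 2-4 reflect receding-horizon execution: each inference predicts pred_horizon actions, but only the first num_action_steps are executed before the policy replans, so the two knobs jointly control how stale the executed plan gets. A minimal sketch of that loop (function names and the policy interface are illustrative, not roboimi's API):

```python
# Receding-horizon execution: predict `pred_horizon` actions per policy
# call, execute only the first `num_action_steps`, then replan.
def run_episode(policy, episode_len, pred_horizon, num_action_steps):
    """Return the executed action sequence and the number of policy calls."""
    executed, calls, t = [], 0, 0
    while t < episode_len:
        plan = policy(t, pred_horizon)        # list of pred_horizon actions
        calls += 1
        chunk = plan[:num_action_steps]       # only the prefix is executed
        executed.extend(chunk[:episode_len - t])
        t += len(chunk)
    return executed, calls

# Dummy policy for illustration: action i of a plan made at time t is t + i.
dummy_policy = lambda t, horizon: [t + i for i in range(horizon)]

acts, n_calls = run_episode(dummy_policy, 32, pred_horizon=16, num_action_steps=8)
```

With these settings a 32-step episode needs 4 policy calls; ph16_ex16 would need only 2, at the cost of acting on an older plan for longer.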

Phase-2 target: full-AttnRes vision backbone

As requested, this phase no longer uses AttnRes only inside the IMF head; instead, every residual unit in the previous vision ResNet trunk has been replaced with an AttnRes residual unit. The current implementation keeps the ResNet-style stage/downsample macro-structure, but the visual residual trunk now runs on AttnRes.

  • implementation: roboimi/vla/models/backbones/attnres_resnet2d.py
  • wiring: roboimi/vla/models/backbones/resnet_diffusion.py
  • config: roboimi/vla/conf/backbone/resnet_diffusion.yaml
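To make the switch concrete, the sketch below shows what a single attention-based residual unit computes: single-head self-attention over the flattened spatial tokens of a feature map, plus a skip connection, in place of a conv residual branch. All names, shapes, and the single-head simplification are assumptions for illustration, not the actual attnres_resnet2d.py implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attnres_block(x, Wq, Wk, Wv):
    """Single-head self-attention residual unit over spatial tokens.

    x: (n_tokens, d) -- an H*W feature map flattened to tokens.
    The output keeps the same shape, so the unit is a drop-in for a
    resolution-preserving ResNet block (downsampling stages would wrap it).
    """
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    attn = softmax(q @ k.T / np.sqrt(x.shape[-1]))   # (n_tokens, n_tokens)
    return x + attn @ v                              # residual skip connection

rng = np.random.default_rng(0)
n, d = 49, 16                      # e.g. a 7x7 feature map with 16 channels
x = rng.standard_normal((n, d))
Wq, Wk, Wv = (0.1 * rng.standard_normal((d, d)) for _ in range(3))
y = attnres_block(x, Wq, Wk, Wv)
```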

The related code has been committed:

  • a780068 — headless rollout fix + Phase-1 summary
  • 2033169 — full-AttnRes vision backbone

Phase-2 launch status (observed on 2026-04-05 00:36 CST)

  • Run: imf-p2-full-attnres-vision-ph16-ex08-emb384-l12-b40-lr1p25e4-ms50k-l20g3-20260405-002424
  • Host: 100.119.99.14, GPU 3
  • Config anchor: pred_horizon=16, num_action_steps=8
  • Vision backbone: attnres_resnet
  • Because batch size 80 OOMed on both the local 5090 and the remote L20, Phase-2 currently uses:
    • batch size: 40
    • learning rate: 1.25e-4
  • Latest confirmed progress: step 1300
  • First rollout has not happened yet at this observation point.
  • SwanLab: https://swanlab.cn/@game-loader/roboimi-vla/runs/xy7fjdmn0stdr19eu3gub
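The halved learning rate tracks the halved batch size, which matches the common linear-scaling heuristic (learning rate proportional to batch size); whether that heuristic was the actual motivation is an assumption here:

```python
# Linear-scaling heuristic: lr is scaled proportionally with batch size.
# Phase-1 ran batch 80 at lr 2.5e-4; OOM forced Phase-2 down to batch 40.
base_batch, base_lr = 80, 2.5e-4
new_batch = 40
new_lr = base_lr * new_batch / base_batch   # 1.25e-4, the Phase-2 setting
```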

Next action

Continue monitoring the Phase-2 full-AttnRes training run; once it completes, compare it directly against the Phase-1 baseline of 610.8 to determine whether "replace the entire vision trunk with AttnRes" beats "use AttnRes only in the IMF head".