Phase-1 IMF Horizon Grid Summary
- Generated: 2026-04-04 23:43:38
- Fixed baseline: IMF AttnRes head, n_emb=384, n_layer=12, batch_size=80, lr=2.5e-4, max_steps=50k; rollouts every 5 epochs with 5 episodes each, 3 cameras [r_vis, top, front].
- Primary metric: rollout_avg_reward (maximum training-time rollout average reward), with the best checkpoint saved to checkpoints/vla_model_best.pt.
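For reference, the fixed baseline and the swept grid can be captured in a small config sketch. The key names below are illustrative assumptions, not the actual training code's schema:

```python
# Hypothetical config dict mirroring the fixed Phase-1 baseline described above.
BASELINE_CONFIG = {
    "head": "IMF AttnRes",
    "n_emb": 384,
    "n_layer": 12,
    "batch_size": 80,
    "lr": 2.5e-4,
    "max_steps": 50_000,
    "rollout_every_epochs": 5,
    "rollout_episodes": 5,
    "cameras": ["r_vis", "top", "front"],
}

def sweep_configs(grid):
    """Expand the (pred_horizon, num_action_steps) grid into full run configs."""
    for ph, ex in grid:
        yield {
            **BASELINE_CONFIG,
            "pred_horizon": ph,
            "num_action_steps": ex,
            "run_id": f"ph{ph}_ex{ex}",
        }

# The six grid points covered by this sweep.
GRID = [(16, 8), (16, 16), (32, 32), (8, 8), (32, 8), (32, 16)]
runs = list(sweep_configs(GRID))
```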
Ranked results
| Rank | Run ID | pred_horizon | num_action_steps | Best avg_reward | Best step | Final loss | Host |
|---|---|---|---|---|---|---|---|
| 1 | ph16_ex8 | 16 | 8 | 610.8 | 21874 | 0.0034 | 100.73.14.65 |
| 2 | ph16_ex16 | 16 | 16 | 561.2 | 48124 | 0.0045 | 100.119.99.14 |
| 3 | ph32_ex32 | 32 | 32 | 513.2 | 43749 | 0.0040 | local |
| 4 | ph8_ex8 | 8 | 8 | 415.6 | 48124 | 0.0070 | 100.73.14.65 |
| 5 | ph32_ex8 | 32 | 8 | 361.6 | 43749 | 0.0048 | 100.119.99.14 |
| 6 | ph32_ex16 | 32 | 16 | 239.6 | 48124 | 0.0038 | 100.119.99.14 |
Main observations
- Best overall setting was pred_horizon=16, num_action_steps=8, with max avg_reward = 610.8 at step 21874.
- At horizon 16, executing 8 steps outperformed executing 16 steps (ph16_ex8 > ph16_ex16).
- At horizon 32, executing the full 32-step chunk was much better than executing 16 or 8 steps (ph32_ex32 > ph32_ex8 > ph32_ex16).
- Short horizon 8 with 8-step execution was competitive but clearly below the best 16/8 and 32/32 settings.
- In this sweep, increasing the prediction horizon helped only when the executed chunk length matched a good control cadence; a mismatch could hurt a lot (especially ph32_ex16).
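The two knobs being swept correspond to a standard action-chunking control loop: the policy predicts pred_horizon actions per inference call, but only the first num_action_steps are executed before replanning. A minimal sketch of that loop, with `policy` and `env` as hypothetical stand-ins for the actual rollout code:

```python
def rollout(policy, env, pred_horizon, num_action_steps, max_steps=400):
    """Run one episode with chunked execution; returns total reward.

    The policy predicts `pred_horizon` actions at a time, but only the first
    `num_action_steps` are executed before the chunk is re-predicted.
    """
    obs = env.reset()
    total_reward, t = 0.0, 0
    while t < max_steps:
        # Predict a full chunk of pred_horizon actions from the current obs.
        chunk = policy.predict(obs, horizon=pred_horizon)
        # Execute only the first num_action_steps, then replan.
        for action in chunk[:num_action_steps]:
            obs, reward, done = env.step(action)
            total_reward += reward
            t += 1
            if done or t >= max_steps:
                return total_reward
    return total_reward
```

The sweep's result that ph32_ex16 lags ph32_ex32 and ph32_ex8 suggests the replanning cadence interacts non-monotonically with horizon, which this loop makes explicit.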
Raw results
- ph16_ex8: best avg_reward=610.8 @ step 21874, final_loss=0.0034, run_dir=/home/droid/roboimi_suite_20260404/runs/imf-p1-ph16-ex08-emb384-l12-ms50k-5880g1-20260404-131223
- ph16_ex16: best avg_reward=561.2 @ step 48124, final_loss=0.0045, run_dir=/home/droid/roboimi_suite_20260404/runs/imf-p1-ph16-ex16-emb384-l12-ms50k-l20g0-20260404-131223
- ph32_ex32: best avg_reward=513.2 @ step 43749, final_loss=0.0040, run_dir=/home/droid/project/roboimi/.worktrees/feat-imf-attnres-policy/runs/imf-p1-ph32-ex32-emb384-l12-ms50k-5090-20260404-131223
- ph8_ex8: best avg_reward=415.6 @ step 48124, final_loss=0.0070, run_dir=/home/droid/roboimi_suite_20260404/runs/imf-p1-ph08-ex08-emb384-l12-ms50k-5880g0-20260404-131223
- ph32_ex8: best avg_reward=361.6 @ step 43749, final_loss=0.0048, run_dir=/home/droid/roboimi_suite_20260404/runs/imf-p1-ph32-ex08-emb384-l12-ms50k-l20g1-20260404-131223
- ph32_ex16: best avg_reward=239.6 @ step 48124, final_loss=0.0038, run_dir=/home/droid/roboimi_suite_20260404/runs/imf-p1-ph32-ex16-emb384-l12-ms50k-l20g2-20260404-131223
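Lines in the format above can be parsed and re-ranked mechanically. This is a hedged sketch based only on the line format shown in this summary, not on any official tooling:

```python
import re

# Regex matching one raw-results line of the form:
#   <run_id>: best avg_reward=<r> @ step <s>, final_loss=<l>, run_dir=<path>
PATTERN = re.compile(
    r"(?P<run>\w+): best avg_reward=(?P<reward>[\d.]+) @ step (?P<step>\d+), "
    r"final_loss=(?P<loss>[\d.]+), run_dir=(?P<dir>\S+)"
)

def parse_results(text):
    """Parse raw-results lines and return records sorted by best avg_reward."""
    rows = [m.groupdict() for m in PATTERN.finditer(text)]
    for r in rows:
        r["reward"] = float(r["reward"])
        r["step"] = int(r["step"])
        r["loss"] = float(r["loss"])
    return sorted(rows, key=lambda r: r["reward"], reverse=True)
```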
Recommendation for Phase-2 anchor
- Use pred_horizon=16, num_action_steps=8 (ph16_ex8) as the strongest Phase-1 baseline if the goal is purely maximizing rollout reward.
- If Phase-2 needs a more conservative action execution budget, ph16_ex8 is also the strongest setting that does not execute a full 32-step chunk, so it can double as the comparison anchor in that regime.
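The recommended anchor can be written down as a small config fragment. The key names and the checkpoint-path usage below are illustrative assumptions, not the actual Phase-2 schema:

```python
# Hypothetical Phase-2 anchor settings derived from the Phase-1 winner.
PHASE2_ANCHOR = {
    "pred_horizon": 16,
    "num_action_steps": 8,
    "init_checkpoint": "checkpoints/vla_model_best.pt",  # best Phase-1 checkpoint
}

# Sanity check: the executed chunk divides the horizon evenly, so replanning
# happens on a regular cadence (here, twice per predicted chunk).
assert PHASE2_ANCHOR["pred_horizon"] % PHASE2_ANCHOR["num_action_steps"] == 0
```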