roboimi/experiment_suites/2026-04-04-imf-horizon-grid/phase1_summary.md
Phase-1 IMF Horizon Grid Summary

  • Generated: 2026-04-04 23:43:38
  • Fixed baseline: IMF AttnRes head, n_emb=384, n_layer=12, batch_size=80, lr=2.5e-4, max_steps=50k, rollout every 5 epochs with 5 episodes, 3 cameras [r_vis, top, front].
  • Primary metric: rollout_avg_reward read from checkpoints/vla_model_best.pt, i.e. the maximum average rollout reward observed during training.

Ranked results

| Rank | Run ID | pred_horizon | num_action_steps | Best avg_reward | Best step | Final loss | Host |
|------|-----------|----|----|-------|-------|--------|---------------|
| 1 | ph16_ex8  | 16 | 8  | 610.8 | 21874 | 0.0034 | 100.73.14.65  |
| 2 | ph16_ex16 | 16 | 16 | 561.2 | 48124 | 0.0045 | 100.119.99.14 |
| 3 | ph32_ex32 | 32 | 32 | 513.2 | 43749 | 0.0040 | local         |
| 4 | ph8_ex8   | 8  | 8  | 415.6 | 48124 | 0.0070 | 100.73.14.65  |
| 5 | ph32_ex8  | 32 | 8  | 361.6 | 43749 | 0.0048 | 100.119.99.14 |
| 6 | ph32_ex16 | 32 | 16 | 239.6 | 48124 | 0.0038 | 100.119.99.14 |

Main observations

  • Best overall setting was pred_horizon=16, num_action_steps=8 with max avg_reward = 610.8 at step 21874.
  • Comparing horizon 16: executing 8 steps outperformed executing 16 steps (ph16_ex8 > ph16_ex16).
  • Comparing horizon 32: executing the full 32-step chunk was much better than executing 16 or 8 steps (ph32_ex32 > ph32_ex8 > ph32_ex16).
  • The short setting (pred_horizon=8, executing all 8 steps) landed mid-pack at 415.6: well above the mismatched horizon-32 runs, but clearly below the best 16/8 and 32/32 settings.
  • In this sweep, increasing the prediction horizon helped only when the executed chunk length matched a good control cadence; a mismatch hurt substantially (ph32_ex16 was the worst run at 239.6 despite a low final loss).
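The pred_horizon / num_action_steps interaction above corresponds to a receding-horizon execution loop: the policy predicts a chunk of pred_horizon actions, but only the first num_action_steps are executed before replanning. A minimal sketch of that loop, where `policy` and `env` are hypothetical stand-ins rather than the actual roboimi API:

```python
def rollout(policy, env, pred_horizon, num_action_steps, max_steps=1000):
    """Receding-horizon execution: predict pred_horizon actions,
    execute only the first num_action_steps, then replan."""
    obs = env.reset()
    total_reward, steps = 0.0, 0
    while steps < max_steps:
        # The policy predicts a full chunk of pred_horizon actions at once.
        chunk = policy.predict(obs, horizon=pred_horizon)
        # Only the first num_action_steps actions are executed open-loop.
        for action in chunk[:num_action_steps]:
            obs, reward, done = env.step(action)
            total_reward += reward
            steps += 1
            if done or steps >= max_steps:
                return total_reward
    return total_reward
```

Under this view, ph16_ex8 replans twice as often as ph16_ex16 while conditioning on the same 16-step predictions, which is one plausible reading of why it recovers better from open-loop drift.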

Raw results

  • ph16_ex8: best avg_reward=610.8 @ step 21874, final_loss=0.0034, run_dir=/home/droid/roboimi_suite_20260404/runs/imf-p1-ph16-ex08-emb384-l12-ms50k-5880g1-20260404-131223
  • ph16_ex16: best avg_reward=561.2 @ step 48124, final_loss=0.0045, run_dir=/home/droid/roboimi_suite_20260404/runs/imf-p1-ph16-ex16-emb384-l12-ms50k-l20g0-20260404-131223
  • ph32_ex32: best avg_reward=513.2 @ step 43749, final_loss=0.0040, run_dir=/home/droid/project/roboimi/.worktrees/feat-imf-attnres-policy/runs/imf-p1-ph32-ex32-emb384-l12-ms50k-5090-20260404-131223
  • ph8_ex8: best avg_reward=415.6 @ step 48124, final_loss=0.0070, run_dir=/home/droid/roboimi_suite_20260404/runs/imf-p1-ph08-ex08-emb384-l12-ms50k-5880g0-20260404-131223
  • ph32_ex8: best avg_reward=361.6 @ step 43749, final_loss=0.0048, run_dir=/home/droid/roboimi_suite_20260404/runs/imf-p1-ph32-ex08-emb384-l12-ms50k-l20g1-20260404-131223
  • ph32_ex16: best avg_reward=239.6 @ step 48124, final_loss=0.0038, run_dir=/home/droid/roboimi_suite_20260404/runs/imf-p1-ph32-ex16-emb384-l12-ms50k-l20g2-20260404-131223
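For downstream analysis, the raw-result lines above can be parsed and re-ranked with a short script. This is a sketch that assumes the exact `run_id: best avg_reward=... @ step ..., final_loss=...` layout used here:

```python
import re

# Matches lines of the form:
#   ph16_ex8: best avg_reward=610.8 @ step 21874, final_loss=0.0034
LINE_RE = re.compile(
    r"(?P<run>\w+): best avg_reward=(?P<reward>[\d.]+) @ step (?P<step>\d+), "
    r"final_loss=(?P<loss>[\d.]+)"
)

def parse_results(text):
    """Parse raw-result lines into dicts, sorted by best reward (descending)."""
    rows = []
    for m in LINE_RE.finditer(text):
        rows.append({
            "run": m.group("run"),
            "best_avg_reward": float(m.group("reward")),
            "best_step": int(m.group("step")),
            "final_loss": float(m.group("loss")),
        })
    return sorted(rows, key=lambda r: r["best_avg_reward"], reverse=True)
```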

Recommendation for Phase-2 anchor

  • Use pred_horizon=16, num_action_steps=8 as the strongest Phase-1 baseline if the goal is purely maximizing rollout reward.
  • Even if Phase-2 imposes a more conservative action-execution budget (executing only part of each predicted chunk), ph16_ex8 already satisfies it: it is the strongest setting that does not execute its full chunk, so it remains a good comparison anchor in that regime as well.
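For Phase-2 launch scripts, the anchor can be pinned down as a single config object. The key names below are illustrative (mirroring the fixed-baseline bullet at the top), not the actual roboimi config schema:

```python
# Phase-2 anchor: Phase-1 winner (ph16_ex8) plus the fixed baseline settings.
# Key names are hypothetical stand-ins for the real roboimi config schema.
PHASE2_ANCHOR = {
    "head": "IMF AttnRes",
    "pred_horizon": 16,
    "num_action_steps": 8,
    "n_emb": 384,
    "n_layer": 12,
    "batch_size": 80,
    "lr": 2.5e-4,
    "max_steps": 50_000,
    "rollout_every_epochs": 5,
    "rollout_episodes": 5,
    "cameras": ["r_vis", "top", "front"],
}
```

Keeping the anchor in one place makes it harder for Phase-2 variants to silently drift from the Phase-1 baseline they are compared against.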