roboimi/experiment_suites/2026-04-04-imf-horizon-grid/phase1_summary.md
Phase-1 IMF Horizon Grid Summary

  • Generated: 2026-04-04 23:43:38
  • Fixed baseline: IMF AttnRes head, n_emb=384, n_layer=12, batch_size=80, lr=2.5e-4, max_steps=50k, rollout every 5 epochs with 5 episodes, 3 cameras [r_vis, top, front].
  • Primary metric: rollout_avg_reward read from checkpoints/vla_model_best.pt, i.e. the maximum average rollout reward observed during training.

Ranked results

| Rank | Run ID | pred_horizon | num_action_steps | Best avg_reward | Best step | Final loss | Host |
|------|-----------|----|----|-------|-------|--------|---------------|
| 1 | ph16_ex8  | 16 | 8  | 610.8 | 21874 | 0.0034 | 100.73.14.65  |
| 2 | ph16_ex16 | 16 | 16 | 561.2 | 48124 | 0.0045 | 100.119.99.14 |
| 3 | ph32_ex32 | 32 | 32 | 513.2 | 43749 | 0.0040 | local         |
| 4 | ph8_ex8   | 8  | 8  | 415.6 | 48124 | 0.0070 | 100.73.14.65  |
| 5 | ph32_ex8  | 32 | 8  | 361.6 | 43749 | 0.0048 | 100.119.99.14 |
| 6 | ph32_ex16 | 32 | 16 | 239.6 | 48124 | 0.0038 | 100.119.99.14 |

Main observations

  • Best overall setting was pred_horizon=16, num_action_steps=8 with max avg_reward = 610.8 at step 21874.
  • Comparing horizon 16: executing 8 steps outperformed executing 16 steps (ph16_ex8 > ph16_ex16).
  • Comparing horizon 32: executing the full 32-step chunk was much better than executing 16 or 8 steps (ph32_ex32 > ph32_ex8 > ph32_ex16).
  • The short setting (pred_horizon=8, executing all 8 steps) landed mid-pack at 415.6: well above the mismatched horizon-32 runs, but clearly below the best 16/8 and 32/32 settings.
  • In this sweep, increasing the prediction horizon helped only when the executed chunk length matched a good control cadence; a mismatch hurt substantially (ph32_ex16 was the worst run at 239.6 despite a low final loss).
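The pred_horizon / num_action_steps interaction above corresponds to a receding-horizon execution loop: the policy predicts a chunk of pred_horizon actions, but only the first num_action_steps are executed before replanning. A minimal sketch of that loop, where `policy` and `env` are hypothetical stand-ins rather than the actual roboimi API:

```python
def rollout(policy, env, pred_horizon, num_action_steps, max_steps=1000):
    """Receding-horizon execution: predict pred_horizon actions,
    execute only the first num_action_steps, then replan."""
    obs = env.reset()
    total_reward, steps = 0.0, 0
    while steps < max_steps:
        # The policy predicts a full chunk of pred_horizon actions at once.
        chunk = policy.predict(obs, horizon=pred_horizon)
        # Only the first num_action_steps actions are executed open-loop.
        for action in chunk[:num_action_steps]:
            obs, reward, done = env.step(action)
            total_reward += reward
            steps += 1
            if done or steps >= max_steps:
                return total_reward
    return total_reward
```

Under this view, ph16_ex8 replans twice as often as ph16_ex16 while conditioning on the same 16-step predictions, which is one plausible reading of why it recovers better from open-loop drift.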

Raw results

  • ph16_ex8: best avg_reward=610.8 @ step 21874, final_loss=0.0034, run_dir=/home/droid/roboimi_suite_20260404/runs/imf-p1-ph16-ex08-emb384-l12-ms50k-5880g1-20260404-131223
  • ph16_ex16: best avg_reward=561.2 @ step 48124, final_loss=0.0045, run_dir=/home/droid/roboimi_suite_20260404/runs/imf-p1-ph16-ex16-emb384-l12-ms50k-l20g0-20260404-131223
  • ph32_ex32: best avg_reward=513.2 @ step 43749, final_loss=0.0040, run_dir=/home/droid/project/roboimi/.worktrees/feat-imf-attnres-policy/runs/imf-p1-ph32-ex32-emb384-l12-ms50k-5090-20260404-131223
  • ph8_ex8: best avg_reward=415.6 @ step 48124, final_loss=0.0070, run_dir=/home/droid/roboimi_suite_20260404/runs/imf-p1-ph08-ex08-emb384-l12-ms50k-5880g0-20260404-131223
  • ph32_ex8: best avg_reward=361.6 @ step 43749, final_loss=0.0048, run_dir=/home/droid/roboimi_suite_20260404/runs/imf-p1-ph32-ex08-emb384-l12-ms50k-l20g1-20260404-131223
  • ph32_ex16: best avg_reward=239.6 @ step 48124, final_loss=0.0038, run_dir=/home/droid/roboimi_suite_20260404/runs/imf-p1-ph32-ex16-emb384-l12-ms50k-l20g2-20260404-131223
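For downstream analysis, the raw-result lines above can be parsed and re-ranked with a short script. This is a sketch that assumes the exact `run_id: best avg_reward=... @ step ..., final_loss=...` layout used here:

```python
import re

# Matches lines of the form:
#   ph16_ex8: best avg_reward=610.8 @ step 21874, final_loss=0.0034
LINE_RE = re.compile(
    r"(?P<run>\w+): best avg_reward=(?P<reward>[\d.]+) @ step (?P<step>\d+), "
    r"final_loss=(?P<loss>[\d.]+)"
)

def parse_results(text):
    """Parse raw-result lines into dicts, sorted by best reward (descending)."""
    rows = []
    for m in LINE_RE.finditer(text):
        rows.append({
            "run": m.group("run"),
            "best_avg_reward": float(m.group("reward")),
            "best_step": int(m.group("step")),
            "final_loss": float(m.group("loss")),
        })
    return sorted(rows, key=lambda r: r["best_avg_reward"], reverse=True)
```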

Recommendation for Phase-2 anchor

  • Use pred_horizon=16, num_action_steps=8 as the strongest Phase-1 baseline if the goal is purely maximizing rollout reward.
  • Even if Phase-2 imposes a more conservative action-execution budget (executing only part of each predicted chunk), ph16_ex8 already satisfies it: it is the strongest setting that does not execute its full chunk, so it remains a good comparison anchor in that regime as well.
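For Phase-2 launch scripts, the anchor can be pinned down as a single config object. The key names below are illustrative (mirroring the fixed-baseline bullet at the top), not the actual roboimi config schema:

```python
# Phase-2 anchor: Phase-1 winner (ph16_ex8) plus the fixed baseline settings.
# Key names are hypothetical stand-ins for the real roboimi config schema.
PHASE2_ANCHOR = {
    "head": "IMF AttnRes",
    "pred_horizon": 16,
    "num_action_steps": 8,
    "n_emb": 384,
    "n_layer": 12,
    "batch_size": 80,
    "lr": 2.5e-4,
    "max_steps": 50_000,
    "rollout_every_epochs": 5,
    "rollout_episodes": 5,
    "cameras": ["r_vis", "top", "front"],
}
```

Keeping the anchor in one place makes it harder for Phase-2 variants to silently drift from the Phase-1 baseline they are compared against.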