# PLAN

## Goal

Train a 50k-step IMF baseline with the original ResNet vision backbone (no full-AttnRes vision replacement), using only `top` and `front` cameras as image conditioning.

## Fixed comparison contract

- Agent: `resnet_imf_attnres`
- Vision backbone mode: `resnet`
- `pred_horizon=16`
- `num_action_steps=8`
- `n_emb=384`, `n_layer=12`, `n_head=1`, `n_kv_head=1`
- `inference_steps=1`
- `batch_size=80`, `lr=2.5e-4`, cosine scheduler, warmup 2000
- dataset: `/home/droid/project/diana_sim/sim_transfer`
- cameras: `[top, front]` only
- training budget: `max_steps=50000`
- rollout validation: every 5 epochs, 5 episodes, headless

## Resource plan

- Host: local
- GPU: RTX 5090 (GPU 0)

## Execution path

1. Run a short 2-step smoke test on GPU with the exact 2-camera config.
2. If smoke passes, launch the 50k main run with durable log redirection.
3. Record run name, pid, log path, and SwanLab URL into suite status.

## Fallbacks

- If batch 80 OOMs, fall back to batch 64 with scaled lr 2.0e-4.
- If dataloader startup is unstable, reduce `num_workers` from 12 to 8.
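
The fallback learning rate is consistent with linear scaling of the base rate by the batch-size ratio (2.5e-4 × 64 / 80 = 2.0e-4). The sketch below checks that arithmetic; it assumes linear scaling is the intended rule, and `scaled_lr` is a hypothetical helper, not part of the training code.

```python
def scaled_lr(base_lr: float, base_batch: int, new_batch: int) -> float:
    """Scale the learning rate proportionally to the batch size (linear scaling)."""
    return base_lr * new_batch / base_batch


# Values taken from the plan; the names are illustrative only.
BASE_LR = 2.5e-4
BASE_BATCH = 80
FALLBACK_BATCH = 64

fallback_lr = scaled_lr(BASE_LR, BASE_BATCH, FALLBACK_BATCH)
print(f"{fallback_lr:.1e}")  # prints 2.0e-04
```

If a different fallback batch size is ever needed, the same helper gives the matching rate under the same assumption.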