PLAN

Goal

Train a 50k-step IMF baseline with the original ResNet vision backbone (no full-AttnRes vision replacement), using only top and front cameras as image conditioning.

Fixed comparison contract

  • Agent: resnet_imf_attnres
  • Vision backbone mode: resnet
  • pred_horizon=16
  • num_action_steps=8
  • n_emb=384, n_layer=12, n_head=1, n_kv_head=1
  • inference_steps=1
  • batch_size=80, lr=2.5e-4, cosine scheduler, warmup 2000
  • dataset: /home/droid/project/diana_sim/sim_transfer
  • cameras: [top, front] only
  • training budget: max_steps=50000
  • rollout validation: every 5 epochs, 5 episodes, headless
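
One way to keep the contract from drifting between runs is to hold it as data and diff each run config against it before launch. The sketch below is illustrative only: the key names and the `contract_violations` helper are assumptions, not the project's actual config schema; the values are copied from the list above.

```python
# Hypothetical encoding of the fixed comparison contract.
# Key names are assumptions; values come from the plan.
CONTRACT = {
    "agent": "resnet_imf_attnres",
    "vision_backbone_mode": "resnet",
    "pred_horizon": 16,
    "num_action_steps": 8,
    "n_emb": 384,
    "n_layer": 12,
    "n_head": 1,
    "n_kv_head": 1,
    "inference_steps": 1,
    "batch_size": 80,
    "lr": 2.5e-4,
    "scheduler": "cosine",
    "warmup": 2000,  # unit not stated in the plan
    "dataset": "/home/droid/project/diana_sim/sim_transfer",
    "cameras": ["top", "front"],
    "max_steps": 50000,
}


def contract_violations(run_cfg, contract=CONTRACT):
    """Return {key: (expected, actual)} for every contract key that differs."""
    return {
        key: (expected, run_cfg.get(key))
        for key, expected in contract.items()
        if run_cfg.get(key) != expected
    }
```

An empty result means the run matches the contract; any entry names the key that drifted, which is useful when a fallback intentionally changes batch size or lr.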

Resource plan

  • Host: local
  • GPU: RTX 5090 (GPU 0)

Execution path

  1. Run a short 2-step smoke test on GPU with the exact 2-camera config.
  2. If smoke passes, launch the 50k main run with durable log redirection.
  3. Record run name, pid, log path, and SwanLab URL into suite status.
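
Steps 2 and 3 can be sketched as a small launcher that detaches the training process, appends its output to a log file, and writes the bookkeeping fields to a status file. This is a minimal sketch under assumptions: `launch_logged`, the status-file layout, and the `gpu` argument are hypothetical, the training command itself is whatever the project uses, and the SwanLab URL is left empty because it is only known once the run has started.

```python
import json
import os
import subprocess
from pathlib import Path


def launch_logged(cmd, log_path, status_path, run_name, gpu="0"):
    """Start cmd in its own session with stdout/stderr appended to log_path.

    Writes run name, pid, and log path to status_path as JSON (hypothetical
    layout) and returns the Popen handle.
    """
    log_path = Path(log_path)
    log_path.parent.mkdir(parents=True, exist_ok=True)
    # Restrict the child to a single GPU via the standard CUDA env variable.
    env = {**os.environ, "CUDA_VISIBLE_DEVICES": str(gpu)}
    with open(log_path, "ab") as log_file:
        proc = subprocess.Popen(
            cmd,
            stdout=log_file,
            stderr=subprocess.STDOUT,
            stdin=subprocess.DEVNULL,
            env=env,
            start_new_session=True,  # detach from the launching terminal (POSIX)
        )
    status = {
        "run_name": run_name,
        "pid": proc.pid,
        "log_path": str(log_path),
        "swanlab_url": None,  # fill in once the run reports it
    }
    Path(status_path).write_text(json.dumps(status, indent=2))
    return proc
```

The same helper can serve the smoke test in step 1: launch the 2-step command, wait on the returned handle, and proceed to the 50k run only if the exit code is 0.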

Fallbacks

  • If batch 80 OOMs, fall back to batch 64 with lr scaled linearly to 2.0e-4 (2.5e-4 × 64/80).
  • If dataloader startup is unstable, reduce num_workers from 12 to 8.
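
The fallback lr is consistent with linear scaling by batch size. The sketch below makes that arithmetic explicit and applies both fallbacks to a config dict; `scaled_lr`, `apply_fallbacks`, and the dict keys are hypothetical names used for illustration.

```python
def scaled_lr(base_lr, base_batch, new_batch):
    """Linear lr scaling: keep lr / batch_size constant."""
    return base_lr * new_batch / base_batch


def apply_fallbacks(cfg, oom=False, dataloader_unstable=False):
    """Return a copy of cfg with the plan's fallbacks applied."""
    out = dict(cfg)
    if oom:
        # Batch 80 -> 64, lr scaled by the same ratio (64/80).
        out["lr"] = scaled_lr(cfg["lr"], cfg["batch_size"], 64)
        out["batch_size"] = 64
    if dataloader_unstable:
        # Fewer workers to stabilize dataloader startup.
        out["num_workers"] = 8
    return out


base = {"batch_size": 80, "lr": 2.5e-4, "num_workers": 12}
fallback = apply_fallbacks(base, oom=True, dataloader_unstable=True)
```

Here `fallback` carries batch size 64, 8 workers, and an lr of approximately 2.0e-4, while `base` is left unchanged.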