IMF Rollout Trajectory Images and Short-Horizon Training Implementation Plan
For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.
Goal: Add training-time rollout front trajectory image export plus SwanLab image logging, then start a new local IMF training run with emb=384, layer=12, pred_horizon=8, num_action_steps=4, max_steps=50000.
Architecture: Extend eval_vla.py so a rollout can emit one per-episode static front-view image with red EE trajectory overlay. Extend train_vla.py so rollout validation forces image export, forces video off, and uploads those per-episode images to SwanLab. Launch the requested new run through explicit command-line overrides rather than branch-default config changes.
Tech Stack: Python, PyTorch, Hydra/OmegaConf, MuJoCo, OpenCV, SwanLab.
Task 1: Add and validate rollout image tests
Files:
- Modify: tests/test_eval_vla_rollout_artifacts.py
- Modify: tests/test_train_vla_swanlab_logging.py
- Modify: tests/test_train_vla_rollout_validation.py

- [ ] Add/adjust eval tests so they assert per-episode trajectory image paths are produced without requiring video export.
- [ ] Add/adjust training tests so they assert training-time rollout validation forces record_video=false.
- [ ] Add/adjust training tests so they assert trajectory image paths flow from eval summary into SwanLab media logging.
- [ ] Add/adjust training tests so they assert image media is logged, not only scalar reward metrics.
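The shape of these assertions can be sketched as follows. This is a minimal illustration, not the repository's test code: `run_rollout_validation_stub`, its injected `run_eval` and `log_media` callables, and the `trajectory_images` summary key are all hypothetical stand-ins chosen so the example runs without MuJoCo or SwanLab.

```python
from unittest import mock


def run_rollout_validation_stub(eval_cfg, run_eval, log_media):
    """Hypothetical stand-in for run_rollout_validation() in train_vla.py."""
    # Force the export settings regardless of what the caller configured.
    cfg = dict(eval_cfg, record_video=False, save_trajectory_image=True)
    summary = run_eval(cfg)
    # Hand any per-episode image paths to the media logger.
    log_media(summary.get("trajectory_images", []))
    return summary


def test_rollout_validation_forces_video_off_and_logs_images():
    # "trajectory_images" is an assumed summary key, not a confirmed one.
    run_eval = mock.Mock(return_value={
        "avg_reward": 1.0,
        "trajectory_images": ["ep0.png", "ep1.png"],
    })
    log_media = mock.Mock()

    run_rollout_validation_stub({"record_video": True}, run_eval, log_media)

    # The eval call must see video disabled and image export enabled.
    passed_cfg = run_eval.call_args.args[0]
    assert passed_cfg["record_video"] is False
    assert passed_cfg["save_trajectory_image"] is True
    # Image paths, not only scalar metrics, must reach the media logger.
    log_media.assert_called_once_with(["ep0.png", "ep1.png"])
```

Injecting the eval and logging callables keeps the test independent of the simulator and of network access; the real tests may instead patch the corresponding functions in train_vla.py.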
Task 2: Implement per-episode front trajectory image export in eval
Files:
- Modify: roboimi/demos/vla_scripts/eval_vla.py
- Reuse/Read: roboimi/utils/raw_action_trajectory_viewer.py
- Modify: roboimi/vla/conf/eval/eval.yaml

- [ ] Add config plumbing for save_trajectory_image and trajectory_image_camera_name.
- [ ] Ensure the default training-time camera resolution path is pinned to front.
- [ ] Implement distinct per-episode image naming so 5 rollout episodes create 5 distinct PNGs.
- [ ] Reuse the existing red trajectory representation logic when composing the PNG.
- [ ] Ensure headless eval works under EGL even on machines with DISPLAY set.
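Two of the steps above, distinct per-episode naming and forcing EGL, can be sketched as below. The helper names and the file-name scheme are hypothetical. `MUJOCO_GL` and `PYOPENGL_PLATFORM` are the environment variables read by the MuJoCo Python bindings and PyOpenGL respectively, but how eval_vla.py actually selects its rendering backend is not stated in this plan, so treat this as one possible approach.

```python
import os
from pathlib import Path


def force_headless_egl(env=None):
    """Select EGL explicitly instead of inferring headlessness from DISPLAY."""
    env = os.environ if env is None else env
    env["MUJOCO_GL"] = "egl"          # read by the MuJoCo Python bindings
    env["PYOPENGL_PLATFORM"] = "egl"  # read by PyOpenGL
    return env


def trajectory_image_path(output_dir, episode_idx, camera_name="front"):
    """Return a distinct PNG path per episode (hypothetical naming scheme)."""
    name = f"rollout_ep{episode_idx:03d}_{camera_name}_trajectory.png"
    return Path(output_dir) / name


# Five rollout episodes map to five distinct files.
paths = [trajectory_image_path("artifacts", i) for i in range(5)]
assert len(set(paths)) == 5
```

The environment variables only take effect if they are set before the rendering libraries are imported, so a call like `force_headless_egl()` would need to run at the very top of the eval entry point.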
Task 3: Implement SwanLab rollout image logging in training
Files:
- Modify: roboimi/demos/vla_scripts/train_vla.py
- Modify: tests/test_train_vla_swanlab_logging.py
- Modify: tests/test_train_vla_rollout_validation.py

- [ ] Make run_rollout_validation() force record_video=false.
- [ ] Make run_rollout_validation() force save_trajectory_image=true and trajectory_image_camera_name=front.
- [ ] Ensure rollout validation still uses 5 episodes per validation event for the requested run.
- [ ] Add a best-effort helper that converts per-episode image paths into SwanLab image media payloads.
- [ ] Keep image-upload failures non-fatal and warning-only.
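A best-effort helper along these lines might look like the sketch below. The function name, the log key prefix, and the injected `make_image` and `log_fn` callables are assumptions made for illustration. In train_vla.py they would presumably be bound to SwanLab's image media type and logging call; they are passed in here so the example runs without SwanLab installed.

```python
import logging
from pathlib import Path

logger = logging.getLogger(__name__)


def log_rollout_images(image_paths, make_image, log_fn, step,
                       key_prefix="rollout/trajectory"):
    """Log each per-episode image; warn and continue on any failure.

    Returns the number of images that were logged successfully.
    """
    logged = 0
    for episode_idx, path in enumerate(image_paths):
        try:
            if not Path(path).is_file():
                raise FileNotFoundError(path)
            payload = make_image(str(path), caption=f"episode {episode_idx}")
            log_fn({f"{key_prefix}/episode_{episode_idx}": payload}, step=step)
            logged += 1
        except Exception as exc:  # upload problems must never stop training
            logger.warning("Skipping rollout image %s: %s", path, exc)
    return logged
```

Catching a broad `Exception` is deliberate here: the plan asks for upload failures to be warning-only, so a missing file or a logging error is reported and skipped rather than propagated into the training loop.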
Task 4: Verify action-chunk semantics for the new run
Files:
- Verify: roboimi/vla/agent.py
- Verify: roboimi/vla/agent_imf.py
- Test: tests/test_imf_vla_agent.py

- [ ] Confirm the existing queue logic still means “predict 8, execute first 4”.
- [ ] Do not change branch defaults unless strictly necessary; prefer launch-time overrides.
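The “predict 8, execute first 4” semantics can be illustrated with a small queue. `ActionChunkQueue` is a hypothetical stand-in, not the logic in agent.py or agent_imf.py, and it takes the plan's phrase literally: the executed actions are the first `num_action_steps` entries of each predicted chunk. The real agent may apply an offset or other bookkeeping that this sketch omits.

```python
from collections import deque


class ActionChunkQueue:
    """Predict pred_horizon actions, execute only the first num_action_steps."""

    def __init__(self, predict_fn, pred_horizon=8, num_action_steps=4):
        if num_action_steps > pred_horizon:
            raise ValueError("num_action_steps must not exceed pred_horizon")
        self._predict = predict_fn
        self._pred_horizon = pred_horizon
        self._num_action_steps = num_action_steps
        self._queue = deque()

    def select_action(self, obs):
        # Re-plan only when every queued action has been executed.
        if not self._queue:
            chunk = list(self._predict(obs))
            if len(chunk) != self._pred_horizon:
                raise ValueError("policy returned an unexpected chunk length")
            self._queue.extend(chunk[: self._num_action_steps])
        return self._queue.popleft()


calls = []


def fake_predict(obs):
    """Fake policy: tags each action with (prediction number, index in chunk)."""
    calls.append(obs)
    return [(len(calls), i) for i in range(8)]


queue = ActionChunkQueue(fake_predict)
executed = [queue.select_action(obs=t) for t in range(8)]
assert len(calls) == 2  # eight environment steps need two predictions
assert executed == [(1, 0), (1, 1), (1, 2), (1, 3),
                    (2, 0), (2, 1), (2, 2), (2, 3)]
```

A test in tests/test_imf_vla_agent.py could make the same two checks against the real agent: the number of policy calls over a fixed number of steps, and which chunk indices are actually executed.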
Task 5: Verify and launch the requested local training run
Files:
- Use: roboimi/demos/vla_scripts/train_vla.py
- Use: roboimi/demos/vla_scripts/eval_vla.py

- [ ] Run the targeted verification suite.
- [ ] Run one real headless smoke eval and confirm a front trajectory PNG is produced while video_mp4 stays null.
- [ ] Launch the new local training run with explicit overrides including:
  - agent=resnet_imf_attnres
  - agent.head.n_emb=384
  - agent.head.n_layer=12
  - agent.pred_horizon=8
  - agent.num_action_steps=4
  - train.max_steps=50000
  - train.rollout_num_episodes=5
  - train.use_swanlab=true
  - current local baseline dataset/camera/CUDA/batch/lr/num_workers/backbone settings
- [ ] Verify PID, GPU allocation, log tail, and SwanLab run URL.
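One way to keep the launch reproducible is to assemble the override list in code and print the exact command before starting it. The sketch below only builds and prints the command. Invoking the script by path with the current interpreter is an assumption, and the baseline dataset/camera/CUDA/batch/lr/num_workers/backbone settings are left out because the plan does not list their values; they would be passed through `extra`.

```python
import shlex
import sys

TRAIN_SCRIPT = "roboimi/demos/vla_scripts/train_vla.py"

# Hydra-style key=value overrides taken from the plan.
RUN_OVERRIDES = {
    "agent": "resnet_imf_attnres",
    "agent.head.n_emb": 384,
    "agent.head.n_layer": 12,
    "agent.pred_horizon": 8,
    "agent.num_action_steps": 4,
    "train.max_steps": 50000,
    "train.rollout_num_episodes": 5,
    "train.use_swanlab": "true",
}


def build_launch_command(overrides, extra=()):
    """Return an argv list: interpreter, script, then key=value overrides."""
    args = [f"{key}={value}" for key, value in overrides.items()]
    return [sys.executable, TRAIN_SCRIPT, *args, *extra]


cmd = build_launch_command(RUN_OVERRIDES)
print(shlex.join(cmd))  # record the exact command alongside the run log
```

The resulting list can be handed to `subprocess.Popen` once the baseline overrides are appended, which also yields the PID needed for the final verification step.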