Rollout Artifacts Implementation Plan
For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.
Goal: Extend rollout evaluation so one selected checkpoint can be run once with video capture, timing breakdown, and saved EE trajectory artifacts.
Architecture: Keep the implementation centered in eval_vla.py so existing training-time rollout validation remains compatible. Add config-gated artifact capture helpers, serialize outputs under the eval run directory, and add lightweight tests for helper behavior and summary wiring; default eval behavior must remain unchanged when artifact capture is off.
Tech Stack: Python, Hydra/OmegaConf, NumPy, OpenCV, JSON, PyTorch, unittest/mocking.
Task 1: Add artifact capture configuration and helper wiring
Files:
- Modify: roboimi/demos/vla_scripts/eval_vla.py
- Modify: roboimi/vla/conf/eval/eval.yaml
- Test: tests/test_eval_vla_rollout_artifacts.py
- [ ] Step 1: Write failing tests for optional artifact config / summary wiring
- [ ] Step 2: Implement config-backed artifact flags and output paths with defaults that write nothing
- [ ] Step 3: Verify existing eval call sites still work with defaults
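The "defaults that write nothing" requirement in Task 1 can be sketched roughly as below. This is only an illustration, not the actual eval_vla.py code: the names ArtifactConfig, resolve_artifact_paths, and the flag names are hypothetical, and the file names rollout.mp4 and timing_summary.json are assumptions (only trajectory.npz and the eval_artifacts directory appear in this plan). In the real implementation the flags would presumably come from eval.yaml via Hydra/OmegaConf rather than a dataclass.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, Optional


@dataclass(frozen=True)
class ArtifactConfig:
    """Hypothetical artifact flags; every default keeps capture off."""
    save_video: bool = False
    save_timing: bool = False
    save_trajectory: bool = False
    camera_name: Optional[str] = None  # camera stream to record, if any

    @property
    def enabled(self) -> bool:
        return self.save_video or self.save_timing or self.save_trajectory


def resolve_artifact_paths(cfg: ArtifactConfig, run_dir: Path) -> Dict[str, Path]:
    """Return output paths for enabled artifacts only.

    Pure path computation: no file or directory is created here, so a
    default config cannot leave anything on disk.
    """
    if not cfg.enabled:
        return {}
    out_dir = Path(run_dir) / "eval_artifacts"
    paths: Dict[str, Path] = {}
    if cfg.save_video:
        paths["video"] = out_dir / "rollout.mp4"  # assumed file name
    if cfg.save_timing:
        paths["timing"] = out_dir / "timing_summary.json"  # assumed file name
    if cfg.save_trajectory:
        paths["trajectory"] = out_dir / "trajectory.npz"
    return paths
```

Returning an empty mapping when everything is off gives the Step 1 tests a simple property to assert: a default config yields no paths, so existing call sites have nothing to write.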
Task 2: Add timing breakdown, video recording, and trajectory export
Files:
- Modify: roboimi/demos/vla_scripts/eval_vla.py
- Test: tests/test_eval_vla_rollout_artifacts.py
- [ ] Step 1: Write failing tests for timing aggregation, trajectory serialization, and summary schema
- [ ] Step 2: Implement per-step timing capture for obs_read_ms, preprocess_ms, inference_ms, env_step_ms, loop_total_ms
- [ ] Step 3: Implement MP4 recording from a chosen camera stream and canonical trajectory.npz export using left_link7 / right_link7 executed poses after env.step
- [ ] Step 4: Run focused tests and fix issues
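A minimal sketch of the timing aggregation and trajectory export from Steps 2 and 3, under stated assumptions: StepTimer and save_trajectory are hypothetical helper names, the count / mean / max summary schema is an assumption, and the pose layout (one row per env step, for example position plus quaternion) is illustrative only. The timing keys and the left_link7 / right_link7 array names are taken from this plan. MP4 recording is omitted here.

```python
import json
import time
from collections import defaultdict
from contextlib import contextmanager
from pathlib import Path

import numpy as np

TIMING_KEYS = ("obs_read_ms", "preprocess_ms", "inference_ms",
               "env_step_ms", "loop_total_ms")


class StepTimer:
    """Collects per-step wall-clock durations in milliseconds, keyed by phase."""

    def __init__(self):
        self.samples = defaultdict(list)

    @contextmanager
    def measure(self, key):
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            self.samples[key].append(elapsed_ms)

    def summary(self):
        """Aggregate each recorded phase into count / mean / max (assumed schema)."""
        out = {}
        for key in TIMING_KEYS:
            values = self.samples.get(key, [])
            if values:
                out[key] = {
                    "count": len(values),
                    "mean": float(np.mean(values)),
                    "max": float(np.max(values)),
                }
        return out


def save_trajectory(path, left_poses, right_poses):
    """Write executed EE poses to an .npz file, one row per env step."""
    left = np.asarray(left_poses, dtype=np.float64)
    right = np.asarray(right_poses, dtype=np.float64)
    if left.shape != right.shape:
        raise ValueError("left and right pose arrays must have the same shape")
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    np.savez(path, left_link7=left, right_link7=right)
```

In the rollout loop, each phase would be wrapped as `with timer.measure("inference_ms"): ...`, with the poses appended only after env.step returns so the file holds executed rather than commanded poses. The summary contains plain floats and ints, so it can be passed to `json.dumps` directly.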
Task 3: Stop training safely and execute one real rollout
Files:
- Use: roboimi/demos/vla_scripts/eval_vla.py
- Output: runs/.../eval_artifacts/...
- [ ] Step 1: Stop the active training process, wait for exit, and confirm the target checkpoint is readable
- [ ] Step 2: Select the latest completed checkpoint if an explicit one is not provided; fall back to prior completed / best checkpoint if needed
- [ ] Step 3: Run one headless rollout with artifact capture enabled
- [ ] Step 4: Verify the MP4 / timing summary / trajectory files exist and summarize findings
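The checkpoint selection in Step 2 and the file check in Step 4 might look roughly like the sketch below. All function names are hypothetical, and several choices are assumptions rather than part of this plan: "latest" is approximated by file modification time, "readable" is a cheap non-empty-and-openable check (a stricter check would actually load the checkpoint), and an unreadable explicit checkpoint raises instead of silently falling back.

```python
from pathlib import Path
from typing import Iterable, List, Optional


def is_readable(path: Path) -> bool:
    """Cheap check: the file exists, is non-empty, and can be opened."""
    try:
        if path.stat().st_size == 0:
            return False
        with path.open("rb") as fh:
            fh.read(1)
        return True
    except OSError:
        return False


def select_checkpoint(explicit: Optional[Path],
                      candidates: Iterable[Path]) -> Optional[Path]:
    """Use the explicit checkpoint if given, else the newest readable candidate."""
    if explicit is not None:
        explicit = Path(explicit)
        if not is_readable(explicit):
            # Assumed policy: an explicit request never falls back silently.
            raise FileNotFoundError(f"checkpoint not readable: {explicit}")
        return explicit
    # Unreadable or partially written files are skipped, which yields the
    # fallback to an earlier completed checkpoint.
    readable = [Path(c) for c in candidates if is_readable(Path(c))]
    if not readable:
        return None
    return max(readable, key=lambda p: p.stat().st_mtime)


def missing_artifacts(out_dir: Path, expected_names: Iterable[str]) -> List[str]:
    """Return the expected artifact names that are absent or empty in out_dir."""
    out_dir = Path(out_dir)
    return [name for name in expected_names if not is_readable(out_dir / name)]
```

An empty list from `missing_artifacts` is the pass condition for Step 4; any returned names can go straight into the findings summary.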