Rollout Artifacts Implementation Plan

For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.

Goal: Extend rollout evaluation so one selected checkpoint can be run once with video capture, timing breakdown, and saved EE trajectory artifacts.

Architecture: Keep the implementation centered in eval_vla.py so existing training-time rollout validation remains compatible. Add config-gated artifact capture helpers, serialize outputs under the eval run directory, and add lightweight tests for helper behavior and summary wiring; default eval behavior must remain unchanged when artifact capture is off.

Tech Stack: Python, Hydra/OmegaConf, NumPy, OpenCV, JSON, PyTorch, unittest with mocking.


Task 1: Add artifact capture configuration and helper wiring

Files:

  • Modify: roboimi/demos/vla_scripts/eval_vla.py

  • Modify: roboimi/vla/conf/eval/eval.yaml

  • Test: tests/test_eval_vla_rollout_artifacts.py

  • Step 1: Write failing tests for optional artifact config / summary wiring

  • Step 2: Implement config-backed artifact flags and output paths with defaults that write nothing

  • Step 3: Verify existing eval call sites still work with defaults
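The "defaults that write nothing" requirement in Step 2 could be sketched as follows. This is a minimal illustration using only the standard library rather than Hydra/OmegaConf. `ArtifactConfig`, `resolve_artifact_paths`, the flag names, and the file names `rollout.mp4` and `timing.json` are hypothetical; only the `eval_artifacts` directory and `trajectory.npz` come from this plan.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, Optional


@dataclass(frozen=True)
class ArtifactConfig:
    # Hypothetical flag names; the plan does not fix the config schema.
    # Every flag defaults to off so existing call sites are unaffected.
    record_video: bool = False
    save_timing: bool = False
    save_trajectory: bool = False
    output_dir: Optional[str] = None  # None means "<run_dir>/eval_artifacts"

    @property
    def enabled(self) -> bool:
        return self.record_video or self.save_timing or self.save_trajectory


def resolve_artifact_paths(cfg: ArtifactConfig, run_dir: str) -> Dict[str, Path]:
    """Return output paths for enabled artifacts only.

    An empty dict means artifact capture is off and nothing should be written.
    """
    if not cfg.enabled:
        return {}
    root = Path(cfg.output_dir) if cfg.output_dir else Path(run_dir) / "eval_artifacts"
    paths: Dict[str, Path] = {}
    if cfg.record_video:
        paths["video"] = root / "rollout.mp4"  # assumed file name
    if cfg.save_timing:
        paths["timing"] = root / "timing.json"  # assumed file name
    if cfg.save_trajectory:
        paths["trajectory"] = root / "trajectory.npz"
    return paths
```

With the default `ArtifactConfig()`, the helper returns an empty mapping, so a caller can skip all artifact setup with a single truthiness check. That is one way to keep default eval behavior unchanged.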

Task 2: Add timing breakdown, video recording, and trajectory export

Files:

  • Modify: roboimi/demos/vla_scripts/eval_vla.py

  • Test: tests/test_eval_vla_rollout_artifacts.py

  • Step 1: Write failing tests for timing aggregation, trajectory serialization, and summary schema

  • Step 2: Implement per-step timing capture for obs_read_ms, preprocess_ms, inference_ms, env_step_ms, and loop_total_ms

  • Step 3: Implement MP4 recording from a chosen camera stream and canonical trajectory.npz export using left_link7/right_link7 executed poses after env.step

  • Step 4: Run focused tests and fix issues
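The per-step timing capture and aggregation from Steps 1 and 2 might look roughly like the sketch below. The `StepTimer` class, its `measure` and `summary` methods, and the summary fields (`mean`, `max`, `count`) are assumptions made for illustration; only the five timing keys are taken from this plan.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from statistics import mean

# Timing keys named in the plan.
TIMING_KEYS = (
    "obs_read_ms",
    "preprocess_ms",
    "inference_ms",
    "env_step_ms",
    "loop_total_ms",
)


class StepTimer:
    """Collects per-step wall-clock durations in milliseconds (hypothetical helper)."""

    def __init__(self):
        self.samples = defaultdict(list)

    @contextmanager
    def measure(self, key):
        # perf_counter is monotonic, so it is suitable for measuring intervals.
        start = time.perf_counter()
        try:
            yield
        finally:
            self.samples[key].append((time.perf_counter() - start) * 1000.0)

    def summary(self):
        """Aggregate recorded samples; keys with no samples are omitted."""
        out = {}
        for key in TIMING_KEYS:
            values = self.samples.get(key, [])
            if values:
                out[key] = {
                    "mean": mean(values),
                    "max": max(values),
                    "count": len(values),
                }
        return out


# Usage: wrap each phase of the rollout loop; the sleeps stand in for real work.
timer = StepTimer()
for _ in range(3):
    with timer.measure("loop_total_ms"):
        with timer.measure("inference_ms"):
            time.sleep(0.001)
        with timer.measure("env_step_ms"):
            time.sleep(0.001)
timing_summary = timer.summary()
```

Because `summary()` returns plain dicts of floats and ints, the result can be passed directly to `json.dump` when writing the timing summary.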

Task 3: Stop training safely and execute one real rollout

Files:

  • Use: roboimi/demos/vla_scripts/eval_vla.py

  • Output: runs/.../eval_artifacts/...

  • Step 1: Stop the active training process, wait for exit, and confirm the target checkpoint is readable

  • Step 2: If no explicit checkpoint is provided, select the latest completed checkpoint; fall back to the previous completed checkpoint or the best checkpoint if needed

  • Step 3: Run one headless rollout with artifact capture enabled

  • Step 4: Verify the MP4 / timing summary / trajectory files exist and summarize findings
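The checkpoint selection order in Step 2 and the existence check in Step 4 could be sketched as below. Both `select_checkpoint` and `missing_artifacts` are hypothetical helpers. Treating "completed" as "a non-empty regular file" is an assumption made here for illustration; a real check would likely also try to load the checkpoint.

```python
from pathlib import Path
from typing import Iterable, List, Optional, Union

PathLike = Union[str, Path]


def _readable(path: PathLike) -> bool:
    # Assumption: a non-empty regular file counts as a completed checkpoint.
    p = Path(path)
    return p.is_file() and p.stat().st_size > 0


def select_checkpoint(
    candidates: Iterable[PathLike],
    explicit: Optional[PathLike] = None,
    best: Optional[PathLike] = None,
) -> Path:
    """Pick the explicit checkpoint, else the newest readable one, else the best."""
    if explicit is not None:
        if not _readable(explicit):
            raise FileNotFoundError(f"explicit checkpoint not readable: {explicit}")
        return Path(explicit)
    # Newest first by modification time; skip files that are missing or empty.
    existing = [Path(c) for c in candidates if Path(c).is_file()]
    for ckpt in sorted(existing, key=lambda p: p.stat().st_mtime, reverse=True):
        if _readable(ckpt):
            return ckpt
    if best is not None and _readable(best):
        return Path(best)
    raise FileNotFoundError("no readable checkpoint found")


def missing_artifacts(paths: Iterable[PathLike]) -> List[Path]:
    """Return the expected artifact files that are absent or empty."""
    return [Path(p) for p in paths if not _readable(p)]
```

An empty list from `missing_artifacts` indicates that every expected file was written, which gives the final verification step a single pass/fail condition.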