roboimi/docs/superpowers/plans/2026-03-31-rollout-artifacts.md

# Rollout Artifacts Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Extend rollout evaluation so one selected checkpoint can be run once with video capture, timing breakdown, and saved EE trajectory artifacts.

**Architecture:** Keep the implementation centered in `eval_vla.py` so existing training-time rollout validation remains compatible. Add config-gated artifact capture helpers, serialize outputs under the eval run directory, and add lightweight tests for helper behavior and summary wiring; default eval behavior must remain unchanged when artifact capture is off.

**Tech Stack:** Python, Hydra/OmegaConf, NumPy, OpenCV, JSON, PyTorch, `unittest`/mocking.

---

### Task 1: Add artifact capture configuration and helper wiring

**Files:**
- Modify: `roboimi/demos/vla_scripts/eval_vla.py`
- Modify: `roboimi/vla/conf/eval/eval.yaml`
- Test: `tests/test_eval_vla_rollout_artifacts.py`

- [ ] **Step 1: Write failing tests for optional artifact config / summary wiring**
- [ ] **Step 2: Implement config-backed artifact flags and output paths with defaults that write nothing**
- [ ] **Step 3: Verify existing eval call sites still work with defaults**
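
One possible shape for the "defaults that write nothing" requirement in Step 2 is sketched below. It is an illustration, not the actual `eval_vla.py` or `eval.yaml` contents: the `ArtifactConfig` class, its field names, the `eval_artifacts` subdirectory default, and the `rollout.mp4` / `timing.json` file names are all assumptions. Only `trajectory.npz` comes from this plan. The sketch uses a plain dataclass so it runs without Hydra; the real implementation would presumably read the same flags from the eval config.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, Optional


@dataclass(frozen=True)
class ArtifactConfig:
    """Hypothetical artifact flags; every default leaves eval behavior unchanged."""

    record_video: bool = False
    save_timing: bool = False
    save_trajectory: bool = False
    camera_name: Optional[str] = None  # assumed: which camera stream feeds the video
    subdir: str = "eval_artifacts"     # assumed output folder under the eval run dir

    @property
    def enabled(self) -> bool:
        # True as soon as any single artifact is requested.
        return self.record_video or self.save_timing or self.save_trajectory


def resolve_artifact_paths(cfg: ArtifactConfig, run_dir: str) -> Dict[str, Path]:
    """Return output paths only for enabled artifacts; nothing is created on disk."""
    if not cfg.enabled:
        return {}
    root = Path(run_dir) / cfg.subdir
    paths: Dict[str, Path] = {}
    if cfg.record_video:
        paths["video"] = root / "rollout.mp4"      # assumed file name
    if cfg.save_timing:
        paths["timing"] = root / "timing.json"     # assumed file name
    if cfg.save_trajectory:
        paths["trajectory"] = root / "trajectory.npz"
    return paths
```

With this shape, an existing call site that never mentions artifacts gets an empty path mapping, so the Step 3 check reduces to asserting that `resolve_artifact_paths(ArtifactConfig(), run_dir)` is empty and that no directory is created.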

### Task 2: Add timing breakdown, video recording, and trajectory export

**Files:**
- Modify: `roboimi/demos/vla_scripts/eval_vla.py`
- Test: `tests/test_eval_vla_rollout_artifacts.py`

- [ ] **Step 1: Write failing tests for timing aggregation, trajectory serialization, and summary schema**
- [ ] **Step 2: Implement per-step timing capture for `obs_read_ms`, `preprocess_ms`, `inference_ms`, `env_step_ms`, `loop_total_ms`**
- [ ] **Step 3: Implement MP4 recording from a chosen camera stream and canonical `trajectory.npz` export using the `left_link7` / `right_link7` executed poses read after `env.step`**
- [ ] **Step 4: Run focused tests and fix issues**
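
The per-step timing capture in Step 2 could be sketched roughly as follows. The five key names come from this plan; the `StepTimer` class, its methods, and the `mean` / `max` / `count` summary fields are hypothetical and only show one way to aggregate the samples into something JSON-serializable.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from typing import Dict, List

# Key names taken from the plan; everything else here is an assumed design.
TIMING_KEYS = (
    "obs_read_ms",
    "preprocess_ms",
    "inference_ms",
    "env_step_ms",
    "loop_total_ms",
)


class StepTimer:
    """Collects per-step durations in milliseconds and aggregates them."""

    def __init__(self) -> None:
        self._samples: Dict[str, List[float]] = defaultdict(list)

    def record(self, key: str, ms: float) -> None:
        # Store one already-measured duration for the given phase.
        self._samples[key].append(float(ms))

    @contextmanager
    def measure(self, key: str):
        # Time the wrapped block with a monotonic clock and record it.
        start = time.perf_counter()
        try:
            yield
        finally:
            self.record(key, (time.perf_counter() - start) * 1000.0)

    def summary(self) -> Dict[str, Dict[str, float]]:
        # Only phases with at least one sample appear in the summary.
        out: Dict[str, Dict[str, float]] = {}
        for key in TIMING_KEYS:
            values = self._samples.get(key, [])
            if not values:
                continue
            out[key] = {
                "mean": sum(values) / len(values),
                "max": max(values),
                "count": len(values),
            }
        return out
```

In the rollout loop, each phase would be wrapped as `with timer.measure("inference_ms"): ...`, with `loop_total_ms` wrapping the whole iteration. `summary()` returns plain floats and ints, so it can be passed directly to `json.dump`.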

### Task 3: Stop training safely and execute one real rollout

**Files:**
- Use: `roboimi/demos/vla_scripts/eval_vla.py`
- Output: `runs/.../eval_artifacts/...`

- [ ] **Step 1: Stop the active training process, wait for exit, and confirm the target checkpoint is readable**
- [ ] **Step 2: Select the latest completed checkpoint if an explicit one is not provided; if it is not usable, fall back to the prior completed checkpoint or the best checkpoint**
- [ ] **Step 3: Run one headless rollout with artifact capture enabled**
- [ ] **Step 4: Verify the MP4 / timing summary / trajectory files exist and summarize findings**
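
The checkpoint selection order in Step 2 can be illustrated with a small helper. This is a sketch under stated assumptions, not existing repository code: `select_checkpoint`, `default_is_readable`, and the argument names are hypothetical, and "readable" is approximated as "a non-empty file exists". A stricter check (for example, actually loading the checkpoint) could be passed in through `is_readable`. Raising on an unreadable explicit checkpoint, rather than silently falling back, is also an assumption.

```python
from pathlib import Path
from typing import Callable, Iterable, Optional, Union

PathLike = Union[str, Path]


def default_is_readable(path: Path) -> bool:
    # Assumed proxy for "completed": the file exists and is not empty.
    return path.is_file() and path.stat().st_size > 0


def select_checkpoint(
    explicit: Optional[PathLike],
    completed_newest_first: Iterable[PathLike],
    best: Optional[PathLike] = None,
    is_readable: Callable[[Path], bool] = default_is_readable,
) -> Path:
    """Pick the checkpoint to evaluate, following the order described in Step 2."""
    if explicit is not None:
        path = Path(explicit)
        # An explicit request is never silently replaced by another checkpoint.
        if not is_readable(path):
            raise FileNotFoundError(f"explicit checkpoint is not readable: {path}")
        return path

    # Latest completed first, then older completed ones, then the best checkpoint.
    candidates = [Path(p) for p in completed_newest_first]
    if best is not None:
        candidates.append(Path(best))
    for path in candidates:
        if is_readable(path):
            return path
    raise FileNotFoundError("no readable checkpoint found")
```

Injecting `is_readable` keeps the helper testable without real checkpoint files, which fits the plan's preference for lightweight tests.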