feat: add rollout trajectory image artifacts and swanlab logging

2026-04-03 09:39:16 +08:00
parent 48f0eb8dd0
commit 0586a6e6c7
8 changed files with 626 additions and 21 deletions
--- a/docs/superpowers/specs/2026-04-02-imf-rollout-trajectory-images-design.md
+++ b/docs/superpowers/specs/2026-04-02-imf-rollout-trajectory-images-design.md
@@ -0,0 +1,75 @@
+# IMF Rollout Trajectory Images + Short-Horizon Training Design
+
+## Background
+The current RoboIMI IMF training flow can perform rollout validation and log scalar reward metrics to SwanLab, but it does not yet emit the qualitative rollout artifacts now required for analysis. The user wants training-time rollout validation to save front-view trajectory images with the model-generated trajectory drawn in red, upload those images to SwanLab, and then start a new local short-horizon IMF training run.
+
+## Goals
+1. During training-time rollout validation, save one **front-camera** trajectory image per rollout episode.
+2. The image must show the rollout end-effector (EE) trajectory in red.
+3. Reuse the existing repository trajectory visualization logic as much as practical, especially the existing red capsule-marker trajectory representation.
+4. Save 5 rollout images locally for each validation event and upload the same 5 images to SwanLab.
+5. Do **not** record rollout videos for this training-time validation flow.
+6. Start a new local IMF-AttnRes training run with:
+   - `agent.head.n_emb=384`
+   - `agent.head.n_layer=12`
+   - `agent.pred_horizon=8`
+   - `agent.num_action_steps=4`
+   - `train.max_steps=50000`
+   - `train.rollout_num_episodes=5`
+   - `train.use_swanlab=true`
+
+## Non-Goals
+- No IMF architecture or loss-function change.
+- No dataset schema change.
+- No rollout video generation for the new training flow.
+- No interactive viewer requirement.
+
+## Existing Relevant Code
+- `roboimi/demos/vla_scripts/eval_vla.py`
+  - already supports rollout summaries, optional trajectory export, and optional video export.
+- `roboimi/utils/raw_action_trajectory_viewer.py`
+  - already contains the red trajectory capsule-marker construction logic.
+- `roboimi/demos/vla_scripts/train_vla.py`
+  - already performs periodic rollout validation and scalar SwanLab logging.
+- `roboimi/vla/agent.py`
+  - already implements the action-queue semantics "predict `pred_horizon` actions, execute the first `num_action_steps`".
+
+## Design Decisions
+
+### 1. Artifact contract
+Each rollout episode will emit one distinct PNG file under the eval artifact directory. The file naming/path contract must be per-episode, not shared, so a 5-episode validation event yields 5 stable image paths without overwriting.
+
+### 2. Trajectory definition
+The red trajectory corresponds to the **actually executed model action sequence** over the rollout loop: the raw EE actions returned and consumed step-by-step by the policy loop. For the requested short-horizon run, this means the visualization shows the first 4 actions executed from each predicted 8-action chunk and omits the remaining 4 predicted actions that are discarded at each replan.
+
+### 3. Camera choice
+The training-time image export path is explicitly pinned to the repo’s concrete `front` camera key. It must not silently fall back to `camera_names[0]` when that entry is not `front`.
+
+### 4. Rendering path
+`eval_vla.py` will add a lightweight headless image-export path that:
+- renders the `front` camera frame,
+- overlays the trajectory using the existing red trajectory representation,
+- saves a static PNG per episode.
+
+The implementation may reuse the existing marker-construction logic directly and add a minimal helper for final image composition/export.
+
+### 5. Training-time behavior
+`train_vla.py` rollout validation must explicitly:
+- request/save trajectory images,
+- keep `record_video=false`,
+- return the 5 per-episode image paths in the rollout summary payload,
+- upload those 5 images to SwanLab,
+- keep image-upload failures non-fatal.
+
+## Expected User-Visible Outcome
+For each scheduled validation event in the new training run:
+- 5 rollout episodes execute,
+- 5 front-view PNG trajectory images are saved locally,
+- the same 5 images are uploaded to SwanLab,
+- scalar reward metrics continue to be logged,
+- no rollout videos are generated.
+
+## Risks and Mitigations
+- **Headless rendering conflicts from desktop env vars**: force headless eval onto EGL when `headless=true`.
+- **Image overwrite risk**: use explicit per-episode artifact paths.
+- **SwanLab media API mismatch**: isolate media logging in a small best-effort helper.
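
The three mitigations listed above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code from this commit: the function names `force_headless_egl`, `episode_image_path`, and `log_images_best_effort` are hypothetical, the directory and file naming scheme is invented for the example, the `MUJOCO_GL` and `PYOPENGL_PLATFORM` variables assume a MuJoCo/PyOpenGL rendering stack, and the uploader is passed in as a plain callable instead of calling the SwanLab API directly.

```python
import os
from pathlib import Path
from typing import Callable, Iterable, MutableMapping


def force_headless_egl(
    headless: bool, env: MutableMapping[str, str] = os.environ
) -> None:
    """Override desktop GL settings so headless eval renders through EGL."""
    if not headless:
        return
    # Assumption: the renderer honors these variables (MuJoCo / PyOpenGL).
    # They must be set before the rendering libraries are imported.
    env["MUJOCO_GL"] = "egl"
    env["PYOPENGL_PLATFORM"] = "egl"


def episode_image_path(artifact_dir: str, step: int, episode: int) -> Path:
    """Return a distinct PNG path per (validation step, episode) pair.

    The naming scheme is hypothetical; the point is that no two episodes
    of the same validation event share a path.
    """
    return (
        Path(artifact_dir)
        / f"step_{step:06d}"
        / f"rollout_front_ep{episode:02d}.png"
    )


def log_images_best_effort(
    log_fn: Callable[[Path], None], image_paths: Iterable[Path]
) -> int:
    """Try to upload each image; failures are reported but never raised."""
    logged = 0
    for path in image_paths:
        try:
            log_fn(path)
            logged += 1
        except Exception as exc:  # deliberately broad: logging is best effort
            print(f"[warn] rollout image upload failed for {path}: {exc}")
    return logged
```

Passing the uploader as `log_fn` keeps the training loop independent of the exact media API: a mismatch in the logging backend surfaces as a warning and a lower returned count, while training and scalar metric logging continue.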