roboimi/docs/superpowers/specs/2026-04-02-imf-rollout-trajectory-images-design.md

IMF Rollout Trajectory Images + Short-Horizon Training Design

Background

The current RoboIMI IMF training flow can perform rollout validation and log scalar reward metrics to SwanLab, but it does not yet emit the qualitative rollout artifacts now required for analysis. The user wants training-time rollout validation to save front-view trajectory images with the model-generated trajectory drawn in red, upload those images to SwanLab, and then start a new local short-horizon IMF training run.

Goals

  1. During training-time rollout validation, save one front-camera trajectory image per rollout episode.
  2. The image must show the rollout EE trajectory in red.
  3. Reuse the existing repository trajectory visualization logic as much as practical, especially the existing red capsule-marker trajectory representation.
  4. Save 5 rollout images locally for each validation event and upload the same 5 images to SwanLab.
  5. Do not record rollout videos for this training-time validation flow.
  6. Start a new local IMF-AttnRes training run with:
    • agent.head.n_emb=384
    • agent.head.n_layer=12
    • agent.pred_horizon=8
    • agent.num_action_steps=4
    • train.max_steps=50000
    • train.rollout_num_episodes=5
    • train.use_swanlab=true

Non-Goals

  • No IMF architecture or loss-function change.
  • No dataset schema change.
  • No rollout video generation for the new training flow.
  • No interactive viewer requirement.

Existing Relevant Code

  • roboimi/demos/vla_scripts/eval_vla.py
    • already supports rollout summaries, optional trajectory export, and optional video export.
  • roboimi/utils/raw_action_trajectory_viewer.py
    • already contains the red trajectory capsule-marker construction logic.
  • roboimi/demos/vla_scripts/train_vla.py
    • already performs periodic rollout validation and scalar SwanLab logging.
  • roboimi/vla/agent.py
    • already implements “predict pred_horizon, execute first num_action_steps” queue semantics.

Design Decisions

1. Artifact contract

Each rollout episode will emit one distinct PNG file under the eval artifact directory. The file naming/path contract must be per-episode, not shared, so a 5-episode validation event yields 5 stable image paths without overwriting.
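The per-episode path contract could be sketched as follows; the naming scheme and helper name are illustrative assumptions, not the final convention:

```python
from pathlib import Path


def rollout_image_paths(artifact_dir: Path, step: int, num_episodes: int) -> list[Path]:
    """Build one distinct PNG path per rollout episode (hypothetical naming
    scheme). Encoding both the training step and the episode index guarantees
    a 5-episode validation event never overwrites its own images."""
    return [
        artifact_dir / f"step_{step:07d}_episode_{ep:02d}_front_traj.png"
        for ep in range(num_episodes)
    ]
```

Keying on both step and episode also keeps images from different validation events from colliding in the same artifact directory.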

2. Trajectory definition

The red trajectory corresponds to the actually executed model action sequence over the rollout loop: the raw EE actions returned and consumed step-by-step by the policy loop. For the requested short-horizon run, this means the visualization reflects repeated execution of the first 4 actions from each predicted 8-action chunk, not every discarded future prediction from replanning.
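The executed-trajectory definition above can be made concrete with a small sketch of the chunked-execution semantics (names are illustrative):

```python
def executed_actions(predicted_chunks: list[list], num_action_steps: int) -> list:
    """Flatten the actually-executed trajectory: only the first
    num_action_steps actions of each predicted chunk are consumed by the
    policy loop; the remaining predictions are discarded at replanning and
    therefore never appear in the visualization."""
    executed = []
    for chunk in predicted_chunks:
        executed.extend(chunk[:num_action_steps])
    return executed
```

For the requested run, each chunk holds pred_horizon=8 actions and num_action_steps=4 of them are kept.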

3. Camera choice

The training-time image-export path is explicitly pinned to the repo's concrete front-camera key. It must not silently fall back to camera_names[0] if that camera is not the front view.
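A minimal guard for this, assuming a hypothetical key name ("front" here is a placeholder for the repo's actual camera key):

```python
FRONT_CAMERA_KEY = "front"  # assumption: stand-in for the repo's real front-camera name


def resolve_front_camera(camera_names: list[str]) -> str:
    """Fail loudly when the front camera is absent instead of silently
    rendering from camera_names[0], which may be a wrist or side view."""
    if FRONT_CAMERA_KEY not in camera_names:
        raise KeyError(
            f"front camera {FRONT_CAMERA_KEY!r} not found in camera_names={camera_names!r}"
        )
    return FRONT_CAMERA_KEY
```

Raising here surfaces a misconfigured camera list at the first validation event rather than after a full run of wrong-view images.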

4. Rendering path

A lightweight headless image-export path will be added to eval_vla.py that:

  • renders the front camera frame,
  • overlays the trajectory using the existing red trajectory representation,
  • saves a static PNG per episode.

The implementation may reuse the existing marker-construction logic directly and add a minimal helper for final image composition/export.
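The final composition step could look like the following stand-in, which stamps red markers at already-projected pixel coordinates; it is a minimal substitute for the repo's capsule-marker rendering, and camera projection is assumed to happen upstream:

```python
import numpy as np


def overlay_red_trajectory(
    frame: np.ndarray, points_px: list[tuple[int, int]], radius: int = 2
) -> np.ndarray:
    """Stamp a solid red disk at each trajectory point on an H x W x 3 uint8
    frame and return a new image, leaving the input frame untouched."""
    out = frame.copy()
    h, w = out.shape[:2]
    yy, xx = np.ogrid[:h, :w]
    for u, v in points_px:  # (u, v) = (column, row) pixel coordinates
        mask = (xx - u) ** 2 + (yy - v) ** 2 <= radius**2
        out[mask] = (255, 0, 0)
    return out
```

The real implementation would instead reuse the marker construction from raw_action_trajectory_viewer.py and only add the PNG save.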

5. Training-time behavior

train_vla.py rollout validation must explicitly:

  • request/save trajectory images,
  • keep record_video=false,
  • return the 5 per-episode image paths in the rollout summary payload,
  • upload those 5 images to SwanLab,
  • keep image-upload failures non-fatal.
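The non-fatal upload requirement can be sketched as a best-effort helper; log_media_fn is a hypothetical callable (e.g. a thin wrapper around the SwanLab media-logging call), injected so the helper stays independent of the exact media API:

```python
import logging


def log_rollout_images(log_media_fn, image_paths: list[str], step: int) -> None:
    """Best-effort upload: any exception from the media API is logged and
    swallowed per image, so a SwanLab media mismatch can never kill the
    training loop or skip the remaining images."""
    for path in image_paths:
        try:
            log_media_fn(path, step)
        except Exception:
            logging.exception("rollout image upload failed for %s (non-fatal)", path)
```

Catching per image (rather than around the whole loop) means one bad file still lets the other four upload.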

Expected User-Visible Outcome

For each scheduled validation event in the new training run:

  • 5 rollout episodes execute,
  • 5 front-view PNG trajectory images are saved locally,
  • the same 5 images are uploaded to SwanLab,
  • scalar reward metrics continue to be logged,
  • no rollout videos are generated.

Risks and Mitigations

  • Headless rendering conflicts from desktop env vars: force headless eval onto EGL when headless=true.
  • Image overwrite risk: use explicit per-episode artifact paths.
  • SwanLab media API mismatch: isolate media logging in a small best-effort helper.