diffusion_policy/docs/superpowers/specs/2026-03-26-pusht-imf-swanlab-design.md
2026-03-26 16:56:22 +08:00


PushT Image DiT iMF + SwanLab Design

Goal

Migrate the PushT image DiT experiment path from W&B to SwanLab online logging and suppress simulation video logging. Then add an iMeanFlow-based one-step transformer policy for PushT image experiments and run a controlled architecture sweep over embedding width and depth, using test_mean_score as the primary metric.

Context

  • The implementation baseline is main.
  • The experiment path is limited to the PushT image transformer workflow; unrelated workspaces and runners should remain unchanged.
  • Environment management must use the repo-local uv workflow.
  • The trusted remote machine alias 5880 refers to droid-system-product-name (droid@100.73.14.65) and can run two GPU jobs in parallel.

Architecture Overview

The work is split into two verified phases:

  1. Logging migration phase

    • Keep the existing PushT image DiT training behavior intact.
    • Replace W&B usage with SwanLab in the transformer hybrid workspace used by PushT image DiT experiments.
    • Preserve local logs.json.txt output.
    • Ensure rollout metrics such as test_mean_score and per-seed rewards are still logged.
    • Disable simulation video logging at both the config and runner/logging boundary.
  2. iMF migration phase

    • Keep the original diffusion-based transformer image policy available on main.
    • Add a parallel iMF-specific model/policy/config path rather than overwriting the baseline diffusion policy.
    • Reuse the existing observation encoder and training workspace where possible.
    • Replace diffusion training with the iMeanFlow training objective.
    • Use one-step inference for validation/rollout in the iMF path.

Logging Design

Scope

Only the PushT image DiT experiment chain is changed:

  • train_diffusion_transformer_hybrid_workspace.py
  • pusht_image_runner.py
  • the new/updated PushT image transformer configs

Behavior

  • SwanLab runs in online mode.
  • Logged values are scalar metrics only, e.g.:
    • train_loss
    • val_loss
    • train_action_mse_error
    • test_mean_score
    • aggregate rollout metrics and optional per-seed scalar rewards
  • No simulation videos are uploaded or wrapped as logging objects.
  • Local JSON logging remains enabled for auditability and remote-job fallback debugging.

Operational safeguards

  • Default PushT experiment configs set task.env_runner.n_test_vis=0 and task.env_runner.n_train_vis=0.
  • The PushT image runner will not emit video objects into log_data, preventing accidental uploads even if visualization counts are later changed.
  • SwanLab credentials are provided through the environment at runtime, not committed into the repo.
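
The runner-side safeguard could look roughly like the sketch below: a filter that keeps only numeric scalars before anything reaches the logger. The function name scalars_only and the example keys are hypothetical, not names from the repo, and the real runner may structure log_data differently.

```python
import numbers


def scalars_only(log_data):
    """Return a copy of log_data that keeps only plain numeric scalars.

    Anything else (video wrappers, arrays, strings) is dropped, so a later
    change to n_test_vis / n_train_vis cannot cause an upload by accident.
    """
    kept = {}
    for key, value in log_data.items():
        # bool is a numbers.Number subclass; skip it so flags are not
        # logged as metrics.
        if isinstance(value, numbers.Number) and not isinstance(value, bool):
            kept[key] = float(value)
    return kept


if __name__ == "__main__":
    # Hypothetical runner output: one metric, one reward, one video object.
    raw = {"test_mean_score": 0.5, "test_reward_seed_0": 1, "test_video_seed_0": object()}
    print(scalars_only(raw))
```

Filtering by type rather than by key name means the safeguard does not depend on how video entries happen to be named.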

iMF Model Design

Baseline reuse

The iMF path reuses:

  • the existing image observation encoder
  • the existing action/observation normalization path
  • the existing training workspace skeleton
  • the existing PushT image dataset and env runner

New files

  • diffusion_policy/model/diffusion/imf_transformer_for_diffusion.py
  • diffusion_policy/policy/imf_transformer_hybrid_image_policy.py
  • image_pusht_diffusion_policy_dit_imf.yaml

Model structure

The iMF transformer mirrors the current transformer policy structure closely enough to reuse known-good conditioning patterns, but has two output heads:

  • u: average velocity field
  • v: instantaneous velocity field

Inputs remain conditioned on encoded observations and action trajectory tokens.

iMF Training Objective

For a normalized action trajectory x:

  1. sample t, r
  2. sample Gaussian noise e
  3. form z_t = (1 - t) * x + t * e
  4. predict instantaneous velocity v = fn(z_t, t, t), or equivalently take the model's v head at time t
  5. compute u and du/dt with JVP using tangent (v, 0, 1) over (z, r, t)
  6. form compound velocity:
    • V = u + (t - r) * stopgrad(du_dt)
  7. train the compound velocity V against the target instantaneous velocity of the interpolation path, dz_t/dt:
    • target = e - x
  8. optimize the iMF loss on unmasked action tokens, with any auxiliary v-head loss kept only if it helps preserve stability

The implementation should prefer torch.func.jvp and keep a safe fallback path if the local Torch stack needs it.
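
The steps above can be sketched with torch.func.jvp as follows. This is a minimal illustration under stated assumptions, not the planned implementation: ToyAverageVelocity is a hypothetical stand-in for the iMF transformer (no observation conditioning), t and r are drawn uniformly and ordered so that r <= t, v is taken from the boundary case r = t rather than from a separate v head, and loss_mask marks the tokens that count toward the loss.

```python
import torch
from torch.func import jvp


class ToyAverageVelocity(torch.nn.Module):
    """Hypothetical stand-in for the iMF transformer: u(z, r, t)."""

    def __init__(self, action_dim, hidden=32):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(action_dim + 2, hidden),
            torch.nn.Tanh(),
            torch.nn.Linear(hidden, action_dim),
        )

    def forward(self, z, r, t):
        # z: (B, T, D); r, t: (B, 1, 1), broadcast to every action token.
        times = torch.cat([r, t], dim=-1).expand(z.shape[0], z.shape[1], 2)
        return self.net(torch.cat([z, times], dim=-1))


def imf_loss(u_fn, x, loss_mask=None):
    """iMF loss sketch for normalized action trajectories x of shape (B, T, D)."""
    batch = x.shape[0]
    # 1. sample two times per example; ordering r <= t is an assumption here
    a = torch.rand(batch, 1, 1, device=x.device)
    b = torch.rand(batch, 1, 1, device=x.device)
    t, r = torch.maximum(a, b), torch.minimum(a, b)
    # 2-3. Gaussian noise and linear interpolation
    e = torch.randn_like(x)
    z_t = (1 - t) * x + t * e
    # 4. instantaneous velocity from the boundary case r = t (no gradient)
    v = u_fn(z_t, t, t).detach()
    # 5. u and du/dt in one pass, tangent (v, 0, 1) over (z, r, t)
    tangents = (v, torch.zeros_like(r), torch.ones_like(t))
    u, du_dt = jvp(u_fn, (z_t, r, t), tangents)
    # 6. compound velocity with stop-gradient on the JVP term
    V = u + (t - r) * du_dt.detach()
    # 7. regress onto the path velocity e - x
    sq_err = (V - (e - x)) ** 2
    # 8. average over the tokens that count toward the loss
    if loss_mask is not None:
        sq_err = sq_err[loss_mask]
    return sq_err.mean()


if __name__ == "__main__":
    model = ToyAverageVelocity(action_dim=2)
    loss = imf_loss(model, torch.randn(4, 8, 2))
    loss.backward()
    print(float(loss))
```

Gradients reach the parameters only through the primal output u, since both v and du_dt are detached. A smoke test that calls backward() on this loss is a quick way to confirm that the installed Torch supports forward-mode AD for every operator in the real model.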

iMF Inference Design

Inference uses a single step starting from noise:

  • initialize z_1 ~ N(0, I)
  • set t = 1.0, r = 0.0
  • predict u(z_1, r, t, cond), keeping the (z, r, t) argument order used for the JVP in training
  • produce the action sample with one update:
    • x_hat = z_1 - (t - r) * u

This matches the time direction in the reference iMeanFlow sampling logic.
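
As a sanity check on the update rule, the sketch below applies the one-step formula with an oracle average velocity. For a fixed pair (x, e) the path z_t = (1 - t) * x + t * e has constant velocity e - x, so its average velocity over any interval is also e - x, and the single step from z_1 = e should land on x. The helper names are hypothetical and the vectors are plain Python lists, to keep the example dependency-free.

```python
def one_step_sample(u_fn, z_1, t=1.0, r=0.0):
    """Apply x_hat = z_1 - (t - r) * u(z_1, r, t) elementwise."""
    u = u_fn(z_1, r=r, t=t)
    return [z - (t - r) * u_i for z, u_i in zip(z_1, u)]


def make_oracle(x, e):
    """Exact average velocity of the straight path from x (t=0) to e (t=1)."""
    def u_fn(z, r, t):
        return [e_i - x_i for x_i, e_i in zip(x, e)]
    return u_fn


if __name__ == "__main__":
    x = [0.25, -0.5]   # "clean" action
    e = [1.0, 2.0]     # noise sample, i.e. z_1
    print(one_step_sample(make_oracle(x, e), e))
```

A trained model only approximates this oracle, so the real sample is an estimate of an action rather than an exact reconstruction.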

Testing Strategy

Phase 1: logging migration smoke test

  • use the repo-local uv environment
  • run a debug/smoke PushT image DiT training job on a single GPU with:
    • training.debug=true
    • dataloader.num_workers=0
    • val_dataloader.num_workers=0
    • task.env_runner.n_envs=1
    • task.env_runner.n_test_vis=0
    • task.env_runner.n_train_vis=0
  • verify:
    • SwanLab initializes successfully
    • logs.json.txt is populated
    • rollout metrics still include test_mean_score
    • no video logging is attempted

Phase 2: iMF smoke test

  • run an equivalent debug PushT image iMF job
  • verify:
    • forward/backward passes succeed
    • JVP path executes on the local Torch version
    • one-step inference returns correctly shaped actions
    • rollout produces scalar metrics including test_mean_score

Branch and Commit Strategy

  1. start from a main-based worktree branch
  2. commit the SwanLab/no-video migration after smoke verification
  3. continue with the iMF implementation
  4. once iMF smoke tests pass, create/preserve a dedicated feature branch for the experiment code and push it to Gitea

Experiment Plan

After the iMF path is smoke-tested:

  • run a 3x3 grid over:
    • n_emb ∈ {128, 256, 384}
    • n_layer ∈ {6, 12, 18}
  • keep the rest of the setup fixed
  • run each experiment for 300 epochs
  • primary comparison metric: test_mean_score
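
The grid can be enumerated programmatically so that every run differs only in the two swept values. In the sketch below, the override prefixes policy. and training.num_epochs are assumptions about the config layout and should be checked against the actual iMF config before use.

```python
from itertools import product

N_EMB = (128, 256, 384)
N_LAYER = (6, 12, 18)


def sweep_overrides(num_epochs=300):
    """Return one list of Hydra-style overrides per grid cell (9 in total)."""
    runs = []
    for n_emb, n_layer in product(N_EMB, N_LAYER):
        runs.append([
            f"policy.n_emb={n_emb}",              # assumed key path
            f"policy.n_layer={n_layer}",          # assumed key path
            f"training.num_epochs={num_epochs}",  # assumed key path
        ])
    return runs


if __name__ == "__main__":
    for overrides in sweep_overrides():
        print(" ".join(overrides))
```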

Resource Allocation

Three concurrent runs should be scheduled continuously until the matrix is complete:

  • local machine: 1 GPU
  • 5880: 2 GPUs

Each run uses the same uv-managed environment and the same pushed branch so the code path is consistent across hosts.
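
One way to keep all three GPU slots busy is a shared work queue from which each slot pulls the next pending run as soon as its current one finishes. The sketch below is illustrative only: the slot labels are hypothetical, and launch is a placeholder for whatever actually starts a job (locally or over SSH) and blocks until it ends.

```python
import queue
import threading

# Hypothetical slot labels: one local GPU plus two GPUs on 5880.
SLOTS = ("local:gpu0", "5880:gpu0", "5880:gpu1")


def run_matrix(runs, launch, slots=SLOTS):
    """Run every item in runs exactly once, one worker thread per slot."""
    pending = queue.Queue()
    for run in runs:
        pending.put(run)
    done = []
    lock = threading.Lock()

    def worker(slot):
        while True:
            try:
                run = pending.get_nowait()
            except queue.Empty:
                return  # matrix exhausted, free the slot
            launch(slot, run)  # assumed to block until the job finishes
            with lock:
                done.append((slot, run))

    threads = [threading.Thread(target=worker, args=(slot,)) for slot in slots]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return done


if __name__ == "__main__":
    finished = run_matrix(list(range(9)), lambda slot, run: None)
    print(len(finished))
```

Compared with a fixed three-runs-per-slot split, a queue avoids idle GPUs when the deeper or wider configurations take longer than the small ones.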

Risks and Mitigations

  • Torch JVP compatibility risk: provide a fallback JVP implementation and smoke-test immediately.
  • Logging regression risk: keep local JSON logging and verify scalar rollout metrics before moving to iMF.
  • Video/logging side effects: disable visualizations in config and filter video objects out of runner logs.
  • Cross-host drift: push the verified branch to Gitea before launching the experiment matrix on multiple machines.