diffusion_policy/docs/superpowers/specs/2026-03-26-pusht-imf-swanlab-design.md
2026-03-26 17:00:04 +08:00

PushT Image DiT iMF + SwanLab Design

Goal

Migrate the PushT image DiT experiment path from W&B to SwanLab online logging and suppress simulation video logging. Then add an iMeanFlow-based one-step transformer policy for PushT image experiments and run a controlled architecture sweep over embedding width and depth, using test_mean_score as the primary metric.

Context

  • The implementation baseline is main.
  • The experiment path is limited to the PushT image transformer workflow; unrelated workspaces and runners should remain unchanged.
  • Environment management must use the repo-local uv workflow.
  • The trusted remote machine alias 5880 refers to droid-system-product-name (droid@100.73.14.65) and can run two GPU jobs in parallel.

Architecture Overview

The work is split into two phases, each verified by a smoke test before the next step:

  1. Logging migration phase

    • Keep the existing PushT image DiT training behavior intact.
    • Replace W&B usage with SwanLab in the transformer hybrid workspace used by PushT image DiT experiments.
    • Preserve local logs.json.txt output.
    • Ensure rollout metrics such as test_mean_score and per-seed rewards are still logged.
    • Disable simulation video logging at both the config and runner/logging boundary.
  2. iMF migration phase

    • Keep the original diffusion-based transformer image policy available on main.
    • Add a parallel iMF-specific model/policy/config path rather than overwriting the baseline diffusion policy.
    • Reuse the existing observation encoder and training workspace where possible.
    • Replace diffusion training with the iMeanFlow training objective.
    • Use one-step inference for validation/rollout in the iMF path.

Logging Design

Scope

Only the PushT image DiT experiment chain is changed:

  • train_diffusion_transformer_hybrid_workspace.py
  • pusht_image_runner.py
  • the new/updated PushT image transformer configs

Behavior

  • SwanLab runs in online mode.
  • Logged values are scalar metrics only, e.g.:
    • train_loss
    • val_loss
    • train_action_mse_error
    • test_mean_score
    • aggregate rollout metrics and optional per-seed scalar rewards
  • No simulation videos are uploaded or wrapped as logging objects.
  • Local JSON logging remains enabled for auditability and remote-job fallback debugging.

Operational safeguards

  • Default PushT experiment configs set task.env_runner.n_test_vis=0 and task.env_runner.n_train_vis=0.
  • The PushT image runner will not emit video objects into log_data, preventing accidental uploads even if visualization counts are later changed.
  • SwanLab credentials are provided through the environment at runtime, not committed into the repo.
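
The runner-side safeguard above can be sketched as a small filter applied to the runner's returned dictionary. This is a minimal illustration, not the repository's code: the helper name keep_scalar_metrics, the FakeVideo class, and the metric keys are all hypothetical.

```python
import numbers


def keep_scalar_metrics(log_data):
    """Return only the entries whose values are plain real numbers.

    Anything else (media wrappers, arrays, strings) is dropped, so raising a
    visualization count later cannot leak videos into the logger.
    """
    scalars = {}
    for key, value in log_data.items():
        # bool is a subclass of int; exclude it so flags are not logged as metrics.
        if isinstance(value, numbers.Real) and not isinstance(value, bool):
            scalars[key] = float(value)
    return scalars


class FakeVideo:
    """Hypothetical stand-in for a video wrapper object produced by a runner."""


# Illustrative runner output; the key names are assumptions.
runner_output = {
    "test_mean_score": 0.5,
    "test_seed_100000_reward": 1,
    "test_seed_100000_video": FakeVideo(),
}
filtered = keep_scalar_metrics(runner_output)
print(sorted(filtered))
```

Filtering by value type rather than by key name means the guard does not depend on how video entries happen to be named.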

iMF Model Design

Baseline reuse

The iMF path reuses:

  • the existing image observation encoder
  • the existing action/observation normalization path
  • the existing training workspace skeleton
  • the existing PushT image dataset and env runner

New files

  • diffusion_policy/model/diffusion/imf_transformer_for_diffusion.py
  • diffusion_policy/policy/imf_transformer_hybrid_image_policy.py
  • image_pusht_diffusion_policy_dit_imf.yaml

Existing files changed for the iMF path

  • diffusion_policy/workspace/train_diffusion_transformer_hybrid_workspace.py
    • logging migration to SwanLab for this experiment chain
    • no structural training-loop fork beyond instantiating the configured policy and logging scalar metrics
  • diffusion_policy/env_runner/pusht_image_runner.py
    • suppress video objects in returned logs

Model structure

The iMF transformer mirrors the current transformer policy structure closely enough to reuse known-good conditioning patterns, but it remains a single-head model that predicts only:

  • u: average velocity field

The same function is reused at two evaluation points:

  • fn(z_t, r, t, cond) predicts average velocity u
  • fn(z_t, t, t, cond) predicts the instantaneous velocity surrogate v

As in the baseline, the model takes action trajectory tokens as input and is conditioned on the encoded observations.

iMF Training Objective

For a normalized action trajectory x, the initial implementation follows the user-provided Algorithm 1 exactly:

  1. sample t, r
  2. sample Gaussian noise e
  3. form z_t = (1 - t) * x + t * e
  4. predict instantaneous velocity surrogate with the same network:
    • v = fn(z_t, t, t, cond)
  5. define the JVP function exactly as:
    • g(z, r, t) = fn(z, r, t, cond)
  6. compute the primal output and JVP with tangent:
    • u, du_dt = jvp(g, (z_t, r, t), (v.detach(), 0, 1))
  7. form compound velocity:
    • V = u + (t - r) * stopgrad(du_dt)
  8. train against the instantaneous (conditional) velocity target:
    • target = e - x
  9. optimize only the masked iMF loss:
    • loss = metric(V - target)

There is no auxiliary v loss in the initial implementation. The implementation should prefer torch.func.jvp and keep a safe fallback path if the local Torch stack needs it.
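
The steps above might look roughly like the following in PyTorch (2.0 or later, for torch.func). This is a sketch under stated assumptions, not the planned implementation: ToyNet is a hypothetical stand-in for the iMF transformer, t and r are drawn uniformly and ordered so that r <= t, the metric is plain MSE, and the loss mask is omitted. Note that torch.func.jvp needs tensor tangents, so the 0 and 1 of the algorithm become zeros_like(r) and ones_like(t).

```python
import torch
from torch.func import jvp


class ToyNet(torch.nn.Module):
    """Hypothetical stand-in with the fn(z, r, t, cond) call signature."""

    def __init__(self, dim):
        super().__init__()
        self.proj = torch.nn.Linear(dim + 2, dim)

    def forward(self, z, r, t, cond):
        batch, horizon, _ = z.shape
        # Broadcast the two per-sample times over the trajectory axis.
        rt = torch.stack([r, t], dim=-1)[:, None, :].expand(batch, horizon, 2)
        return self.proj(torch.cat([z, rt], dim=-1)) + cond


def imf_loss(fn, x, cond):
    """iMF loss for normalized action trajectories x of shape (B, T, D)."""
    batch = x.shape[0]
    # Assumed sampling scheme: two uniform draws, ordered so that r <= t.
    a = torch.rand(batch, device=x.device)
    b = torch.rand(batch, device=x.device)
    t, r = torch.maximum(a, b), torch.minimum(a, b)
    e = torch.randn_like(x)
    tb, rb = t.view(-1, 1, 1), r.view(-1, 1, 1)
    z_t = (1 - tb) * x + tb * e

    # Instantaneous velocity surrogate: same network evaluated at r = t.
    v = fn(z_t, t, t, cond)

    def g(z, r_, t_):
        return fn(z, r_, t_, cond)

    # Primal output u and total derivative along the tangent (v, 0, 1).
    tangents = (v.detach(), torch.zeros_like(r), torch.ones_like(t))
    u, du_dt = jvp(g, (z_t, r, t), tangents)

    compound = u + (tb - rb) * du_dt.detach()  # stop-gradient on the JVP term
    target = e - x
    return torch.mean((compound - target) ** 2)  # MSE is an assumed metric


torch.manual_seed(0)
net = ToyNet(dim=2)
x = torch.randn(4, 3, 2)
cond = torch.zeros(4, 1, 2)
loss = imf_loss(net, x, cond)
loss.backward()
```

Gradients reach the parameters only through u, because both the tangent v and the JVP output are detached.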

iMF Inference Design

Inference uses a single step starting from noise:

  • initialize z_1 ~ N(0, I)
  • set t = 1.0, r = 0.0
  • predict u = fn(z_1, r, t, cond)
  • produce the action sample with one update:
    • x_hat = z_1 - (t - r) * u

This matches the time direction in the reference iMeanFlow sampling logic.
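
As a dependency-free sanity check of the update rule, the sketch below runs the one-step sampler against an oracle instead of a trained network. The oracle is an assumption made purely for illustration: if every training sample equals a single point x, the flow under z_t = (1 - t) * x + t * e is z_t = x + t * c, so the exact average velocity over [r, t] is (z_t - x) / t, and one step from t = 1 to r = 0 should land on x. The names one_step_sample and make_oracle are hypothetical.

```python
import random


def one_step_sample(fn, cond, dim, rng):
    """Draw z_1 ~ N(0, I) and apply x_hat = z_1 - (t - r) * u once."""
    t, r = 1.0, 0.0
    z1 = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    u = fn(z1, r, t, cond)
    return [z - (t - r) * ui for z, ui in zip(z1, u)]


def make_oracle(x_target):
    """Exact average velocity for a dataset consisting only of x_target."""

    def fn(z, r, t, cond):
        # Independent of r for this single-point toy case.
        return [(zi - xi) / t for zi, xi in zip(z, x_target)]

    return fn


x_target = [0.25, -0.5, 0.75]
x_hat = one_step_sample(make_oracle(x_target), cond=None, dim=3, rng=random.Random(0))
print(x_hat)
```

The same argument order, fn(z, r, t, cond), is used here as in the training objective, which is the easiest place for the two code paths to drift apart.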

Testing Strategy

Phase 1: logging migration smoke test

  • use the repo-local uv environment
  • run a debug/smoke PushT image DiT training job on a single GPU with:
    • training.debug=true
    • dataloader.num_workers=0
    • val_dataloader.num_workers=0
    • task.env_runner.n_envs=1
    • task.env_runner.n_test_vis=0
    • task.env_runner.n_train_vis=0
  • verify:
    • SwanLab initializes successfully
    • logs.json.txt is populated
    • rollout metrics still include test_mean_score
    • no video logging is attempted
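
Part of the verification list above can be automated with a short check over the local log. The sketch assumes logs.json.txt holds one JSON object per line and that the metric key is spelled test_mean_score; both should be confirmed against the actual file. The function name check_scalar_log is hypothetical.

```python
import json
import numbers
import os
import tempfile


def check_scalar_log(path, required_key="test_mean_score"):
    """Return (found_required, bad_keys) for a JSON-lines log file."""
    found = False
    bad_keys = set()
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            record = json.loads(line)
            found = found or required_key in record
            for key, value in record.items():
                # Flag anything that is not a plain number (bools included).
                if not isinstance(value, numbers.Real) or isinstance(value, bool):
                    bad_keys.add(key)
    return found, sorted(bad_keys)


# Demonstration on a synthetic log; the records are made up for illustration.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "logs.json.txt")
    with open(demo_path, "w", encoding="utf-8") as f:
        f.write(json.dumps({"train_loss": 0.1, "epoch": 0}) + "\n")
        f.write(json.dumps({"test_mean_score": 0.5, "epoch": 0}) + "\n")
    found, bad = check_scalar_log(demo_path)
print(found, bad)
```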

Phase 2: iMF smoke test

  • run an equivalent debug PushT image iMF job
  • verify:
    • forward/backward passes succeed
    • JVP path executes on the local Torch version
    • one-step inference returns correctly shaped actions
    • rollout produces scalar metrics including test_mean_score

Branch and Commit Strategy

  1. start from a main-based worktree branch
  2. commit the SwanLab/no-video migration after smoke verification
  3. continue with the iMF implementation
  4. once iMF smoke tests pass, create/preserve a dedicated feature branch for the experiment code and push it to Gitea

Experiment Plan

After the iMF path is smoke-tested and pushed:

  • run a 3x3 grid over:
    • n_emb ∈ {128, 256, 384}
    • n_layer ∈ {6, 12, 18}
  • keep the rest of the setup fixed
  • run each experiment for 300 epochs
  • primary comparison metric: test_mean_score
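
The grid above can be enumerated programmatically so that every run gets a stable name and a fixed set of overrides. In this sketch the override keys policy.n_emb, policy.n_layer, and training.num_epochs are assumptions about the config layout, and the run-name pattern is hypothetical.

```python
from itertools import product

N_EMB = (128, 256, 384)
N_LAYER = (6, 12, 18)


def build_grid(epochs=300):
    """Return one entry per (n_emb, n_layer) pair with its config overrides."""
    runs = []
    for n_emb, n_layer in product(N_EMB, N_LAYER):
        runs.append({
            "name": f"imf_emb{n_emb}_layer{n_layer}",
            "overrides": [
                f"policy.n_emb={n_emb}",        # assumed config key
                f"policy.n_layer={n_layer}",    # assumed config key
                f"training.num_epochs={epochs}",  # assumed config key
            ],
        })
    return runs


grid = build_grid()
print(len(grid))  # 9
```

If the number of attention heads is part of the fixed setup, each width has to be divisible by it; all three widths here are divisible by both 4 and 8.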

Resource Allocation

Three concurrent runs should be scheduled continuously until the matrix is complete:

  • local machine: 1 GPU
  • 5880: 2 GPUs

Each run uses the same uv-managed environment and the same pushed branch so the code path is consistent across hosts.
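
One way to keep all three GPU slots busy is a greedy queue that hands the next run to whichever slot frees up first. The sketch below only plans the assignment and does not launch anything. The slot labels and the equal cost estimates are hypothetical; real run times will vary with width and depth.

```python
import heapq


def assign_runs(runs, slots):
    """Greedy continuous scheduling over (name, estimated_cost) pairs."""
    # Heap entries are (time the slot becomes free, tie-break index, slot label).
    heap = [(0.0, i, slot) for i, slot in enumerate(slots)]
    heapq.heapify(heap)
    plan = {slot: [] for slot in slots}
    for name, cost in runs:
        free_at, i, slot = heapq.heappop(heap)
        plan[slot].append(name)
        heapq.heappush(heap, (free_at + cost, i, slot))
    return plan


# Hypothetical slot labels: one local GPU and two GPUs on 5880.
slots = ["local:0", "5880:0", "5880:1"]
# Nine placeholder runs with equal estimated cost.
runs = [(f"run{i}", 1.0) for i in range(9)]
plan = assign_runs(runs, slots)
print({slot: len(names) for slot, names in plan.items()})
```

With better cost estimates (for example, scaling with n_layer), the same function would place the longer runs so that no slot sits idle at the end of the matrix.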

Risks and Mitigations

  • Torch JVP compatibility risk: provide a fallback JVP implementation and smoke-test immediately.
  • Logging regression risk: keep local JSON logging and verify scalar rollout metrics before moving to iMF.
  • Video/logging side effects: disable visualizations in config and filter video objects out of runner logs.
  • Cross-host drift: push the verified branch to Gitea before launching the experiment matrix on multiple machines.