From ba6ede9425ca7a1cd2e6d3e6951ba28668e0d74f Mon Sep 17 00:00:00 2001
From: Logic
Date: Thu, 26 Mar 2026 16:56:22 +0800
Subject: [PATCH] docs: add pusht imf swanlab design

---
 .../2026-03-26-pusht-imf-swanlab-design.md | 147 ++++++++++++++++++
 1 file changed, 147 insertions(+)
 create mode 100644 docs/superpowers/specs/2026-03-26-pusht-imf-swanlab-design.md

diff --git a/docs/superpowers/specs/2026-03-26-pusht-imf-swanlab-design.md b/docs/superpowers/specs/2026-03-26-pusht-imf-swanlab-design.md
new file mode 100644
index 0000000..5027b87
--- /dev/null
+++ b/docs/superpowers/specs/2026-03-26-pusht-imf-swanlab-design.md
@@ -0,0 +1,147 @@
+# PushT Image DiT iMF + SwanLab Design
+
+## Goal
+Migrate the PushT image DiT experiment path from W&B to SwanLab online logging and suppress simulation video logging. Then add an iMeanFlow-based (iMF) one-step transformer policy for PushT image experiments and run a controlled architecture sweep over embedding width and depth, using `test_mean_score` as the primary metric.
+
+## Context
+- The implementation baseline is `main`.
+- The experiment path is limited to the PushT image transformer workflow; unrelated workspaces and runners should remain unchanged.
+- Environment management must use the repo-local `uv` workflow.
+- The trusted remote machine alias `5880` refers to `droid-system-product-name` (`droid@100.73.14.65`) and can run two GPU jobs in parallel.
+
+## Architecture Overview
+The work is split into two phases, each verified before the next begins:
+
+1. **Logging migration phase**
+   - Keep the existing PushT image DiT training behavior intact.
+   - Replace W&B usage with SwanLab in the transformer hybrid workspace used by PushT image DiT experiments.
+   - Preserve local `logs.json.txt` output.
+   - Ensure rollout metrics such as `test_mean_score` and per-seed rewards are still logged.
+   - Disable simulation video logging at both the config level and the runner/logging boundary.
+
+2. **iMF migration phase**
+   - Keep the original diffusion-based transformer image policy available on `main`.
+   - Add a parallel iMF-specific model/policy/config path rather than overwriting the baseline diffusion policy.
+   - Reuse the existing observation encoder and training workspace where possible.
+   - Replace diffusion training with the iMeanFlow training objective.
+   - Use one-step inference for validation/rollout in the iMF path.
+
+## Logging Design
+### Scope
+Only the PushT image DiT experiment chain is changed:
+- `train_diffusion_transformer_hybrid_workspace.py`
+- `pusht_image_runner.py`
+- the new/updated PushT image transformer configs
+
+### Behavior
+- SwanLab runs in `online` mode.
+- Logged values are scalar metrics only, e.g.:
+  - `train_loss`
+  - `val_loss`
+  - `train_action_mse_error`
+  - `test_mean_score`
+  - aggregate rollout metrics and optional per-seed scalar rewards
+- No simulation videos are uploaded or wrapped as logging objects.
+- Local JSON logging remains enabled for auditability and remote-job fallback debugging.
+
+### Operational safeguards
+- Default PushT experiment configs set `task.env_runner.n_test_vis=0` and `task.env_runner.n_train_vis=0`.
+- The PushT image runner will not emit video objects into `log_data`, preventing accidental uploads even if visualization counts are later changed.
+- SwanLab credentials are provided through the environment at runtime, not committed into the repo.
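The runner-side safeguard described under "Operational safeguards" can be sketched as a small filter applied to `log_data` before it reaches any logger. This is a minimal illustration only: `keep_scalar_metrics`, `FakeVideo`, and the example metric keys are hypothetical names introduced here, not identifiers from the repo, and the actual runner may enforce the rule differently.

```python
import json
import numbers


def keep_scalar_metrics(log_data):
    """Return only the entries of log_data whose values are real scalars.

    Hypothetical helper: anything that is not a real number (for example a
    video wrapper object) is dropped before the dict reaches a logger.
    """
    scalars = {}
    for key, value in log_data.items():
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(value, numbers.Real) and not isinstance(value, bool):
            scalars[key] = float(value)
    return scalars


class FakeVideo:
    """Stand-in for a video logging object (illustrative only)."""


# Example rollout log; the key names are illustrative assumptions.
step_log = {
    "train_loss": 0.12,
    "test_mean_score": 0.87,
    "test/sim_max_reward_100000": 0.91,
    "test/sim_video_100000": FakeVideo(),
}

scalar_log = keep_scalar_metrics(step_log)
# The same filtered dict can be appended to the local JSON log.
json_line = json.dumps(scalar_log)
```

Filtering at this boundary keeps the "no video" guarantee independent of the `n_test_vis` / `n_train_vis` config values, which is the point of having two layers of protection.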
+
+## iMF Model Design
+### Baseline reuse
+The iMF path reuses:
+- the existing image observation encoder
+- the existing action/observation normalization path
+- the existing training workspace skeleton
+- the existing PushT image dataset and env runner
+
+### New files
+- `diffusion_policy/model/diffusion/imf_transformer_for_diffusion.py`
+- `diffusion_policy/policy/imf_transformer_hybrid_image_policy.py`
+- `image_pusht_diffusion_policy_dit_imf.yaml`
+
+### Model structure
+The iMF transformer mirrors the current transformer policy structure closely enough to reuse known-good conditioning patterns, but predicts two heads:
+- `u`: average velocity field
+- `v`: instantaneous velocity field
+
+Inputs remain conditioned on encoded observations and action trajectory tokens.
+
+## iMF Training Objective
+For a normalized action trajectory `x`:
+1. sample `t, r`
+2. sample Gaussian noise `e`
+3. form `z_t = (1 - t) * x + t * e`
+4. predict the instantaneous velocity `v`, either as `fn(z_t, t, t)` (the `u` prediction with `r = t`) or from the model's `v` head at time `t`
+5. compute `u` and `du_dt` (the total derivative `du/dt`) with a JVP using tangent `(v, 0, 1)` over `(z, r, t)`
+6. form compound velocity:
+   - `V = u + (t - r) * stopgrad(du_dt)`
+7. regress `V` onto the conditional (instantaneous) velocity target:
+   - `target = e - x`
+8. optimize the iMF loss on unmasked action tokens; keep an auxiliary `v`-head loss only if it improves training stability
+
+The implementation should prefer `torch.func.jvp` and keep a fallback JVP implementation in case the local Torch version does not support it.
+
+## iMF Inference Design
+Inference uses a single step starting from noise:
+- initialize `z_1 ~ N(0, I)`
+- set `t = 1.0`, `r = 0.0`
+- predict `u(z_1, t, r, cond)`
+- produce the action sample with one update:
+  - `x_hat = z_1 - (t - r) * u`
+
+This matches the time direction in the reference iMeanFlow sampling logic.
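The identity behind the training steps and the one-step update can be checked on a toy problem. The sketch below is a stand-in, not the planned implementation: it replaces the network with a closed-form average velocity for a 1-D Gaussian "action" distribution (an assumption chosen so the answer is known) and replaces `torch.func.jvp` with a tiny forward-mode `Dual` class so it runs on the standard library alone. All names (`Dual`, `u_fn`, `v_fn`, `jvp_u`, `SIGMA`) are hypothetical. Note that training regresses `V` onto the per-sample target `e - x`; the check here compares against the marginal velocity, which is what that regression averages to.

```python
import math
from dataclasses import dataclass

SIGMA = 0.5  # std of the toy 1-D data distribution (assumption)


@dataclass
class Dual:
    """Forward-mode number: value plus directional derivative (tangent)."""
    val: float
    tan: float = 0.0

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.val + other.val, self.tan + other.tan)

    __radd__ = __add__

    def __sub__(self, other):
        other = _lift(other)
        return Dual(self.val - other.val, self.tan - other.tan)

    def __rsub__(self, other):
        return _lift(other) - self

    def __mul__(self, other):
        other = _lift(other)
        return Dual(self.val * other.val,
                    self.tan * other.val + self.val * other.tan)

    __rmul__ = __mul__

    def __truediv__(self, other):
        other = _lift(other)
        tan = (self.tan * other.val - self.val * other.tan) / other.val ** 2
        return Dual(self.val / other.val, tan)


def _lift(x):
    return x if isinstance(x, Dual) else Dual(float(x))


def dual_sqrt(a):
    root = math.sqrt(a.val)
    return Dual(root, a.tan / (2.0 * root))


def scale(t):
    """Std of z_t = (1 - t) * x + t * e for x ~ N(0, SIGMA^2), e ~ N(0, 1)."""
    return dual_sqrt((1 - t) * (1 - t) * SIGMA ** 2 + t * t)


def u_fn(z, r, t):
    """Closed-form average velocity over [r, t] (stands in for the u head)."""
    return z * (1 - scale(r) / scale(t)) / (t - r)


def v_fn(z, t):
    """Closed-form instantaneous velocity (plain floats only)."""
    return z * (t - (1 - t) * SIGMA ** 2) / ((1 - t) ** 2 * SIGMA ** 2 + t ** 2)


def jvp_u(z, r, t, tangents):
    """Return (u, du_dt) for the tangent direction given over (z, r, t)."""
    dz, dr, dt = tangents
    out = u_fn(Dual(z, dz), Dual(r, dr), Dual(t, dt))
    return out.val, out.tan


# Steps 4-6: tangent (v, 0, 1) over (z, r, t), then the compound velocity.
z, r, t = 0.8, 0.2, 0.7
v = v_fn(z, t)
u, du_dt = jvp_u(z, r, t, (v, 0.0, 1.0))
V = u + (t - r) * du_dt  # no stopgrad needed here: nothing is being trained

# One-step inference from noise with t = 1, r = 0.
z1 = 1.3
x_hat = z1 - (1.0 - 0.0) * u_fn(Dual(z1), Dual(0.0), Dual(1.0)).val
```

In this toy case `V` reproduces `v` up to floating-point error, and the one-step update maps a unit-variance noise sample to `SIGMA * z1`, i.e. a sample with the data variance. In the real policy, `torch.func.jvp(func, primals, tangents)` would play the role of `jvp_u`, returning the output and its JVP in one call.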
+
+## Testing Strategy
+### Phase 1: logging migration smoke test
+- use the repo-local `uv` environment
+- run a debug/smoke PushT image DiT training job on a single GPU with:
+  - `training.debug=true`
+  - `dataloader.num_workers=0`
+  - `val_dataloader.num_workers=0`
+  - `task.env_runner.n_envs=1`
+  - `task.env_runner.n_test_vis=0`
+  - `task.env_runner.n_train_vis=0`
+- verify:
+  - SwanLab initializes successfully
+  - `logs.json.txt` is populated
+  - rollout metrics still include `test_mean_score`
+  - no video logging is attempted
+
+### Phase 2: iMF smoke test
+- run an equivalent debug PushT image iMF job
+- verify:
+  - forward/backward passes succeed
+  - JVP path executes on the local Torch version
+  - one-step inference returns correctly shaped actions
+  - rollout produces scalar metrics including `test_mean_score`
+
+## Branch and Commit Strategy
+1. start from a `main`-based worktree branch
+2. commit the SwanLab/no-video migration after smoke verification
+3. continue with the iMF implementation
+4. once iMF smoke tests pass, create/preserve a dedicated feature branch for the experiment code and push it to Gitea
+
+## Experiment Plan
+After the iMF path is smoke-tested:
+- run a 3x3 grid over:
+  - `n_emb ∈ {128, 256, 384}`
+  - `n_layer ∈ {6, 12, 18}`
+- keep the rest of the setup fixed
+- run each experiment for 300 epochs
+- primary comparison metric: `test_mean_score`
+
+## Resource Allocation
+Keep three runs in flight at all times until the 9-run matrix is complete:
+- local machine: 1 GPU
+- `5880`: 2 GPUs
+
+Each run uses the same uv-managed environment and the same pushed branch so the code path is consistent across hosts.
+
+## Risks and Mitigations
+- **Torch JVP compatibility risk**: provide a fallback JVP implementation and smoke-test immediately.
+- **Logging regression risk**: keep local JSON logging and verify scalar rollout metrics before moving to iMF.
+- **Video/logging side effects**: disable visualizations in config and filter video objects out of runner logs.
+- **Cross-host drift**: push the verified branch to Gitea before launching the experiment matrix on multiple machines.
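The sweep from the Experiment Plan and Resource Allocation sections can be enumerated ahead of time so every host launches from the same list. This is a rough sketch under stated assumptions: the slot labels (`local:0`, `5880:0`, `5880:1`) and run names are hypothetical, and the static round-robin assignment only approximates "three runs in flight"; a real launcher would start the next run whenever a slot frees up.

```python
from itertools import product

N_EMB = (128, 256, 384)
N_LAYER = (6, 12, 18)
# Hypothetical slot labels: one local GPU plus two GPUs on the `5880` host.
GPU_SLOTS = ("local:0", "5880:0", "5880:1")


def build_sweep():
    """Enumerate the 3x3 grid and assign runs to GPU slots round-robin."""
    runs = []
    for index, (n_emb, n_layer) in enumerate(product(N_EMB, N_LAYER)):
        runs.append({
            "name": f"imf_emb{n_emb}_layer{n_layer}",  # illustrative naming
            "n_emb": n_emb,
            "n_layer": n_layer,
            "slot": GPU_SLOTS[index % len(GPU_SLOTS)],
        })
    return runs


sweep = build_sweep()
for run in sweep:
    print(run["slot"], run["name"])
```

With nine runs and three slots, each slot receives three runs, so the matrix finishes in roughly three rounds if run times are similar.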