diffusion_policy/docs/superpowers/specs/2026-03-26-pusht-imf-swanlab-design.md

# PushT Image DiT iMF + SwanLab Design

## Goal
Migrate the PushT image DiT experiment path from W&B to SwanLab online logging, suppress simulation video logging, then add an iMeanFlow-based one-step transformer policy for PushT image experiments and run a controlled architecture sweep over embedding width and depth using `test_mean_score` as the primary metric.

## Context
- The implementation baseline is `main`.
- The experiment path is limited to the PushT image transformer workflow; unrelated workspaces and runners should remain unchanged.
- Environment management must use the repo-local `uv` workflow.
- The trusted remote machine alias `5880` refers to `droid-system-product-name` (`droid@100.73.14.65`) and can run two GPU jobs in parallel.

## Architecture Overview
The work is split into two verified phases:

1. **Logging migration phase**
   - Keep the existing PushT image DiT training behavior intact.
   - Replace W&B usage with SwanLab in the transformer hybrid workspace used by PushT image DiT experiments.
   - Preserve local `logs.json.txt` output.
   - Ensure rollout metrics such as `test_mean_score` and per-seed rewards are still logged.
   - Disable simulation video logging at both the config and runner/logging boundary.

2. **iMF migration phase**
   - Keep the original diffusion-based transformer image policy available on `main`.
   - Add a parallel iMF-specific model/policy/config path rather than overwriting the baseline diffusion policy.
   - Reuse the existing observation encoder and training workspace where possible.
   - Replace diffusion training with the iMeanFlow training objective.
   - Use one-step inference for validation/rollout in the iMF path.

## Logging Design
### Scope
Only the PushT image DiT experiment chain is changed:
- `train_diffusion_transformer_hybrid_workspace.py`
- `pusht_image_runner.py`
- the new/updated PushT image transformer configs

### Behavior
- SwanLab runs in `online` mode.
- Logged values are scalar metrics only, e.g.:
  - `train_loss`
  - `val_loss`
  - `train_action_mse_error`
  - `test_mean_score`
  - aggregate rollout metrics and optional per-seed scalar rewards
- No simulation videos are uploaded or wrapped as logging objects.
- Local JSON logging remains enabled for auditability and remote-job fallback debugging.
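Below is a minimal sketch of the scalar-only rule at the workspace/logging boundary. It assumes each step's metrics arrive as a flat `dict` in which rollout entries are either numbers or rich-media objects. The names `filter_scalar_metrics` and `FakeVideo` are hypothetical. The filtered dict is what would be handed to SwanLab, and the same dict can also be appended to `logs.json.txt`.

```python
from numbers import Real
from typing import Any, Dict


def filter_scalar_metrics(log_data: Dict[str, Any]) -> Dict[str, Real]:
    """Keep real-valued scalars and drop everything else (videos, images, arrays)."""
    scalars: Dict[str, Real] = {}
    for key, value in log_data.items():
        # bool subclasses int; skip it so flags are not plotted as metrics.
        if isinstance(value, Real) and not isinstance(value, bool):
            scalars[key] = value
    return scalars


class FakeVideo:
    """Hypothetical stand-in for a rich-media logging object."""


step_log = {
    "train_loss": 0.12,
    "epoch": 3,
    "test/mean_score": 0.81,
    "test/sim_video_100000": FakeVideo(),
}
print(filter_scalar_metrics(step_log))
# -> {'train_loss': 0.12, 'epoch': 3, 'test/mean_score': 0.81}
```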

### Operational safeguards
- Default PushT experiment configs set `task.env_runner.n_test_vis=0` and `task.env_runner.n_train_vis=0`.
- The PushT image runner will not emit video objects into `log_data`, preventing accidental uploads even if visualization counts are later changed.
- SwanLab credentials are provided through the environment at runtime, not committed into the repo.

## iMF Model Design
### Baseline reuse
The iMF path reuses:
- the existing image observation encoder
- the existing action/observation normalization path
- the existing training workspace skeleton
- the existing PushT image dataset and env runner

### New files
- `diffusion_policy/model/diffusion/imf_transformer_for_diffusion.py`
- `diffusion_policy/policy/imf_transformer_hybrid_image_policy.py`
- `image_pusht_diffusion_policy_dit_imf.yaml`

### Existing files changed for the iMF path
- `diffusion_policy/workspace/train_diffusion_transformer_hybrid_workspace.py`
  - logging migration to SwanLab for this experiment chain
  - no structural training-loop fork beyond instantiating the configured policy and logging scalar metrics
- `diffusion_policy/env_runner/pusht_image_runner.py`
  - suppress video objects in returned logs

### Model structure
The iMF transformer mirrors the current transformer policy structure closely enough to reuse known-good conditioning patterns, but it remains a **single-head model** that predicts only:
- `u`: average velocity field

The same function is reused at two evaluation points:
- `fn(z_t, r, t, cond)` predicts average velocity `u`
- `fn(z_t, t, t, cond)` predicts the instantaneous velocity surrogate `v`

Inputs remain conditioned on encoded observations and action trajectory tokens.
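A toy 1-D check shows why one function can serve both roles. It uses a hand-picked trajectory `z(s) = sin(s)`, which is hypothetical and not the learned model. As `r -> t`, the average velocity over `[r, t]` tends to the instantaneous velocity. The compound velocity `u + (t - r) * du/dt` analytically equals `v(t)`. Along a fixed trajectory, the total derivative that the JVP computes reduces to the partial derivative in `t` used here.

```python
import math


def z(s: float) -> float:
    """Toy 1-D trajectory (hypothetical), standing in for a flow path."""
    return math.sin(s)


def v(s: float) -> float:
    """Instantaneous velocity dz/ds of the toy trajectory."""
    return math.cos(s)


def u(r: float, t: float) -> float:
    """Average velocity over [r, t]; the r == t case is defined by its limit v(t)."""
    if r == t:
        return v(t)
    return (z(t) - z(r)) / (t - r)


def compound_velocity(r: float, t: float, h: float = 1e-5) -> float:
    """u + (t - r) * du/dt, with du/dt from a central difference in t (r held fixed)."""
    du_dt = (u(r, t + h) - u(r, t - h)) / (2 * h)
    return u(r, t) + (t - r) * du_dt


r, t = 0.2, 0.9
print(u(r, t), compound_velocity(r, t), v(t))
```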

## iMF Training Objective
For a normalized action trajectory `x`, the initial implementation follows the user-provided Algorithm 1 exactly:
1. sample `t, r`
2. sample Gaussian noise `e`
3. form `z_t = (1 - t) * x + t * e`
4. predict instantaneous velocity surrogate with the same network:
   - `v = fn(z_t, t, t, cond)`
5. define the JVP function exactly as:
   - `g(z, r, t) = fn(z, r, t, cond)`
6. compute the primal output and JVP with tangent:
   - `u, du_dt = jvp(g, (z_t, r, t), (v.detach(), 0, 1))`
7. form compound velocity:
   - `V = u + (t - r) * stopgrad(du_dt)`
8. train against the conditional (instantaneous) velocity target, i.e. the time derivative of `z_t`:
   - `target = e - x`
9. optimize only the masked iMF loss:
   - `loss = metric(V - target)`

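There is **no auxiliary `v` loss** in the initial implementation. The implementation should prefer `torch.func.jvp`. It should also keep a fallback path (for example `torch.autograd.functional.jvp`) in case `torch.func.jvp` fails on the local Torch version or on a module inside the model.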
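The nine steps can be traced end to end on a scalar toy problem. This sketch swaps PyTorch for a few lines of forward-mode automatic differentiation with dual numbers. Dual numbers compute the same Jacobian-vector product that `torch.func.jvp(g, (z_t, r, t), (v.detach(), torch.zeros_like(r), torch.ones_like(t)))` would return in the real policy. The toy network `fn`, its fixed weights, and the `r <= t` sampler are all hypothetical. The squared error stands in for the masked `metric`.

```python
import math
import random
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Dual:
    """Forward-mode AD number: primal value plus directional derivative (tangent)."""

    val: float
    tan: float = 0.0

    def __add__(self, other):
        other = lift(other)
        return Dual(self.val + other.val, self.tan + other.tan)

    __radd__ = __add__

    def __mul__(self, other):
        other = lift(other)
        return Dual(self.val * other.val, self.val * other.tan + self.tan * other.val)

    __rmul__ = __mul__


def lift(x) -> Dual:
    return x if isinstance(x, Dual) else Dual(float(x))


def tanh(x: Dual) -> Dual:
    y = math.tanh(x.val)
    return Dual(y, (1.0 - y * y) * x.tan)  # d tanh = (1 - tanh^2) dx


def fn(z, r, t, cond: float) -> Dual:
    """Toy stand-in for the iMF network: predicts average velocity u(z, r, t | cond)."""
    a, b, c, w = 0.8, -0.5, 0.3, 1.2  # fixed hypothetical "weights"
    return w * tanh(a * lift(z) + b * lift(t) + c * lift(r) + cond)


def jvp(g: Callable[..., Dual], primals: Tuple[float, ...], tangents: Tuple[float, ...]):
    """Return (g(primals), J_g(primals) @ tangents) from one dual-number forward pass."""
    out = g(*(Dual(p, d) for p, d in zip(primals, tangents)))
    return out.val, out.tan


def fd_jvp(g, primals, tangents, h: float = 1e-6) -> float:
    """Central finite-difference estimate of the same directional derivative."""
    plus = g(*(p + h * d for p, d in zip(primals, tangents))).val
    minus = g(*(p - h * d for p, d in zip(primals, tangents))).val
    return (plus - minus) / (2 * h)


def imf_loss(x: float, cond: float, rng: random.Random) -> float:
    t = rng.random()                    # step 1 (hypothetical sampler)
    r = rng.random() * t                # step 1: any r <= t
    e = rng.gauss(0.0, 1.0)             # step 2
    z_t = (1.0 - t) * x + t * e         # step 3
    v = fn(z_t, t, t, cond).val         # step 4: plain float, i.e. already "detached"

    def g(z, rr, tt):                   # step 5
        return fn(z, rr, tt, cond)

    u, du_dt = jvp(g, (z_t, r, t), (v, 0.0, 1.0))  # step 6
    V = u + (t - r) * du_dt             # step 7: du_dt is a plain float (stop-gradient)
    target = e - x                      # step 8
    return (V - target) ** 2            # step 9: no auxiliary v loss


print(imf_loss(x=0.4, cond=0.1, rng=random.Random(0)))
```

Because the JVP is checked against a central finite difference, the same comparison can serve as a cheap unit test. It works for whichever JVP path, primary or fallback, the Torch implementation ends up using.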

## iMF Inference Design
Inference uses a single step starting from noise:
- initialize `z_1 ~ N(0, I)`
- set `t = 1.0`, `r = 0.0`
- predict `u = fn(z_1, r, t, cond)` (same argument order as in training)
- produce the action sample with one update:
  - `x_hat = z_1 - (t - r) * u`

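Here `t = 1` is pure noise and `t = 0` is data, consistent with `z_t = (1 - t) * x + t * e` in the training objective. This matches the time direction in the reference iMeanFlow sampling logic.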
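Below is a minimal sketch of the one-step update. It uses an oracle average-velocity field for a data distribution concentrated at a single action trajectory `x_star`. This setup is hypothetical and was chosen because its exact `u` is known in closed form. Starting from noise at `t = 1` and stepping to `r = 0` lands on `x_star`, which follows the time direction described above. The horizon and action sizes are illustrative only.

```python
import random
from typing import Callable, List, Optional

Vector = List[float]
AvgVelocityFn = Callable[[Vector, float, float, Optional[object]], Vector]


def one_step_sample(u_fn: AvgVelocityFn, cond, dim: int, rng: random.Random) -> Vector:
    """iMF one-step sampling: jump from noise at t = 1 to data at r = 0."""
    t, r = 1.0, 0.0
    z_1 = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    u = u_fn(z_1, r, t, cond)  # same argument order as fn(z_t, r, t, cond)
    return [z - (t - r) * ui for z, ui in zip(z_1, u)]


def make_oracle(x_star: Vector) -> AvgVelocityFn:
    """Exact average velocity when the data distribution is the single point x_star."""

    def u_fn(z: Vector, r: float, t: float, cond) -> Vector:
        # On z_t = (1 - t) * x_star + t * e the velocity e - x_star = (z - x_star) / t
        # is constant in time, so its average over [r, t] is the same. cond is unused.
        return [(zi - xi) / t for zi, xi in zip(z, x_star)]

    return u_fn


horizon, action_dim = 10, 2  # hypothetical sizes; actions flattened to one vector
x_star = [0.1 * i for i in range(horizon * action_dim)]
x_hat = one_step_sample(make_oracle(x_star), cond=None, dim=len(x_star), rng=random.Random(0))
print(max(abs(a - b) for a, b in zip(x_hat, x_star)))
```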

## Testing Strategy
### Phase 1: logging migration smoke test
- use the repo-local `uv` environment
- run a debug/smoke PushT image DiT training job on a single GPU with:
  - `training.debug=true`
  - `dataloader.num_workers=0`
  - `val_dataloader.num_workers=0`
  - `task.env_runner.n_envs=1`
  - `task.env_runner.n_test_vis=0`
  - `task.env_runner.n_train_vis=0`
- verify:
  - SwanLab initializes successfully
  - `logs.json.txt` is populated
  - rollout metrics still include `test_mean_score`
  - no video logging is attempted
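The log-side checks can be automated with a small script, assuming `logs.json.txt` holds one JSON object per line. The score might reach the file as `test_mean_score` or in a slash form such as `test/mean_score`; which spelling is used is an assumption here, so the script accepts both. The function names are hypothetical.

```python
import json
from pathlib import Path
from typing import Iterable, List

SCORE_KEYS = ("test_mean_score", "test/mean_score")


def check_smoke_log(lines: Iterable[str]) -> List[str]:
    """Return the problems found in JSON-lines log records; an empty list means pass."""
    records = [json.loads(line) for line in lines if line.strip()]
    problems: List[str] = []
    if not records:
        problems.append("log is empty")
    if not any(key in rec for rec in records for key in SCORE_KEYS):
        problems.append("no record contains a test mean score")
    video_keys = sorted({key for rec in records for key in rec if "video" in key})
    if video_keys:
        problems.append(f"video keys present: {video_keys}")
    return problems


def check_smoke_log_file(path: Path) -> List[str]:
    """Convenience wrapper for a log file on disk."""
    return check_smoke_log(path.read_text().splitlines())


sample = [
    '{"epoch": 0, "train_loss": 0.5}',
    '{"epoch": 0, "test/mean_score": 0.2, "test/sim_max_reward_100000": 0.2}',
]
print(check_smoke_log(sample))  # -> []
```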

### Phase 2: iMF smoke test
- run an equivalent debug PushT image iMF job
- verify:
  - forward/backward passes succeed
  - JVP path executes on the local Torch version
  - one-step inference returns correctly shaped actions
  - rollout produces scalar metrics including `test_mean_score`
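Both smoke runs share one override list, so the list can live in a single helper. This sketch assumes the repository's Hydra entry point is `train.py` and that it accepts `--config-name`. The iMF config name comes from this document. The baseline DiT config name is left as a parameter rather than guessed.

```python
import shlex
from typing import Dict, List, Optional

# Overrides shared by the Phase 1 and Phase 2 smoke runs.
SMOKE_OVERRIDES: Dict[str, str] = {
    "training.debug": "true",
    "dataloader.num_workers": "0",
    "val_dataloader.num_workers": "0",
    "task.env_runner.n_envs": "1",
    "task.env_runner.n_test_vis": "0",
    "task.env_runner.n_train_vis": "0",
}


def smoke_test_argv(config_name: str, extra: Optional[Dict[str, str]] = None) -> List[str]:
    """Build a uv-managed Hydra command line; entries in `extra` take precedence."""
    overrides = {**SMOKE_OVERRIDES, **(extra or {})}
    return ["uv", "run", "python", "train.py", f"--config-name={config_name}"] + [
        f"{key}={value}" for key, value in overrides.items()
    ]


argv = smoke_test_argv("image_pusht_diffusion_policy_dit_imf.yaml")
print(shlex.join(argv))
```

A launch would then call `subprocess.run(argv, check=True)`. You can optionally set `CUDA_VISIBLE_DEVICES` in the environment to pin the single GPU.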

## Branch and Commit Strategy
1. start from a `main`-based worktree branch
2. commit the SwanLab/no-video migration after smoke verification
3. continue with the iMF implementation
4. once iMF smoke tests pass, create/preserve a dedicated feature branch for the experiment code and push it to Gitea

## Experiment Plan
After the iMF path is smoke-tested and pushed:
- run a 3x3 grid (9 runs) over:
  - `n_emb ∈ {128, 256, 384}`
  - `n_layer ∈ {6, 12, 18}`
- keep the rest of the setup fixed
- run each experiment for 300 epochs
- primary comparison metric: `test_mean_score`
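Enumerating the sweep programmatically gives every run identical fixed settings. Several details are assumptions about the config layout: the override keys `policy.n_emb`, `policy.n_layer`, and `training.num_epochs`, the run-name format, and a fixed `n_head = 4`. The divisibility check reflects the general multi-head attention constraint that the embedding width must split evenly across heads.

```python
from itertools import product
from typing import Dict, List

N_EMB = (128, 256, 384)
N_LAYER = (6, 12, 18)
N_HEAD = 4  # hypothetical: assumed fixed across the sweep


def sweep_runs(num_epochs: int = 300) -> List[Dict[str, object]]:
    """Enumerate the 3x3 (n_emb, n_layer) grid with all other settings held fixed."""
    runs: List[Dict[str, object]] = []
    for n_emb, n_layer in product(N_EMB, N_LAYER):
        # Multi-head attention needs n_emb to split evenly across heads.
        if n_emb % N_HEAD != 0:
            raise ValueError(f"n_emb={n_emb} is not divisible by n_head={N_HEAD}")
        runs.append(
            {
                "name": f"pusht_imf_emb{n_emb}_L{n_layer}",
                "overrides": [
                    f"policy.n_emb={n_emb}",
                    f"policy.n_layer={n_layer}",
                    f"training.num_epochs={num_epochs}",
                ],
            }
        )
    return runs


for run in sweep_runs():
    print(run["name"], " ".join(run["overrides"]))
```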

## Resource Allocation
Three runs should be kept in flight at all times, with each freed GPU immediately taking the next pending configuration, until all nine runs in the matrix are complete:
- local machine: 1 GPU
- `5880`: 2 GPUs

Each run uses the same uv-managed environment and the same pushed branch so the code path is consistent across hosts.
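Continuous scheduling amounts to a shared work queue drained by one worker per GPU slot. Whenever a run exits, that slot immediately pulls the next pending configuration. The sketch below uses threads and a stand-in `fake_launch`. A real launcher is hypothetical here: for example, `subprocess.run` locally, or an `ssh 5880 ...` command for the remote GPUs, with `CUDA_VISIBLE_DEVICES` selecting the card.

```python
import queue
import threading
import time
from itertools import product
from typing import Callable, List, Tuple

Slot = Tuple[str, int]  # (host alias, GPU index)


def run_matrix(
    jobs: List[str], slots: List[Slot], launch: Callable[[Slot, str], None]
) -> List[Tuple[Slot, str]]:
    """Keep every slot busy: each slot pulls the next pending job when its run ends."""
    pending: "queue.Queue[str]" = queue.Queue()
    for job in jobs:
        pending.put(job)
    done: List[Tuple[Slot, str]] = []
    lock = threading.Lock()

    def worker(slot: Slot) -> None:
        while True:
            try:
                job = pending.get_nowait()
            except queue.Empty:
                return  # no work left for this slot
            launch(slot, job)  # blocks until the training run exits
            with lock:
                done.append((slot, job))

    threads = [threading.Thread(target=worker, args=(slot,)) for slot in slots]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return done


def fake_launch(slot: Slot, job: str) -> None:
    """Hypothetical stand-in for a real launcher; just simulates a short run."""
    time.sleep(0.01)


jobs = [f"pusht_imf_emb{e}_L{l}" for e, l in product((128, 256, 384), (6, 12, 18))]
slots: List[Slot] = [("local", 0), ("5880", 0), ("5880", 1)]
done = run_matrix(jobs, slots, fake_launch)
print(len(done))
```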

## Risks and Mitigations
- **Torch JVP compatibility risk**: provide a fallback JVP implementation and smoke-test immediately.
- **Logging regression risk**: keep local JSON logging and verify scalar rollout metrics before moving to iMF.
- **Video/logging side effects**: disable visualizations in config and filter video objects out of runner logs.
- **Cross-host drift**: push the verified branch to Gitea before launching the experiment matrix on multiple machines.