# PushT Image DiT iMF + SwanLab Design

## Goal

Migrate the PushT image DiT experiment path from W&B to SwanLab online logging, suppress simulation video logging, then add an iMeanFlow-based one-step transformer policy for PushT image experiments and run a controlled architecture sweep over embedding width and depth using `test_mean_score` as the primary metric.

## Context

- The implementation baseline is `main`.
- The experiment path is limited to the PushT image transformer workflow; unrelated workspaces and runners should remain unchanged.
- Environment management must use the repo-local `uv` workflow.
- The trusted remote machine alias `5880` refers to `droid-system-product-name` (`droid@100.73.14.65`) and can run two GPU jobs in parallel.

## Architecture Overview

The work is split into two phases, each verified before moving on:

1. **Logging migration phase**
   - Keep the existing PushT image DiT training behavior intact.
   - Replace W&B usage with SwanLab in the transformer hybrid workspace used by PushT image DiT experiments.
   - Preserve local `logs.json.txt` output.
   - Ensure rollout metrics such as `test_mean_score` and per-seed rewards are still logged.
   - Disable simulation video logging at both the config and runner/logging boundary.
2. **iMF migration phase**
   - Keep the original diffusion-based transformer image policy available on `main`.
   - Add a parallel iMF-specific model/policy/config path rather than overwriting the baseline diffusion policy.
   - Reuse the existing observation encoder and training workspace where possible.
   - Replace diffusion training with the iMeanFlow training objective.
   - Use one-step inference for validation/rollout in the iMF path.

## Logging Design

### Scope

Only the PushT image DiT experiment chain is changed:

- `train_diffusion_transformer_hybrid_workspace.py`
- `pusht_image_runner.py`
- the new/updated PushT image transformer configs

### Behavior

- SwanLab runs in `online` mode.
- Logged values are scalar metrics only, e.g.:
  - `train_loss`
  - `val_loss`
  - `train_action_mse_error`
  - `test_mean_score`
  - aggregate rollout metrics and optional per-seed scalar rewards
- No simulation videos are uploaded or wrapped as logging objects.
- Local JSON logging remains enabled for auditability and remote-job fallback debugging.

### Operational safeguards

- Default PushT experiment configs set `task.env_runner.n_test_vis=0` and `task.env_runner.n_train_vis=0`.
- The PushT image runner will not emit video objects into `log_data`, preventing accidental uploads even if visualization counts are later changed.
- SwanLab credentials are provided through the environment at runtime, not committed into the repo.
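A minimal sketch of the intended logging boundary, assuming SwanLab's wandb-style `init`/`log` API; the `filter_scalars` helper, the project name, and the exact `mode` string are illustrative assumptions rather than repo code:

```python
# Hedged sketch: scalar-only SwanLab logging for the PushT image DiT chain.
import numbers
import swanlab

def filter_scalars(log_data: dict) -> dict:
    """Keep numeric scalars only, so videos and other rich objects
    can never be wrapped into logging objects or uploaded."""
    return {
        k: float(v)
        for k, v in log_data.items()
        if isinstance(v, numbers.Number)
    }

swanlab.init(
    project="pusht_image_dit",     # assumed project name
    config={"policy": "dit_imf"},  # the resolved Hydra config in practice
    mode="cloud",                  # online mode; exact string depends on the SwanLab version
)

# In the training loop, mirror the existing JsonLogger path: write the full
# step_log to logs.json.txt locally, send only the scalars to SwanLab.
step_log = {"train_loss": 0.123, "test_mean_score": 0.87}
swanlab.log(filter_scalars(step_log), step=0)
```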
## iMF Model Design

### Baseline reuse

The iMF path reuses:

- the existing image observation encoder
- the existing action/observation normalization path
- the existing training workspace skeleton
- the existing PushT image dataset and env runner

### New files

- `diffusion_policy/model/diffusion/imf_transformer_for_diffusion.py`
- `diffusion_policy/policy/imf_transformer_hybrid_image_policy.py`
- `image_pusht_diffusion_policy_dit_imf.yaml`

### Existing files changed for the iMF path

- `diffusion_policy/workspace/train_diffusion_transformer_hybrid_workspace.py`
  - logging migration to SwanLab for this experiment chain
  - no structural training-loop fork beyond instantiating the configured policy and logging scalar metrics
- `diffusion_policy/env_runner/pusht_image_runner.py`
  - suppress video objects in returned logs

### Model structure

The iMF transformer mirrors the current transformer policy structure closely enough to reuse known-good conditioning patterns, but it remains a **single-head model** that predicts only:

- `u`: the average velocity field

The same function is reused at two evaluation points:

- `fn(z_t, r, t, cond)` predicts the average velocity `u`
- `fn(z_t, t, t, cond)` predicts the instantaneous velocity surrogate `v`

Inputs remain conditioned on encoded observations and action trajectory tokens.

## iMF Training Objective

For a normalized action trajectory `x`, the initial implementation follows the user-provided Algorithm 1 exactly (a sketch follows the inference design section below):

1. sample `t, r`
2. sample Gaussian noise `e`
3. form `z_t = (1 - t) * x + t * e`
4. predict the instantaneous velocity surrogate with the same network:
   - `v = fn(z_t, t, t, cond)`
5. define the JVP function exactly as:
   - `g(z, r, t) = fn(z, r, t, cond)`
6. compute the primal output and JVP with tangent:
   - `u, du_dt = jvp(g, (z_t, r, t), (v.detach(), 0, 1))`
7. form the compound velocity:
   - `V = u + (t - r) * stopgrad(du_dt)`
8. train against the average-velocity target:
   - `target = e - x`
9. optimize only the masked iMF loss:
   - `loss = metric(V - target)`

There is **no auxiliary `v` loss** in the initial implementation. The implementation should prefer `torch.func.jvp` and keep a safe fallback path in case the local Torch stack needs it.

## iMF Inference Design

Inference uses a single step starting from noise:

- initialize `z_1 ~ N(0, I)`
- set `t = 1.0`, `r = 0.0`
- predict `u = fn(z_1, r, t, cond)`, using the same argument order as in training
- produce the action sample with one update:
  - `x_hat = z_1 - (t - r) * u`

This matches the time direction in the reference iMeanFlow sampling logic.
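To make Algorithm 1 concrete, here is a minimal sketch of the training objective, assuming a `net(z, r, t, cond)` callable standing in for the single-head iMF transformer; the `r <= t` sampling scheme, the tensor shapes, and the plain unmasked MSE metric are illustrative assumptions:

```python
import torch
from torch.func import jvp

def imf_loss(net, x, cond):
    """x: normalized action trajectory (B, T, Da); cond: encoded observations."""
    B = x.shape[0]
    t = torch.rand(B, device=x.device)              # step 1: sample t, r
    r = torch.rand(B, device=x.device) * t          # assumption: r <= t
    e = torch.randn_like(x)                         # step 2: Gaussian noise
    t_ = t.view(B, 1, 1)
    z_t = (1.0 - t_) * x + t_ * e                   # step 3

    # step 4: instantaneous velocity surrogate from the same network
    v = net(z_t, t, t, cond)

    # step 5: the JVP function closes over cond exactly as specified
    def g(z, r_, t2):
        return net(z, r_, t2, cond)

    # step 6: primal output and JVP with tangent (v.detach(), 0, 1)
    u, du_dt = jvp(
        g,
        (z_t, r, t),
        (v.detach(), torch.zeros_like(r), torch.ones_like(t)),
    )

    # steps 7-9: compound velocity against the average-velocity target
    V = u + (t - r).view(B, 1, 1) * du_dt.detach()  # stopgrad(du_dt)
    target = e - x
    return torch.mean((V - target) ** 2)            # masking would apply here
```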
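The matching one-step sampler, under the same illustrative `net` and shape assumptions:

```python
import torch

@torch.no_grad()
def imf_sample(net, cond, shape, device):
    """One-step iMF inference; shape = (B, T, Da)."""
    B = shape[0]
    z_1 = torch.randn(shape, device=device)   # initialize z_1 ~ N(0, I)
    t = torch.ones(B, device=device)          # t = 1.0
    r = torch.zeros(B, device=device)         # r = 0.0
    u = net(z_1, r, t, cond)                  # average velocity
    return z_1 - (t - r).view(B, 1, 1) * u    # x_hat = z_1 - (t - r) * u
```

The returned sample would then pass through the existing un-normalization path before being emitted as actions, per the baseline-reuse plan above.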
## Testing Strategy

### Phase 1: logging migration smoke test

- use the repo-local `uv` environment
- run a debug/smoke PushT image DiT training job on a single GPU with:
  - `training.debug=true`
  - `dataloader.num_workers=0`
  - `val_dataloader.num_workers=0`
  - `task.env_runner.n_envs=1`
  - `task.env_runner.n_test_vis=0`
  - `task.env_runner.n_train_vis=0`
- verify:
  - SwanLab initializes successfully
  - `logs.json.txt` is populated
  - rollout metrics still include `test_mean_score`
  - no video logging is attempted

### Phase 2: iMF smoke test

- run an equivalent debug PushT image iMF job
- verify:
  - forward/backward passes succeed
  - the JVP path executes on the local Torch version
  - one-step inference returns correctly shaped actions
  - rollout produces scalar metrics including `test_mean_score`

## Branch and Commit Strategy

1. start from a `main`-based worktree branch
2. commit the SwanLab/no-video migration after smoke verification
3. continue with the iMF implementation
4. once the iMF smoke tests pass, create/preserve a dedicated feature branch for the experiment code and push it to Gitea

## Experiment Plan

After the iMF path is smoke-tested and pushed:

- run a 3x3 grid over:
  - `n_emb ∈ {128, 256, 384}`
  - `n_layer ∈ {6, 12, 18}`
- keep the rest of the setup fixed
- run each experiment for 300 epochs
- primary comparison metric: `test_mean_score`

## Resource Allocation

Three concurrent runs should be scheduled continuously until the matrix is complete:

- local machine: 1 GPU
- `5880`: 2 GPUs

Each run uses the same uv-managed environment and the same pushed branch so the code path is consistent across hosts.

## Risks and Mitigations

- **Torch JVP compatibility risk**: provide a fallback JVP implementation (see the sketch below) and smoke-test it immediately.
- **Logging regression risk**: keep local JSON logging and verify scalar rollout metrics before moving to iMF.
- **Video/logging side effects**: disable visualizations in config and filter video objects out of runner logs.
- **Cross-host drift**: push the verified branch to Gitea before launching the experiment matrix on multiple machines.
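The JVP compatibility mitigation could look like the following sketch; `jvp_with_fallback` is a hypothetical helper, and the fallback relies on the stock `torch.autograd.functional.jvp` (double-backward based, slower) when `torch.func` is unavailable:

```python
import torch

def jvp_with_fallback(g, primals, tangents):
    """Same call shape as torch.func.jvp: returns (primal_out, jvp_out)."""
    try:
        from torch.func import jvp as func_jvp  # available on Torch >= 2.0
        return func_jvp(g, primals, tangents)
    except ImportError:
        # Double-backward fallback; create_graph=True keeps the autograd
        # graph so the iMF loss can still backpropagate through u.
        return torch.autograd.functional.jvp(
            g, primals, v=tangents, create_graph=True
        )
```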