# PushT Image DiT iMF + SwanLab Design

## Goal

Migrate the PushT image DiT experiment path from W&B to SwanLab online logging and suppress simulation video logging. Then add an iMeanFlow-based (iMF) one-step transformer policy for PushT image experiments and run a controlled architecture sweep over embedding width and depth, using `test_mean_score` as the primary metric.

## Context

- The implementation baseline is `main`.
- The experiment path is limited to the PushT image transformer workflow; unrelated workspaces and runners should remain unchanged.
- Environment management must use the repo-local `uv` workflow.
- The trusted remote machine alias `5880` refers to `droid-system-product-name` (`droid@100.73.14.65`) and can run two GPU jobs in parallel.

## Architecture Overview

The work is split into two verified phases:

1. **Logging migration phase**
   - Keep the existing PushT image DiT training behavior intact.
   - Replace W&B usage with SwanLab in the transformer hybrid workspace used by PushT image DiT experiments.
   - Preserve local `logs.json.txt` output.
   - Ensure rollout metrics such as `test_mean_score` and per-seed rewards are still logged.
   - Disable simulation video logging at both the config and runner/logging boundary.

2. **iMF migration phase**
   - Keep the original diffusion-based transformer image policy available on `main`.
   - Add a parallel iMF-specific model/policy/config path rather than overwriting the baseline diffusion policy.
   - Reuse the existing observation encoder and training workspace where possible.
   - Replace diffusion training with the iMeanFlow training objective.
   - Use one-step inference for validation/rollout in the iMF path.

## Logging Design

### Scope

Only the PushT image DiT experiment chain is changed:

- `train_diffusion_transformer_hybrid_workspace.py`
- `pusht_image_runner.py`
- the new/updated PushT image transformer configs

### Behavior

- SwanLab runs in `online` mode.
- Logged values are scalar metrics only, e.g.:
  - `train_loss`
  - `val_loss`
  - `train_action_mse_error`
  - `test_mean_score`
  - aggregate rollout metrics and optional per-seed scalar rewards
- No simulation videos are uploaded or wrapped as logging objects.
- Local JSON logging remains enabled for auditability and remote-job fallback debugging.

### Operational safeguards

- Default PushT experiment configs set `task.env_runner.n_test_vis=0` and `task.env_runner.n_train_vis=0`.
- The PushT image runner will not emit video objects into `log_data`, preventing accidental uploads even if visualization counts are later changed.
- SwanLab credentials are provided through the environment at runtime, not committed into the repo.

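
The "scalar metrics only" rule at the runner/logging boundary could be enforced with a small filter like the sketch below. The helper name `keep_scalar_metrics` and the example keys are hypothetical, not names from the repo; the only assumption is that `log_data` is a plain dict mapping metric names to values.

```python
from numbers import Real


def keep_scalar_metrics(log_data):
    """Return only the entries of log_data whose values are real numbers.

    Hypothetical helper: video wrappers, arrays, strings, and bools are
    dropped, so nothing non-scalar can reach the online logger.
    """
    return {
        key: float(value)
        for key, value in log_data.items()
        if isinstance(value, Real) and not isinstance(value, bool)
    }


# Example with made-up keys: the object() stands in for a video wrapper.
raw = {"test_mean_score": 0.5, "epoch": 3, "sim_video": object()}
clean = keep_scalar_metrics(raw)
```

Filtering at this boundary is independent of the `n_test_vis` / `n_train_vis` settings, which is the point of the second safeguard: a later config change cannot reintroduce uploads.
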
## iMF Model Design

### Baseline reuse

The iMF path reuses:

- the existing image observation encoder
- the existing action/observation normalization path
- the existing training workspace skeleton
- the existing PushT image dataset and env runner

### New files

- `diffusion_policy/model/diffusion/imf_transformer_for_diffusion.py`
- `diffusion_policy/policy/imf_transformer_hybrid_image_policy.py`
- `image_pusht_diffusion_policy_dit_imf.yaml`

### Existing files changed for the iMF path

- `diffusion_policy/workspace/train_diffusion_transformer_hybrid_workspace.py`
  - logging migration to SwanLab for this experiment chain
  - no structural training-loop fork beyond instantiating the configured policy and logging scalar metrics
- `diffusion_policy/env_runner/pusht_image_runner.py`
  - suppress video objects in returned logs

### Model structure

The iMF transformer mirrors the current transformer policy structure closely enough to reuse known-good conditioning patterns, but it remains a **single-head model** that predicts only:

- `u`: average velocity field

The same function is reused at two evaluation points:

- `fn(z_t, r, t, cond)` predicts average velocity `u`
- `fn(z_t, t, t, cond)` predicts the instantaneous velocity surrogate `v`

Inputs remain conditioned on encoded observations and action trajectory tokens.

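
As background for why one head can serve both evaluation points, the relation between `u` and `v` can be written out. This is a sketch based on the usual MeanFlow definition of average velocity, assuming time runs from data at $t = 0$ to noise at $t = 1$; conditioning on `cond` is omitted for brevity.

$$
u(z_t, r, t) = \frac{1}{t - r} \int_r^t v(z_\tau, \tau)\, d\tau,
\qquad
\lim_{r \to t} u(z_t, r, t) = v(z_t, t)
$$

Multiplying by $(t - r)$ and differentiating with respect to $t$ along the trajectory gives

$$
v(z_t, t) = u(z_t, r, t) + (t - r)\, \frac{d}{dt} u(z_t, r, t),
\qquad
\frac{d}{dt} u = \partial_z u \cdot v + \partial_t u
$$

Setting $r = t$ in the first line is what lets `fn(z_t, t, t, cond)` stand in for the instantaneous velocity.
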
## iMF Training Objective

For a normalized action trajectory `x`, the initial implementation follows the user-provided Algorithm 1 exactly:

1. sample `t, r`
2. sample Gaussian noise `e`
3. form `z_t = (1 - t) * x + t * e`
4. predict instantaneous velocity surrogate with the same network:
   - `v = fn(z_t, t, t, cond)`
5. define the JVP function exactly as:
   - `g(z, r, t) = fn(z, r, t, cond)`
6. compute the primal output and JVP with tangent:
   - `u, du_dt = jvp(g, (z_t, r, t), (v.detach(), 0, 1))`
7. form compound velocity:
   - `V = u + (t - r) * stopgrad(du_dt)`
8. regress the compound velocity onto the conditional (instantaneous) velocity target:
   - `target = e - x`
9. optimize only the masked iMF loss:
   - `loss = metric(V - target)`

There is **no auxiliary `v` loss** in the initial implementation. The implementation should prefer `torch.func.jvp` and keep a fallback path in case the local Torch stack does not support it.

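
The steps above can be traced on a scalar toy problem. The sketch below is plain Python, not the Torch implementation: `jvp_fd` is a central finite-difference stand-in for a JVP (one possible shape for a fallback, assumed here rather than taken from the design), `oracle` is a hypothetical closed-form network that receives the clean sample through `cond`, and squared error stands in for `metric`. With plain floats, `detach` and `stopgrad` are no-ops.

```python
def jvp_fd(g, primals, tangents, eps=1e-5):
    """Finite-difference stand-in for jvp: returns (g(primals), directional derivative)."""
    plus = [p + eps * d for p, d in zip(primals, tangents)]
    minus = [p - eps * d for p, d in zip(primals, tangents)]
    return g(*primals), (g(*plus) - g(*minus)) / (2.0 * eps)


def imf_loss(fn, x, e, t, r, cond=None):
    """Scalar version of steps 3-9 for one (x, e, t, r) draw."""
    z_t = (1.0 - t) * x + t * e          # step 3: interpolate data and noise
    v = fn(z_t, t, t, cond)              # step 4: instantaneous velocity surrogate

    def g(z, r_, t_):                    # step 5: close over cond
        return fn(z, r_, t_, cond)

    u, du_dt = jvp_fd(g, (z_t, r, t), (v, 0.0, 1.0))  # step 6: tangent (v, 0, 1)
    V = u + (t - r) * du_dt              # step 7: compound velocity
    target = e - x                       # step 8: conditional velocity target
    return (V - target) ** 2             # step 9: squared error as the metric


def oracle(z, r, t, cond):
    """Hypothetical toy network: exact velocity when cond holds the clean sample x."""
    return (z - cond) / t


# For the oracle the loss is ~0 up to finite-difference error.
loss = imf_loss(oracle, x=0.3, e=-1.2, t=0.8, r=0.2, cond=0.3)
```

The oracle case is a useful sanity check for a real implementation too: along the straight path the total derivative of `u` vanishes, so `V` reduces to `u` and matches `e - x`.
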
## iMF Inference Design

Inference uses a single step starting from noise:

- initialize `z_1 ~ N(0, I)`
- set `t = 1.0`, `r = 0.0`
- predict `u = fn(z_1, r, t, cond)`
- produce the action sample with one update:
  - `x_hat = z_1 - (t - r) * u`

This matches the time direction in the reference iMeanFlow sampling logic.

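
A minimal sketch of the one-step update, in plain Python on a flat list of floats rather than Torch tensors. The function `one_step_sample` and the toy `oracle` are hypothetical; the oracle receives the clean action through `cond` only so that the expected result is known.

```python
import random


def one_step_sample(fn, cond, dim, rng):
    """Draw z_1 ~ N(0, I) and map it to an action sample in a single update."""
    z_1 = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    t, r = 1.0, 0.0
    u = fn(z_1, r, t, cond)  # same argument order as in training: (z, r, t, cond)
    return [z - (t - r) * u_i for z, u_i in zip(z_1, u)]


def oracle(z, r, t, cond):
    """Toy average velocity over [0, 1] for a straight path from cond (data) to z (noise)."""
    return [z_i - x_i for z_i, x_i in zip(z, cond)]


x_true = [0.5, -0.25, 0.0]
x_hat = one_step_sample(oracle, x_true, dim=3, rng=random.Random(0))
```

With an exact average velocity the update recovers the clean sample, which is why the sign in `x_hat = z_1 - (t - r) * u` has to agree with the `z_t = (1 - t) * x + t * e` convention used in training.
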
## Testing Strategy

### Phase 1: logging migration smoke test

- use the repo-local `uv` environment
- run a debug/smoke PushT image DiT training job on a single GPU with:
  - `training.debug=true`
  - `dataloader.num_workers=0`
  - `val_dataloader.num_workers=0`
  - `task.env_runner.n_envs=1`
  - `task.env_runner.n_test_vis=0`
  - `task.env_runner.n_train_vis=0`
- verify:
  - SwanLab initializes successfully
  - `logs.json.txt` is populated
  - rollout metrics still include `test_mean_score`
  - no video logging is attempted

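
Two of the checks above (the log file is populated and contains `test_mean_score`) could be automated with a helper along these lines. It is a sketch that assumes `logs.json.txt` holds one JSON object per line; the function name `summarize_json_log` is hypothetical.

```python
import json
from pathlib import Path


def summarize_json_log(path, required_keys=("test_mean_score",)):
    """Return (number of records, list of required keys never seen in any record)."""
    records = []
    for line in Path(path).read_text().splitlines():
        if line.strip():  # skip blank lines
            records.append(json.loads(line))
    seen = set()
    for record in records:
        seen.update(record)
    missing = [key for key in required_keys if key not in seen]
    return len(records), missing
```

A smoke run would pass these two checks when the record count is nonzero and the missing list is empty.
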
### Phase 2: iMF smoke test

- run an equivalent debug PushT image iMF job
- verify:
  - forward/backward passes succeed
  - JVP path executes on the local Torch version
  - one-step inference returns correctly shaped actions
  - rollout produces scalar metrics including `test_mean_score`

## Branch and Commit Strategy

1. start from a `main`-based worktree branch
2. commit the SwanLab/no-video migration after smoke verification
3. continue with the iMF implementation
4. once iMF smoke tests pass, create/preserve a dedicated feature branch for the experiment code and push it to Gitea

## Experiment Plan

After the iMF path is smoke-tested and pushed:

- run a 3x3 grid over:
  - `n_emb ∈ {128, 256, 384}`
  - `n_layer ∈ {6, 12, 18}`
- keep the rest of the setup fixed
- run each experiment for 300 epochs
- primary comparison metric: `test_mean_score`

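
The nine runs in the grid can be enumerated programmatically. In the sketch below the override keys `policy.n_emb`, `policy.n_layer`, and `training.num_epochs`, along with the run-name pattern, are assumptions about how the config is laid out; only the value sets and the 300-epoch budget come from the plan above.

```python
from itertools import product

N_EMB = (128, 256, 384)
N_LAYER = (6, 12, 18)


def sweep_overrides(num_epochs=300):
    """Build one entry per (n_emb, n_layer) pair; key paths are assumed, not confirmed."""
    runs = []
    for n_emb, n_layer in product(N_EMB, N_LAYER):
        runs.append({
            "name": f"imf_emb{n_emb}_layer{n_layer}",
            "overrides": [
                f"policy.n_emb={n_emb}",
                f"policy.n_layer={n_layer}",
                f"training.num_epochs={num_epochs}",
            ],
        })
    return runs


runs = sweep_overrides()
```

Generating the list once and reusing it on every host keeps run names and overrides identical across machines.
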
## Resource Allocation

Three concurrent runs should be scheduled continuously until the matrix is complete:

- local machine: 1 GPU
- `5880`: 2 GPUs

Each run uses the same `uv`-managed environment and the same pushed branch so the code path is consistent across hosts.

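
One way to read "scheduled continuously" is greedy list scheduling: whenever a GPU slot frees up, it takes the next pending run. The sketch below only simulates that policy to show the resulting assignment; the slot labels, job names, and the constant run duration are all hypothetical, and it does not launch anything.

```python
import heapq


def schedule(jobs, slots, duration):
    """Simulate greedy scheduling; returns a list of (job, slot, start_time)."""
    # Heap of (time the slot becomes free, tie-break index, slot label).
    free_at = [(0.0, index, slot) for index, slot in enumerate(slots)]
    heapq.heapify(free_at)
    plan = []
    for job in jobs:
        start, index, slot = heapq.heappop(free_at)  # earliest free slot
        plan.append((job, slot, start))
        heapq.heappush(free_at, (start + duration(job), index, slot))
    return plan


slots = ["local:gpu0", "5880:gpu0", "5880:gpu1"]  # hypothetical labels
jobs = [f"run{i}" for i in range(9)]
plan = schedule(jobs, slots, duration=lambda job: 1.0)  # equal durations assumed
```

With equal durations each slot ends up with three of the nine runs; with unequal durations the same policy simply hands the next run to whichever slot finishes first.
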
## Risks and Mitigations

- **Torch JVP compatibility risk**: provide a fallback JVP implementation and smoke-test immediately.
- **Logging regression risk**: keep local JSON logging and verify scalar rollout metrics before moving to iMF.
- **Video/logging side effects**: disable visualizations in config and filter video objects out of runner logs.
- **Cross-host drift**: push the verified branch to Gitea before launching the experiment matrix on multiple machines.