diff --git a/docs/superpowers/specs/2026-03-26-pusht-imf-swanlab-design.md b/docs/superpowers/specs/2026-03-26-pusht-imf-swanlab-design.md
index 47a7ef0..00ed960 100644
--- a/docs/superpowers/specs/2026-03-26-pusht-imf-swanlab-design.md
+++ b/docs/superpowers/specs/2026-03-26-pusht-imf-swanlab-design.md
@@ -26,6 +26,10 @@ The work is split into two verified phases:
 - Replace diffusion training with the iMeanFlow training objective.
 - Use one-step inference for validation/rollout in the iMF path.
 
+The implementation planning boundary for this spec is:
+- code changes through a smoke-tested, pushed iMF branch
+- not the full 3x3 sweep execution/monitoring workflow, which should be planned separately after the code path is verified and pushed
+
 ## Logging Design
 ### Scope
 Only the PushT image DiT experiment chain is changed:
@@ -74,6 +78,7 @@ The iMF transformer mirrors the current transformer policy structure closely eno
 - `u`: average velocity field
 
 The same function is reused at two evaluation points:
+- canonical signature: `fn(z, r, t, cond)`
 - `fn(z_t, r, t, cond)` predicts average velocity `u`
 - `fn(z_t, t, t, cond)` predicts the instantaneous velocity surrogate `v`
@@ -103,7 +108,7 @@ There is **no auxiliary `v` loss** in the initial implementation. The implementa
 Inference uses a single step starting from noise:
 - initialize `z_1 ~ N(0, I)`
 - set `t = 1.0`, `r = 0.0`
-- predict `u(z_1, t, r, cond)`
+- predict `u = fn(z_1, r, t, cond)`
 - produce the action sample with one update:
   - `x_hat = z_1 - (t - r) * u`
 
@@ -139,17 +144,18 @@ This matches the time direction in the reference iMeanFlow sampling logic.
 3. continue with the iMF implementation
 4. once iMF smoke tests pass, create/preserve a dedicated feature branch for the experiment code and push it to Gitea
 
-## Experiment Plan
-After the iMF path is smoke-tested and pushed:
+## Post-Implementation Experiment Plan
+After the iMF path is smoke-tested and pushed, a separate experiment-execution plan should launch:
 - run a 3x3 grid over:
   - `n_emb ∈ {128, 256, 384}`
   - `n_layer ∈ {6, 12, 18}`
 - keep the rest of the setup fixed
+- use a fixed single-seed setting for comparability unless a later explicit experiment plan expands that scope
 - run each experiment for 300 epochs
 - primary comparison metric: `test_mean_score`
 
-## Resource Allocation
-Three concurrent runs should be scheduled continuously until the matrix is complete:
+## Post-Implementation Resource Allocation
+The separate experiment-execution plan should schedule three concurrent runs until the matrix is complete:
 - local machine: 1 GPU
 - `5880`: 2 GPUs
 
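
The one-step inference update corrected in the third hunk (`u = fn(z_1, r, t, cond)` followed by `x_hat = z_1 - (t - r) * u`) can be sketched as below. This is a minimal NumPy sketch, not the real implementation: `fn` and `cond` are placeholder stand-ins for the actual iMF network and its conditioning, and the real code would operate on PyTorch tensors.

```python
import numpy as np

def one_step_imf_sample(fn, cond, shape, rng=None):
    """One-step iMF sampling sketch: x_hat = z_1 - (t - r) * u.

    Assumes fn(z, r, t, cond) returns the average velocity u with the
    same shape as z (the canonical signature from the spec). fn and
    cond are hypothetical placeholders for the trained network.
    """
    rng = rng or np.random.default_rng(0)
    z_1 = rng.standard_normal(shape)  # initialize z_1 ~ N(0, I)
    t, r = 1.0, 0.0                   # cover the full time span in one step
    u = fn(z_1, r, t, cond)           # predicted average velocity over [r, t]
    return z_1 - (t - r) * u          # single update produces the sample
```

With the corrected argument order, the same callable serves both evaluation points from the spec: `fn(z_t, r, t, cond)` for `u` and `fn(z_t, t, t, cond)` for the instantaneous-velocity surrogate `v`.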
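
The 3x3 sweep and the three-GPU allocation described above can be enumerated as follows. This is a hypothetical sketch: the dict keys and slot labels are illustrative, not actual config names from the codebase; only the `n_emb`/`n_layer` values, the 300-epoch setting, and the 1-local-plus-2-on-`5880` split come from the spec.

```python
from itertools import product

# Three concurrent GPU slots per the resource-allocation section:
# 1 on the local machine, 2 on `5880`. Labels are illustrative.
SLOTS = ["local:gpu0", "5880:gpu0", "5880:gpu1"]

# Enumerate the full 3x3 matrix and round-robin configs across slots.
grid = [
    {"n_emb": n_emb, "n_layer": n_layer, "epochs": 300, "slot": SLOTS[i % 3]}
    for i, (n_emb, n_layer) in enumerate(
        product((128, 256, 384), (6, 12, 18))
    )
]
```

Round-robin assignment keeps all three slots busy until the nine-run matrix is complete, matching the scheduling constraint in the resource-allocation section.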