docs: refine pusht imf spec scope

This commit is contained in:
Logic
2026-03-26 17:02:17 +08:00
parent 15a0c41cbf
commit 23374a4cd2


@@ -26,6 +26,10 @@ The work is split into two verified phases:
- Replace diffusion training with the iMeanFlow training objective.
- Use one-step inference for validation/rollout in the iMF path.
The implementation planning boundary for this spec is:
- code changes through a smoke-tested, pushed iMF branch
- not the full 3x3 sweep execution/monitoring workflow, which should be planned separately after the code path is verified and pushed
## Logging Design
### Scope
Only the PushT image DiT experiment chain is changed:
@@ -74,6 +78,7 @@ The iMF transformer mirrors the current transformer policy structure closely enough
- `u`: average velocity field
The same function is reused at two evaluation points:
- canonical signature: `fn(z, r, t, cond)`
- `fn(z_t, r, t, cond)` predicts average velocity `u`
- `fn(z_t, t, t, cond)` predicts the instantaneous velocity surrogate `v`
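The shared-signature convention above can be sketched with a toy stand-in. `fn` here is an illustrative placeholder, not the repository's transformer; the real network would condition on `(r, t, cond)` through learned embeddings rather than this linear combination:

```python
import numpy as np

# Toy stand-in for the iMF transformer; the real network conditions on
# (r, t, cond) via learned embeddings, not this linear form.
def fn(z, r, t, cond):
    return (t - r) * z + cond

z_t = np.ones(3)         # noised latent
cond = np.full(3, 0.5)   # conditioning features

u = fn(z_t, 0.0, 1.0, cond)  # fn(z_t, r, t, cond): average velocity over [r, t]
v = fn(z_t, 1.0, 1.0, cond)  # fn(z_t, t, t, cond): instantaneous surrogate (r == t)
```

The point is that no second head is needed: collapsing the interval to `r == t` turns the same function into the instantaneous-velocity surrogate.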
@@ -103,7 +108,7 @@ There is **no auxiliary `v` loss** in the initial implementation. The implementa
Inference uses a single step starting from noise:
- initialize `z_1 ~ N(0, I)`
- set `t = 1.0`, `r = 0.0`
- predict `u(z_1, t, r, cond)`
- predict `u = fn(z_1, r, t, cond)`
- produce the action sample with one update:
- `x_hat = z_1 - (t - r) * u`
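The one-step sampling procedure above can be sketched as follows. The helper name and the toy velocity field are illustrative only; the identity field `u = z` is chosen because it makes the update trivially checkable (`x_hat = z_1 - z_1 = 0`):

```python
import numpy as np

def one_step_sample(fn, cond, shape, rng):
    z_1 = rng.standard_normal(shape)  # initialize z_1 ~ N(0, I)
    t, r = 1.0, 0.0                   # full interval [0, 1]
    u = fn(z_1, r, t, cond)           # predicted average velocity
    return z_1 - (t - r) * u          # single update to the action sample

# Toy velocity field u = z: the update maps every sample exactly to zero.
rng = np.random.default_rng(0)
x_hat = one_step_sample(lambda z, r, t, cond: z, None, (2, 4), rng)
```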
@@ -139,17 +144,18 @@ This matches the time direction in the reference iMeanFlow sampling logic.
3. continue with the iMF implementation
4. once iMF smoke tests pass, create/preserve a dedicated feature branch for the experiment code and push it to Gitea
## Experiment Plan
After the iMF path is smoke-tested and pushed:
## Post-Implementation Experiment Plan
After the iMF path is smoke-tested and pushed, a separate experiment-execution plan should cover:
- run a 3x3 grid over:
- `n_emb ∈ {128, 256, 384}`
- `n_layer ∈ {6, 12, 18}`
- keep the rest of the setup fixed
- use a fixed single-seed setting for comparability unless a later explicit experiment plan expands that scope
- run each experiment for 300 epochs
- primary comparison metric: `test_mean_score`
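The sweep definition above is small enough to enumerate directly. A minimal sketch, assuming the fixed seed value and config-dict shape are illustrative (only `n_emb`, `n_layer`, and the 300-epoch budget come from the spec):

```python
from itertools import product

# Only n_emb and n_layer vary; everything else in the setup stays fixed.
N_EMB = (128, 256, 384)
N_LAYER = (6, 12, 18)

grid = [
    {"n_emb": e, "n_layer": l, "seed": 0, "epochs": 300}  # seed value illustrative
    for e, l in product(N_EMB, N_LAYER)
]
```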
## Resource Allocation
Three concurrent runs should be scheduled continuously until the matrix is complete:
## Post-Implementation Resource Allocation
The separate experiment-execution plan should schedule three concurrent runs until the matrix is complete:
- local machine: 1 GPU
- `5880`: 2 GPUs
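A round-robin assignment of the nine sweep runs to the three concurrent slots can be sketched as below. The slot names are illustrative; the spec only fixes the counts (1 local GPU plus 2 GPUs on `5880` gives 3 concurrent runs):

```python
# Slot names are hypothetical; only the counts come from the spec.
slots = ["local:gpu0", "5880:gpu0", "5880:gpu1"]
runs = [f"n_emb={e}-n_layer={l}" for e in (128, 256, 384) for l in (6, 12, 18)]

# Round-robin: each slot ends up with an equal share of the 3x3 matrix.
schedule = {s: [] for s in slots}
for i, run in enumerate(runs):
    schedule[slots[i % len(slots)]].append(run)
```

With three slots and nine runs, each slot carries exactly three sequential runs until the matrix is complete.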