# PushT iMF Full-Attention Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Add a separate full-attention PushT image iMF config, commit and push it on a new branch, and launch the 9-run, 350-epoch architecture sweep across 3 GPUs.

**Architecture:** Keep the existing causal iMF path untouched and add a standalone full-attention config that only flips `policy.causal_attn=false` while retaining one-step iMF inference and SwanLab-safe naming. Reuse the previous 9-run architecture matrix and balanced 3-queue scheduling across the local 5090 plus 5880 GPU0/GPU1.

**Tech Stack:** Hydra, the Diffusion Policy iMF image workspace, SwanLab, a uv environment, and local shell plus the trusted remote 5880 over SSH.

---
### Task 1: Add full-attention iMF config with TDD
**Files:**

- Create: `image_pusht_diffusion_policy_dit_imf_fullattn.yaml`
- Modify: `tests/test_pusht_swanlab_config.py`

- [ ] Write a failing config regression test asserting that the new config uses SwanLab-safe naming and that `policy.causal_attn == False` (see the sketch after this list).
- [ ] Run the targeted pytest command and verify it fails because the config does not exist yet.
- [ ] Add the minimal full-attention config by composing from the existing PushT image iMF config and overriding only `exp_name` and `policy.causal_attn=false`.
- [ ] Re-run the targeted pytest and verify it passes.
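A minimal sketch of the regression test, assuming `hydra-core` is installed, that the repo's configs live under `diffusion_policy/config/`, and that `exp_name` is a top-level key (the config name and the `policy.causal_attn` path come from this plan; everything else is an assumption):

```python
# Sketch for tests/test_pusht_swanlab_config.py; config_path is an assumption.
from hydra import compose, initialize


def test_fullattn_config_is_swanlab_safe_and_non_causal():
    # config_path is resolved relative to this test file.
    with initialize(version_base=None, config_path="../diffusion_policy/config"):
        cfg = compose(config_name="image_pusht_diffusion_policy_dit_imf_fullattn")

    # The single behavioral flip relative to the causal iMF config.
    assert cfg.policy.causal_attn is False

    # SwanLab-safe naming: the fullattn tag is present and the name has no
    # separators that would break SwanLab runs or output directories.
    assert "fullattn" in cfg.exp_name
    assert "/" not in cfg.exp_name and " " not in cfg.exp_name
```

Before the config file exists, `compose` raises and the test fails, which is the red step; after the third checkbox it should pass.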
### Task 2: Verify the new config
**Files:**

- Read: `image_pusht_diffusion_policy_dit_imf_fullattn.yaml`

- [ ] Run `train.py --help` with the new config to confirm it composes.
- [ ] Run a real `training.debug=true` smoke test locally to confirm the training path is valid (see the sketch after this list).
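A hedged sketch of both checks as a single driver, assuming `train.py` is a standard Hydra entry point that accepts `--config-name` and dotted overrides (the flag names are assumptions about this repo's CLI):

```python
# Smoke-test driver; train.py's exact CLI surface is an assumption.
import subprocess

CONFIG = "image_pusht_diffusion_policy_dit_imf_fullattn"

# 1. Hydra composes the config in order to render --help, so a broken
#    config should already fail at this step.
subprocess.run(
    ["python", "train.py", f"--config-name={CONFIG}", "--help"],
    check=True,
)

# 2. Real short training pass using the debug flag from the checklist.
subprocess.run(
    ["python", "train.py", f"--config-name={CONFIG}", "training.debug=true"],
    check=True,  # raise CalledProcessError if the training path is broken
)
```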
### Task 3: Commit and push the new branch
**Files:**

- Commit only the new config, test, and plan files needed for the full-attention experiment chain.

- [ ] Run the verification commands again before committing.
- [ ] Commit with a focused message.
- [ ] Push `feat/pusht-imf-fullattn` to origin (see the sketch after this list).
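If the commit step is scripted rather than done by hand, it could look like the sketch below; the branch name comes from this plan, while the config path and the commit message are illustrative assumptions:

```python
# Commit-and-push sketch; file paths and the message are placeholders.
import subprocess


def git(*args: str) -> None:
    subprocess.run(["git", *args], check=True)


git("switch", "-c", "feat/pusht-imf-fullattn")
git("add",
    "diffusion_policy/config/image_pusht_diffusion_policy_dit_imf_fullattn.yaml",
    "tests/test_pusht_swanlab_config.py")
git("commit", "-m", "Add full-attention PushT image iMF config and test")
git("push", "-u", "origin", "feat/pusht-imf-fullattn")
```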
### Task 4: Launch the 9-run sweep
**Files:**

- Write queue scripts and logs under `data/run_logs/` locally and on 5880.
- Write outputs under `data/outputs/` locally and on 5880.

- [ ] Use the same matrix as the prior iMF sweep: `n_emb ∈ {128, 256, 384}`, `n_layer ∈ {6, 12, 18}`, `seed=42`.
- [ ] Set `training.num_epochs=350` for all 9 runs.
- [ ] Encode `fullattn` in every `exp_name`, `logging.name`, and run directory to avoid collisions.
- [ ] Balance the 9 runs across the local 5090, 5880 GPU0, and 5880 GPU1 as three serial queues (see the sketch after this list).
- [ ] Sync the new config to the remote smoke repo before launching the remote queues.
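A sketch of expanding the 3×3 matrix into three balanced serial queues; the `pusht_imf_fullattn_*` naming scheme and override paths such as `policy.n_emb` and `training.seed` are assumptions, and on 5880 each command would additionally be wrapped in `ssh`:

```python
# Expand the 9-run matrix and round-robin it over three serial queues.
from itertools import product

CONFIG = "image_pusht_diffusion_policy_dit_imf_fullattn"
QUEUES: dict[str, list[str]] = {"local_5090": [], "5880_gpu0": [], "5880_gpu1": []}

for i, (n_emb, n_layer) in enumerate(product([128, 256, 384], [6, 12, 18])):
    # fullattn appears in every name to avoid collisions with the causal sweep.
    name = f"pusht_imf_fullattn_e{n_emb}_l{n_layer}_s42"
    cmd = (
        f"python train.py --config-name={CONFIG} "
        f"policy.n_emb={n_emb} policy.n_layer={n_layer} "
        f"training.seed=42 training.num_epochs=350 "
        f"exp_name={name} logging.name={name}"
    )
    # Round-robin assignment leaves exactly three serial runs per queue.
    QUEUES[list(QUEUES)[i % 3]].append(cmd)

for queue, cmds in QUEUES.items():
    print(queue, *cmds, sep="\n  ")
```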
### Task 5: Monitor and auto-summarize
**Files:**

- Read local and remote pid files, logs, outputs, and checkpoints.

- [ ] Start an xhigh monitoring agent that polls all three queues.
- [ ] On completion, parse all 9 `logs.json.txt` files and rank the runs by max `test_mean_score` (see the sketch after this list).
- [ ] Report embedding/layer trends and the best configuration.
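A sketch of the completion summary, assuming each run writes a JSON-lines `logs.json.txt` beneath `data/outputs/` and that evaluation rows carry a `test_mean_score` key (the filename and metric come from this plan; the directory layout is an assumption):

```python
# Rank all 9 runs by their best test_mean_score.
import json
from pathlib import Path


def max_test_score(log_path: Path) -> float:
    """Return the highest test_mean_score logged by one run."""
    best = float("-inf")
    for line in log_path.read_text().splitlines():
        row = json.loads(line)
        if "test_mean_score" in row:
            best = max(best, row["test_mean_score"])
    return best


logs = sorted(Path("data/outputs").glob("**/logs.json.txt"))
for path in sorted(logs, key=max_test_score, reverse=True):
    print(f"{max_test_score(path):.4f}  {path.parent.name}")
```

Grouping the same ranking by `n_emb` and `n_layer` yields the embedding/layer trends the last checkbox asks for.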