LEWM ViT Backbone Implementation Plan
For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
Goal: Replace the current ResNet visual encoder in roboimi VLA training with a frozen LEWM ViT visual backbone (encoder + projector) that consumes the three camera views jointly and outputs one 192-d CLS embedding per timestep, then launch two 50k runs on the 5880 machine.
Architecture: Add a new joint-multiview LEWM backbone that fuses front/top/r_vis into one LEWM-style image, reproduces LEWM preprocessing, loads frozen weights from the trained checkpoint, and exposes a joint_output_dim=192. Add a minimal VLAAgent compatibility branch so conditions can be sized from joint visual dim instead of output_dim * num_cams, while leaving the rest of the diffusion pipeline unchanged.
Tech Stack: PyTorch, transformers ViTModel, Hydra configs, existing roboimi VLA training/eval scripts, remote SSH/rsync to 100.73.14.65.
Task 1: Add failing tests for LEWM joint-vision backbone contract
Files:
- Create: `tests/test_lewm_vit_backbone.py`
- Modify: `tests/test_imf_vla_agent.py`

- [ ] Step 1: Write the failing backbone shape/load test
- [ ] Step 2: Run `pytest tests/test_lewm_vit_backbone.py -q` and verify it fails
- [ ] Step 3: Extend `tests/test_imf_vla_agent.py` with a failing joint-output backbone case
- [ ] Step 4: Run `pytest tests/test_imf_vla_agent.py -q` and verify it fails
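
The attribute part of the backbone contract can be checked without building the real model. The sketch below is only an illustration: `check_backbone_contract` is a hypothetical helper, and the `SimpleNamespace` objects stand in for `LEWMViTBackbone`, whose constructor signature this plan does not specify. The `output_dim=64` value on the legacy stand-in is an arbitrary placeholder.

```python
from types import SimpleNamespace

EXPECTED_JOINT_DIM = 192  # from the plan: one 192-d CLS embedding per timestep


def check_backbone_contract(backbone, expected_cameras):
    """Return a list of contract violations (an empty list means the contract holds)."""
    problems = []
    for attr in ("camera_names", "num_cameras", "joint_output_dim"):
        if not hasattr(backbone, attr):
            problems.append(f"missing attribute: {attr}")
    if problems:
        return problems
    if list(backbone.camera_names) != list(expected_cameras):
        problems.append("camera_names mismatch")
    if backbone.num_cameras != len(expected_cameras):
        problems.append("num_cameras mismatch")
    if backbone.joint_output_dim != EXPECTED_JOINT_DIM:
        problems.append("joint_output_dim mismatch")
    return problems


# Stand-in objects; a real test would construct LEWMViTBackbone instead.
cameras = ["r_vis", "top", "front"]
joint_backbone = SimpleNamespace(camera_names=cameras, num_cameras=3, joint_output_dim=192)
legacy_backbone = SimpleNamespace(camera_names=cameras, num_cameras=3, output_dim=64)

print(check_backbone_contract(joint_backbone, cameras))
print(check_backbone_contract(legacy_backbone, cameras))
```

Before Task 2 lands, a test built this way fails at import or construction time, which is the expected red state for Steps 2 and 4.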
Task 2: Implement LEWM joint-multiview frozen backbone
Files:
- Create: `roboimi/vla/models/backbones/lewm_vit_backbone.py`
- Modify: `roboimi/vla/models/backbones/__init__.py` only if exports are needed

- [ ] Step 1: Create `LEWMViTBackbone` with public attrs `camera_names`, `num_cameras`, `joint_output_dim=192`
- [ ] Step 2: Reproduce LEWM preprocessing and joint multiview fusion
- [ ] Step 3: Load checkpoint weights from `model.encoder.*` and `model.projector.*`
- [ ] Step 4: Freeze encoder/projector and keep them in eval mode via `train()` override
- [ ] Step 5: Run `pytest tests/test_lewm_vit_backbone.py -q` and verify green
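
Steps 3 and 4 can be sketched in isolation. The example below is a minimal illustration, not the planned `LEWMViTBackbone`: `FrozenJointBackbone` and `split_by_prefix` are hypothetical names, the `nn.Linear` layers are toy stand-ins for the ViT encoder and projector, and it assumes the checkpoint's state dict is flat, with keys that start with `model.encoder.` and `model.projector.`. A real checkpoint may nest the weights under another key.

```python
import torch
from torch import nn


def split_by_prefix(state_dict, prefix):
    """Keep only keys under `prefix` and strip the prefix from them."""
    return {k[len(prefix):]: v for k, v in state_dict.items() if k.startswith(prefix)}


class FrozenJointBackbone(nn.Module):
    """Hypothetical stand-in: an encoder + projector pair that never trains."""

    def __init__(self, encoder, projector):
        super().__init__()
        self.encoder = encoder
        self.projector = projector
        for p in self.parameters():
            p.requires_grad_(False)  # exclude from gradient computation
        super().train(False)  # start in eval mode

    def load_from_checkpoint_state(self, state_dict):
        self.encoder.load_state_dict(split_by_prefix(state_dict, "model.encoder."))
        self.projector.load_state_dict(split_by_prefix(state_dict, "model.projector."))

    def train(self, mode=True):
        # Ignore the requested mode so a parent module's .train() call
        # cannot switch dropout or normalization layers back to training.
        return super().train(False)

    @torch.no_grad()
    def forward(self, x):
        return self.projector(self.encoder(x))


# Toy sizes; only the 192-d output comes from the plan.
encoder, projector = nn.Linear(8, 4), nn.Linear(4, 192)
checkpoint_state = {
    **{f"model.encoder.{k}": v.clone() for k, v in nn.Linear(8, 4).state_dict().items()},
    **{f"model.projector.{k}": v.clone() for k, v in nn.Linear(4, 192).state_dict().items()},
    "model.predictor.weight": torch.zeros(1),  # hypothetical unrelated key, ignored
}
backbone = FrozenJointBackbone(encoder, projector)
backbone.load_from_checkpoint_state(checkpoint_state)
backbone.train()  # stays in eval mode because of the override
features = backbone(torch.randn(2, 8))
```

Overriding `train()` rather than calling `eval()` once matters because the agent's own `train()` call recurses into child modules and would otherwise flip the backbone back into training mode.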
Task 3: Add minimal agent support for joint visual dim
Files:
- Modify: `roboimi/vla/agent.py`
- Test: `tests/test_imf_vla_agent.py`

- [ ] Step 1: Add a `joint_output_dim` branch in `VLAAgent.__init__` for `per_step_cond_dim` / `global_cond_dim`
- [ ] Step 2: Keep `_build_cond()` semantics unchanged except for matching the new dim contract
- [ ] Step 3: Run `pytest tests/test_imf_vla_agent.py -q` and verify green
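
The dimension branch in Step 1 might look like the sketch below. The function names and the `SimpleNamespace` stand-ins are hypothetical, and the actual layout of `VLAAgent.__init__` may differ. The state dimension of 16 is also an assumption: it is inferred from `head.cond_dim=208` minus the 192-d visual embedding and is not stated in this plan.

```python
from types import SimpleNamespace


def resolve_visual_dim(backbone, num_cams):
    """Prefer the joint embedding size; fall back to per-camera features."""
    joint_dim = getattr(backbone, "joint_output_dim", None)
    if joint_dim is not None:
        return int(joint_dim)  # one embedding shared by all cameras
    return int(backbone.output_dim) * num_cams  # legacy per-camera concatenation


def resolve_per_step_cond_dim(backbone, num_cams, state_dim):
    """Visual features plus the (assumed) proprioceptive state vector."""
    return resolve_visual_dim(backbone, num_cams) + state_dim


lewm_like = SimpleNamespace(joint_output_dim=192)
resnet_like = SimpleNamespace(output_dim=64)  # arbitrary placeholder size

print(resolve_per_step_cond_dim(lewm_like, num_cams=3, state_dim=16))
print(resolve_per_step_cond_dim(resnet_like, num_cams=3, state_dim=16))
```

With `getattr` and a `None` default, existing backbones that expose only `output_dim` keep their current behavior without modification.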
Task 4: Add Hydra configs for LEWM backbone training
Files:
- Create: `roboimi/vla/conf/backbone/lewm_vit_diffusion.yaml`
- Create: `roboimi/vla/conf/agent/lewm_imf_attnres.yaml`

- [ ] Step 1: Add backbone config pointing to the new LEWM backbone
- [ ] Step 2: Add `agent=lewm_imf_attnres` config with 3 cameras and `head.cond_dim=208`
- [ ] Step 3: Verify Hydra instantiation with a one-shot compose smoke
Task 5: Verify focused local tests
Files:
- Reuse the above

- [ ] Step 1: Run `pytest tests/test_lewm_vit_backbone.py tests/test_imf_vla_agent.py tests/test_eval_vla_headless_import.py -q`
- [ ] Step 2: If needed, run one tiny local import/forward smoke
Task 6: Sync to 5880 and remote smoke with real checkpoint
Files:
- Remote target: `/home/droid/roboimi_suite_20260404`

- [ ] Step 1: Rsync modified source/config files to `100.73.14.65:/home/droid/roboimi_suite_20260404`
- [ ] Step 2: Run a 2-step smoke on GPU0 with `agent.head.n_emb=384`, `train.rollout_num_episodes=10`, real LEWM checkpoint
- [ ] Step 3: Run a 2-step smoke on GPU1 with `agent.head.n_emb=256`, same checkpoint
Task 7: Launch two real 50k runs on the 5880 machine
Files:
- Remote logs under `/home/droid/roboimi_suite_20260404/experiment_suite_launch_logs/`

- [ ] Step 1: Launch embed384/layer12 on GPU0
- [ ] Step 2: Launch embed256/layer12 on GPU1
- [ ] Step 3: Ensure both use `data.camera_names=[r_vis,top,front]`, `pred_horizon=16`, `num_action_steps=8`, `train.rollout_num_episodes=10`, `max_steps=50000`
- [ ] Step 4: Record run names, pids, log paths, SwanLab URLs
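
To keep the two launches consistent, the shared overrides can be generated from one list. This is a sketch under assumptions: the script name `train_vla.py` and the run names are hypothetical, and the override keys are copied as written in this plan, so the real config may need fuller key paths.

```python
import shlex

# Overrides both runs must share (Step 3), plus the agent config from Task 4.
SHARED_OVERRIDES = [
    "agent=lewm_imf_attnres",
    "data.camera_names=[r_vis,top,front]",
    "pred_horizon=16",
    "num_action_steps=8",
    "train.rollout_num_episodes=10",
    "max_steps=50000",
]

# Run names are hypothetical labels for the two planned configurations.
RUNS = [
    {"name": "embed384_layer12", "gpu": 0, "n_emb": 384},
    {"name": "embed256_layer12", "gpu": 1, "n_emb": 256},
]


def build_launch(run, script="train_vla.py"):
    """Return (env, argv) for one run; `script` is a placeholder path."""
    env = {"CUDA_VISIBLE_DEVICES": str(run["gpu"])}
    argv = ["python", script, *SHARED_OVERRIDES, f"agent.head.n_emb={run['n_emb']}"]
    return env, argv


def as_shell(env, argv):
    """Render a copy-pasteable command line with shell-safe quoting."""
    prefix = " ".join(f"{key}={shlex.quote(value)}" for key, value in env.items())
    return f"{prefix} {shlex.join(argv)}"


for run in RUNS:
    print(run["name"], "->", as_shell(*build_launch(run)))
```

Quoting matters here because the brackets in `data.camera_names=[r_vis,top,front]` would otherwise be interpreted by the remote shell.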
Task 8: Update experiment tracking docs and commit
Files:
- Create: `experiment_suites/2026-04-05-lewm-vit-transfer/manifest.json`
- Create: `experiment_suites/2026-04-05-lewm-vit-transfer/status.json`
- Create: `experiment_suites/2026-04-05-lewm-vit-transfer/notes.md`

- [ ] Step 1: Record checkpoint path, frozen LEWM design, rollout=10, and both run configs
- [ ] Step 2: Record running status after launch
- [ ] Step 3: Commit implementation + docs with a focused message
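
The tracking files could be written by a small helper such as the one below. The JSON field names are illustrative assumptions, not an existing schema in the repository, and the checkpoint path, pids, log paths, and SwanLab URLs are left as placeholders to fill in after launch.

```python
import json
import tempfile
from pathlib import Path


def write_suite_records(suite_dir, checkpoint_path, runs):
    """Write manifest.json (static design) and status.json (live run state)."""
    suite_dir = Path(suite_dir)
    suite_dir.mkdir(parents=True, exist_ok=True)
    manifest = {
        "backbone": {"type": "lewm_vit", "frozen": True, "joint_output_dim": 192},
        "checkpoint_path": checkpoint_path,
        "rollout_num_episodes": 10,
        "runs": [{"name": r["name"], "gpu": r["gpu"], "n_emb": r["n_emb"]} for r in runs],
    }
    status = {
        r["name"]: {
            "state": "running",
            "pid": r.get("pid"),
            "log_path": r.get("log_path"),
            "swanlab_url": r.get("swanlab_url"),
        }
        for r in runs
    }
    (suite_dir / "manifest.json").write_text(json.dumps(manifest, indent=2) + "\n")
    (suite_dir / "status.json").write_text(json.dumps(status, indent=2) + "\n")
    return manifest, status


# Demo in a temporary directory; the real target is the suite folder listed above.
demo_runs = [
    {"name": "embed384_layer12", "gpu": 0, "n_emb": 384},
    {"name": "embed256_layer12", "gpu": 1, "n_emb": 256},
]
demo_dir = Path(tempfile.mkdtemp()) / "2026-04-05-lewm-vit-transfer"
manifest, status = write_suite_records(demo_dir, "<checkpoint path>", demo_runs)
```

Splitting static design facts from mutable run state keeps later status updates from touching the manifest.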