feat: add vision transfer backbones and IMF variants

2026-04-09 14:02:24 +08:00
parent d51b3ecafa
commit ff7c9c1f2a
58 changed files with 2788 additions and 26 deletions
@@ -0,0 +1,25 @@
+# 2026-04-06 LEWM ViT Transfer Notes
+
+## Root-cause fix
+
+The first LEWM runs were stopped because the data path still resized each camera view to `224x224` **before** multiview fusion. That preserved the final tensor shape but broke the original LEWM geometry.
+
+The corrected path is now:
+
+- **Training dataset**: keep the stored per-view `256x256` images (`data.image_resize_shape=null` at launch; the dataset instantiation override is `None` for LEWM)
+- **Eval rollout input**: resize live MuJoCo `480x640` camera images to `256x256` per view
+- **Backbone**: fuse `front, top, r_vis` on the LEWM axis, then resize fused short side to `224`
+
+## Verification
+
+- Local tests passed (`38 passed` across the focused suite)
+- Remote check:
+  - dataset sample image shape: `(2, 3, 256, 256)`
+  - eval-prepared live frame shape: `(3, 256, 256)`
+- Remote smoke test passed with a real checkpoint:
+  - `smoke-lewm-imf-rawpath-emb384-20260406-002002`
+
+## Current runs
+
+- `lewm-vit-imf-raw256fix-sim-transfer-emb384-l12-ph16-ex08-step50k-roll10-5880g0-20260406-002124`
+- `lewm-vit-imf-raw256fix-sim-transfer-emb256-l12-ph16-ex08-step50k-roll10-5880g1-20260406-002124`
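
The shape arithmetic behind the "Root-cause fix" section of the notes can be sketched as below. This is a minimal illustration only, not the repository's code: the helper names `fuse_views` and `resize_short_side` are hypothetical, and the notes do not say which axis the "LEWM axis" is, so the sketch assumes the three views are concatenated along the width. Under that assumption, the old path (resize each view to `224x224`, then fuse) and the corrected path (fuse `256x256` views, then resize the short side to `224`) end at the same shape, which is consistent with the note that the bug preserved the final tensor shape.

```python
from typing import List, Tuple

Shape = Tuple[int, int, int]  # (channels, height, width)


def fuse_views(views: List[Shape]) -> Shape:
    """Concatenate per-view shapes along the width (assumed fusion axis)."""
    channels, height, _ = views[0]
    for c, h, _ in views:
        if (c, h) != (channels, height):
            raise ValueError("all views must share channels and height")
    return (channels, height, sum(w for _, _, w in views))


def resize_short_side(shape: Shape, target: int) -> Shape:
    """Scale height and width uniformly so the shorter side equals target."""
    channels, height, width = shape
    scale = target / min(height, width)
    return (channels, round(height * scale), round(width * scale))


# Corrected path: keep 256x256 per view, fuse, then resize the fused image.
stored_views = [(3, 256, 256)] * 3  # front, top, r_vis
corrected = resize_short_side(fuse_views(stored_views), 224)

# Old path: each view was resized to 224x224 before fusion.
pre_resized_views = [resize_short_side(v, 224) for v in stored_views]
old = resize_short_side(fuse_views(pre_resized_views), 224)

print("corrected:", corrected)
print("old:      ", old)
```

Because both paths produce the same final shape under this assumption, a shape assertion on the backbone input alone would not distinguish them. That is presumably why the verification in the notes checks the per-view shapes upstream (`256x256` at the dataset and eval-preparation stages) rather than the fused tensor.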