feat: add vision transfer backbones and IMF variants

Logic
2026-04-09 14:02:24 +08:00
parent d51b3ecafa
commit ff7c9c1f2a
58 changed files with 2788 additions and 26 deletions

@@ -0,0 +1,69 @@
# Camera Ablation Summary (`pred_horizon=16`, `num_action_steps=8`, ResNet IMF)
- Generated: 2026-04-05
- Common setup: original ResNet vision backbone, `n_emb=384`, `n_layer=12`, `batch_size=80`, `lr=2.5e-4`, `max_steps=50k`, rollout every 5 epochs with 5 episodes, headless eval.
- Metric for comparison: `checkpoints/vla_model_best.pt -> rollout_avg_reward`.
## Leaderboard
| Rank | Cameras | Best avg_reward | Best step | Final loss | Run name |
|---:|---|---:|---:|---:|---|
| 1 | `top + front` | **274.8** | 48124 | 0.0056 | `imf-resnet-topfront-2cam-ph16-ex08-emb384-l12-ms50k-5090-20260405-085023` |
| 2 | `top` | **271.2** | 43749 | 0.0052 | `imf-resnet-top-1cam-ph16-ex08-emb384-l12-ms50k-l20g4-20260405-125844` |
| 3 | `r_vis + front` | **244.0** | 21874 | 0.0043 | `imf-resnet-frontrvis-2cam-ph16-ex08-emb384-l12-ms50k-l20g1-20260405-102029` |
| 4 | `r_vis` | **6.4** | 17499 | 0.0047 | `imf-resnet-rvis-1cam-ph16-ex08-emb384-l12-ms50k-l20g3-20260405-125844` |
| 5 | `r_vis + top` | **1.2** | 4374 | 0.0047 | `imf-resnet-rvistop-2cam-ph16-ex08-emb384-l12-ms50k-l20g2-20260405-125844` |
| 6 | `front` | **0.0** | 4374 | 0.0074 | `imf-resnet-front-1cam-ph16-ex08-emb384-l12-ms50k-l20g0-20260405-095607` |
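The run names in the leaderboard appear to encode the training contract. Below is a minimal sketch of decoding them. The field meanings are inferred from the values in this commit and are not a documented naming spec:

- `phNN` is `pred_horizon`.
- `exNN` is `num_action_steps`.
- `embNNN` and `lNN` are the head width and depth.
- `msNNk` is `max_steps`.
- The machine tag (`5090` or `l20gN`) looks like the GPU model and index.

The regex and `parse_run_name` are hypothetical helpers.

```python
import re

# Hypothetical pattern inferred from run names such as
# imf-resnet-topfront-2cam-ph16-ex08-emb384-l12-ms50k-5090-20260405-085023
RUN_NAME = re.compile(
    r"^(?P<family>imf)-(?P<backbone>resnet)-(?P<cams>[a-z]+)-(?P<num_cams>\d+)cam"
    r"-ph(?P<pred_horizon>\d+)-ex(?P<num_action_steps>\d+)"
    r"-emb(?P<n_emb>\d+)-l(?P<n_layer>\d+)-ms(?P<max_steps>\d+)k"
    r"-(?P<machine>[a-z0-9]+)-(?P<date>\d{8})-(?P<time>\d{6})$"
)

INT_FIELDS = ("num_cams", "pred_horizon", "num_action_steps", "n_emb", "n_layer")


def parse_run_name(name: str) -> dict:
    """Split a run name into its (inferred) hyperparameter fields."""
    match = RUN_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognized run name: {name}")
    fields = match.groupdict()
    for key in INT_FIELDS:
        fields[key] = int(fields[key])  # "08" -> 8
    fields["max_steps"] = int(fields["max_steps"]) * 1000  # "50k" -> 50000
    return fields


print(parse_run_name("imf-resnet-top-1cam-ph16-ex08-emb384-l12-ms50k-l20g4-20260405-125844"))
```

A sketch like this can check that a run's name agrees with its status JSON. It is not required by any tooling shown here.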
## Main takeaways
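1. **`top` is the single most important camera view**: `top only = 271.2`, nearly matching `top + front = 274.8`.
2. **`front` alone is essentially useless**: `front only = 0.0`.
3. **`r_vis` alone is also essentially ineffective**: `r_vis only = 6.4`.
4. **`r_vis + front` clearly outperforms `front` or `r_vis` alone**, which suggests the two views are partly complementary. It is still clearly weaker than every normally performing configuration that includes `top`.
5. **`r_vis + top` is anomalously poor**: only `1.2`, far below `top only = 271.2`. Simply adding `r_vis` therefore does not guarantee a gain and may even break learning under the current setup.
6. **Training loss and rollout reward clearly disagree**: for example, `r_vis + top` and `r_vis only` both end with a low final loss (`0.0047`, lower than `top only` at `0.0052`), yet both get very poor rewards. Model selection in this set of experiments must therefore use rollout reward, not loss.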
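Takeaway 6 can be checked directly against the leaderboard numbers. This sketch picks a "best" run by each criterion, using the values copied from the table above. Selecting by lowest final loss would choose `r_vis + front`, while selecting by best rollout reward chooses `top + front`. The helper names are illustrative only.

```python
# (cameras, best rollout avg_reward, final training loss), copied from the leaderboard
RUNS = [
    ("top + front", 274.8, 0.0056),
    ("top", 271.2, 0.0052),
    ("r_vis + front", 244.0, 0.0043),
    ("r_vis", 6.4, 0.0047),
    ("r_vis + top", 1.2, 0.0047),
    ("front", 0.0, 0.0074),
]


def best_by_reward(runs):
    """Run with the highest best rollout avg_reward."""
    return max(runs, key=lambda run: run[1])[0]


def best_by_loss(runs):
    """Run with the lowest final training loss."""
    return min(runs, key=lambda run: run[2])[0]


print("selected by reward:", best_by_reward(RUNS))
print("selected by loss:  ", best_by_loss(RUNS))
```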
## Horizontal comparison views
### Single-camera comparison
- `top`: **271.2**
- `r_vis`: **6.4**
- `front`: **0.0**
Conclusion: **`top >>> r_vis > front`**.
### Two-camera comparison
- `top + front`: **274.8**
- `r_vis + front`: **244.0**
- `r_vis + top`: **1.2**
Conclusions:
- **The safest two-camera combination is `top + front`**.
- `r_vis + front` works, but not as well as `top + front`.
- `r_vis + top` essentially fails under the current setup.
### Incremental effect of adding a second view
-`top` 基础上加 `front``271.2 -> 274.8`**增益很小**。
-`front` 基础上加 `r_vis``0.0 -> 244.0`**增益很大**。
-`top` 基础上加 `r_vis``271.2 -> 1.2`**显著退化**。
## Practical recommendation
If choosing only among these 6 runs:
- **First choice**: `top + front`
- **Second choice**: `top only`
- If `top` cannot be used: `r_vis + front` is clearly better than `front only` / `r_vis only`
- **Not recommended**: `r_vis + top`
## Note relative to previous 3-camera baseline
此前 3 相机 `[r_vis, top, front]` 的最佳 reward 为 **610.8**
因此这次 6 个 camera ablation 的最佳结果(`top + front = 274.8`)说明:
- 当前这个训练批次里,**去掉任意一个视角都会显著低于之前的 3 相机最优结果**
- 但在去掉视角的约束下,**`top` 仍然是最核心的保留对象**。
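To quantify "well below", this sketch expresses each ablation's best reward as a fraction of the 3-camera best (610.8). All values come from this summary. Even the best reduced configuration (`top + front`) reaches only about 45% of the baseline, a gap of 336.0 reward.

```python
BASELINE_3CAM = 610.8  # best rollout_avg_reward of the earlier [r_vis, top, front] run

# Best rollout avg_reward per ablation, copied from the leaderboard
ABLATION_BEST = {
    "top + front": 274.8,
    "top": 271.2,
    "r_vis + front": 244.0,
    "r_vis": 6.4,
    "r_vis + top": 1.2,
    "front": 0.0,
}


def fraction_of_baseline(reward: float, baseline: float = BASELINE_3CAM) -> float:
    """Share of the 3-camera best reward that an ablation recovers."""
    return reward / baseline


for cams, reward in sorted(ABLATION_BEST.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{cams:<14} {reward:6.1f}  {fraction_of_baseline(reward):6.1%} of 3-cam best")
```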

@@ -0,0 +1,8 @@
# CHECKLIST
- [x] Confirm remote free GPU
- [x] Create front-only run contract
- [x] Remote smoke test passes
- [x] Launch 50k run on remote GPU0
- [x] Record pid / log / SwanLab
- [x] Report status back to user

@@ -0,0 +1,28 @@
# PLAN
## Goal
Train a 50k-step IMF baseline with the original ResNet vision backbone, using only the `front` camera as image conditioning.
## Fixed comparison contract
- Same as the active `top/front` run except image input is reduced to `[front]`
- Agent: `resnet_imf_attnres`
- Vision backbone mode: `resnet`
- `pred_horizon=16`, `num_action_steps=8`
- `n_emb=384`, `n_layer=12`, `n_head=1`, `n_kv_head=1`
- `inference_steps=1`
- `batch_size=80`, `lr=2.5e-4`, cosine, warmup=2000
- dataset: `/home/droid/sim_dataset/sim_transfer`
- cameras: `[front]` only
- rollout every 5 epochs with 5 episodes, headless
## Resource plan
- Host: `100.119.99.14`
- GPU: `0`
## Important dimension override
- Single-camera visual cond dim = `64 + 16 = 80`, so override `agent.head.cond_dim=80` and `agent.num_cams=1`.
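The cond-dim override follows the arithmetic in this plan (`64 + 16 = 80` for one camera) and in the front+r_vis plan (`64*2 + 16 = 144` for two). This sketch derives both overrides from the camera list.

It rests on one assumption: that `64` is the per-camera visual feature width and `16` a fixed non-image term. The plans state only the sums.

The override keys `agent.num_cams` and `agent.head.cond_dim` are taken from the plans. The helper functions are hypothetical.

```python
# ASSUMPTION: 64 = per-camera visual feature width, 16 = fixed non-image term.
# The plans only give the totals: 64 + 16 = 80 (1 cam) and 64*2 + 16 = 144 (2 cams).
PER_CAM_DIM = 64
EXTRA_DIM = 16


def head_cond_dim(num_cams: int) -> int:
    """Conditioning width fed to the IMF head for `num_cams` image streams."""
    return PER_CAM_DIM * num_cams + EXTRA_DIM


def camera_overrides(camera_names: list[str]) -> list[str]:
    """Override strings that keep num_cams and head.cond_dim consistent with the camera list."""
    n = len(camera_names)
    return [f"agent.num_cams={n}", f"agent.head.cond_dim={head_cond_dim(n)}"]


print(camera_overrides(["front"]))
print(camera_overrides(["r_vis", "front"]))
```

Deriving both values from one list keeps `num_cams` and `cond_dim` from drifting apart when a camera is added or dropped.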
## Execution path
1. 2-step smoke test on remote GPU0.
2. If smoke passes, launch 50k main run with SwanLab.
3. Record pid / run_dir / log / URL locally.

@@ -0,0 +1,6 @@
# Notes
- 2026-04-05 09:55:27: remote 2-step smoke passed on `100.119.99.14` GPU0 with `front` only, batch=80, no OOM.
- 2026-04-05 09:56:26: launched main run `imf-resnet-front-1cam-ph16-ex08-emb384-l12-ms50k-l20g0-20260405-095607`.
- 2026-04-05 09:57:36: confirmed training is stable through step 200, latest loss 0.2830.
- SwanLab: https://swanlab.cn/@game-loader/roboimi-vla/runs/7kdii8oc6tjkcyu5y0lwq

@@ -0,0 +1,51 @@
{
"suite_name": "2026-04-05-front-only-resnet-1cam",
"updated_at": "2026-04-05 09:57:36",
"phase": "running",
"baseline_reference": {
"source_run": "imf-resnet-topfront-2cam-ph16-ex08-emb384-l12-ms50k-5090-20260405-085023",
"notes": "Same hyperparameters as the active top/front run, but image input is reduced to [front] only."
},
"smoke_test": {
"status": "passed",
"host": "100.119.99.14",
"gpu": 0,
"run_dir": "/home/droid/roboimi_suite_20260404/runs/smoke-frontonly-resnet-ph16-ex08-20260405-095509",
"batch_size": 80,
"max_steps": 2,
"note": "2-step remote CUDA smoke passed on L20 GPU0 without OOM."
},
"main_run": {
"status": "running",
"host": "100.119.99.14",
"gpu": 0,
"launch_pid": 158874,
"pid": 158877,
"run_name": "imf-resnet-front-1cam-ph16-ex08-emb384-l12-ms50k-l20g0-20260405-095607",
"run_dir": "/home/droid/roboimi_suite_20260404/runs/imf-resnet-front-1cam-ph16-ex08-emb384-l12-ms50k-l20g0-20260405-095607",
"log_path": "/home/droid/roboimi_suite_20260404/runs/imf-resnet-front-1cam-ph16-ex08-emb384-l12-ms50k-l20g0-20260405-095607/train_vla.log",
"launch_log": "/home/droid/roboimi_suite_20260404/experiment_suite_launch_logs/imf-resnet-front-1cam-ph16-ex08-emb384-l12-ms50k-l20g0-20260405-095607.launch.log",
"dataset_dir": "/home/droid/sim_dataset/sim_transfer",
"camera_names": [
"front"
],
"pred_horizon": 16,
"num_action_steps": 8,
"head_cond_dim": 80,
"head_n_emb": 384,
"head_n_layer": 12,
"vision_backbone_mode": "resnet",
"pretrained_backbone_weights": null,
"freeze_backbone": false,
"batch_size": 80,
"lr": 0.00025,
"num_workers": 12,
"max_steps": 50000,
"rollout_val_freq_epochs": 5,
"rollout_num_episodes": 5,
"swanlab_url": "https://swanlab.cn/@game-loader/roboimi-vla/runs/7kdii8oc6tjkcyu5y0lwq",
"latest_step": 200,
"latest_loss": 0.283,
"process_running": true
}
}

@@ -0,0 +1,8 @@
# CHECKLIST
- [x] Confirm camera mapping (`right` -> `r_vis`)
- [x] Create front+r_vis run contract
- [x] Remote smoke test passes
- [x] Launch 50k run on remote GPU1
- [x] Record pid / log / SwanLab
- [x] Report status back to user

@@ -0,0 +1,23 @@
# PLAN
## Goal
Train a 50k-step IMF baseline with the original ResNet vision backbone, using `front` + `r_vis` cameras only.
## Fixed comparison contract
- Same hyperparameters as the active top/front and front-only runs
- Agent: `resnet_imf_attnres`
- Vision backbone mode: `resnet`
- `pred_horizon=16`, `num_action_steps=8`
- `n_emb=384`, `n_layer=12`, `n_head=1`, `n_kv_head=1`
- `inference_steps=1`
- `batch_size=80`, `lr=2.5e-4`, cosine warmup 2000
- dataset: `/home/droid/sim_dataset/sim_transfer`
- cameras: `[r_vis, front]`
- rollout every 5 epochs with 5 episodes, headless
## Important dimension override
- Two-camera visual cond dim = `64*2 + 16 = 144`, so set `agent.num_cams=2`, `agent.head.cond_dim=144`.
## Resource plan
- Host: `100.119.99.14`
- GPU: `1`

@@ -0,0 +1,6 @@
# Notes
- 2026-04-05 10:20:09: remote 2-step smoke passed on `100.119.99.14` GPU1 with `r_vis + front`, batch=80, no OOM.
- 2026-04-05 10:20:49: launched main run `imf-resnet-frontrvis-2cam-ph16-ex08-emb384-l12-ms50k-l20g1-20260405-102029`.
- 2026-04-05 10:22:03: confirmed training is stable through step 200, latest loss 0.3321.
- SwanLab: https://swanlab.cn/@game-loader/roboimi-vla/runs/3fyzjfdcbiq7frtbqv6ss

@@ -0,0 +1,55 @@
{
"suite_name": "2026-04-05-front-rvis-resnet-2cam",
"updated_at": "2026-04-05 10:22:03",
"phase": "running",
"interpretation": {
"right_camera_name": "r_vis"
},
"baseline_reference": {
"source_run": "imf-resnet-topfront-2cam-ph16-ex08-emb384-l12-ms50k-5090-20260405-085023",
"notes": "Same hyperparameters as the active top/front run, replacing top with r_vis."
},
"smoke_test": {
"status": "passed",
"host": "100.119.99.14",
"gpu": 1,
"run_dir": "/home/droid/roboimi_suite_20260404/runs/smoke-frontrvis-resnet-ph16-ex08-20260405-102001",
"batch_size": 80,
"max_steps": 2,
"note": "2-step remote CUDA smoke passed on L20 GPU1 without OOM."
},
"main_run": {
"status": "running",
"host": "100.119.99.14",
"gpu": 1,
"launch_pid": 159910,
"pid": 159913,
"run_name": "imf-resnet-frontrvis-2cam-ph16-ex08-emb384-l12-ms50k-l20g1-20260405-102029",
"run_dir": "/home/droid/roboimi_suite_20260404/runs/imf-resnet-frontrvis-2cam-ph16-ex08-emb384-l12-ms50k-l20g1-20260405-102029",
"log_path": "/home/droid/roboimi_suite_20260404/runs/imf-resnet-frontrvis-2cam-ph16-ex08-emb384-l12-ms50k-l20g1-20260405-102029/train_vla.log",
"launch_log": "/home/droid/roboimi_suite_20260404/experiment_suite_launch_logs/imf-resnet-frontrvis-2cam-ph16-ex08-emb384-l12-ms50k-l20g1-20260405-102029.launch.log",
"dataset_dir": "/home/droid/sim_dataset/sim_transfer",
"camera_names": [
"r_vis",
"front"
],
"pred_horizon": 16,
"num_action_steps": 8,
"head_cond_dim": 144,
"head_n_emb": 384,
"head_n_layer": 12,
"vision_backbone_mode": "resnet",
"pretrained_backbone_weights": null,
"freeze_backbone": false,
"batch_size": 80,
"lr": 0.00025,
"num_workers": 12,
"max_steps": 50000,
"rollout_val_freq_epochs": 5,
"rollout_num_episodes": 5,
"swanlab_url": "https://swanlab.cn/@game-loader/roboimi-vla/runs/3fyzjfdcbiq7frtbqv6ss",
"latest_step": 200,
"latest_loss": 0.3321,
"process_running": true
}
}

@@ -0,0 +1,73 @@
{
"date": "2026-04-06",
"branch": "feat-imf-attnres-policy",
"worktree": "/home/droid/project/roboimi/.worktrees/feat-imf-attnres-policy",
"model": "LEWM ViT frozen visual encoder + IMF AttnRes diffusion head",
"checkpoint_path": "/home/droid/le-wm/lewm-sim-transfer/pa1w85md8jop6bvol8oxp/checkpoints/epoch=99-step=47800.ckpt",
"visual_contract": {
"input_camera_names": ["r_vis", "top", "front"],
"fused_camera_names": ["front", "top", "r_vis"],
"joint_output_dim": 192,
"freeze_backbone": true,
"dataset_image_resize_shape": null,
"eval_image_resize_shape": [256, 256],
"fused_short_side_resize": 224
},
"training_contract": {
"pred_horizon": 16,
"num_action_steps": 8,
"max_steps": 50000,
"rollout_val_freq_epochs": 5,
"rollout_num_episodes": 10,
"batch_size": 80,
"lr": 0.00025,
"num_workers": 12,
"scheduler_type": "cosine",
"warmup_steps": 2000,
"min_lr": 1e-06,
"weight_decay": 1e-05,
"grad_clip": 1.0
},
"verification": {
"local_tests": "38 passed",
"remote_dataset_shape": [2, 3, 256, 256],
"remote_eval_prepared_shape": [3, 256, 256],
"remote_smoke_run": {
"run_name": "smoke-lewm-imf-rawpath-emb384-20260406-002002",
"result": "passed",
"details": "2-step train + checkpoint-triggered 1-episode headless rollout succeeded with corrected raw256 path"
}
},
"superseded_runs": [
{
"run_name": "lewm-vit-imf-sim-transfer-emb384-l12-ph16-ex08-step50k-roll10-5880g0-20260405-201914",
"reason": "stopped due to incorrect early per-camera 224 resize"
},
{
"run_name": "lewm-vit-imf-sim-transfer-emb256-l12-ph16-ex08-step50k-roll10-5880g1-20260405-201914",
"reason": "stopped due to incorrect early per-camera 224 resize"
}
],
"full_runs": [
{
"host": "100.73.14.65",
"gpu": 0,
"run_name": "lewm-vit-imf-raw256fix-sim-transfer-emb384-l12-ph16-ex08-step50k-roll10-5880g0-20260406-002124",
"pid": 1058589,
"log_path": "/home/droid/roboimi_suite_20260404/experiment_suite_launch_logs/lewm-vit-imf-raw256fix-sim-transfer-emb384-l12-ph16-ex08-step50k-roll10-5880g0-20260406-002124.launch.log",
"swanlab_url": "https://swanlab.cn/@game-loader/roboimi-vla/runs/y5tzgqe0u966w9ak41i31",
"head_n_emb": 384,
"head_n_layer": 12
},
{
"host": "100.73.14.65",
"gpu": 1,
"run_name": "lewm-vit-imf-raw256fix-sim-transfer-emb256-l12-ph16-ex08-step50k-roll10-5880g1-20260406-002124",
"pid": 1058590,
"log_path": "/home/droid/roboimi_suite_20260404/experiment_suite_launch_logs/lewm-vit-imf-raw256fix-sim-transfer-emb256-l12-ph16-ex08-step50k-roll10-5880g1-20260406-002124.launch.log",
"swanlab_url": "https://swanlab.cn/@game-loader/roboimi-vla/runs/2esr9y7t2dgesstgrn5i6",
"head_n_emb": 256,
"head_n_layer": 12
}
]
}

@@ -0,0 +1,25 @@
# 2026-04-06 LEWM ViT Transfer Notes
## Root-cause fix
The first LEWM runs were stopped because the data path still resized each camera view to `224x224` **before** multiview fusion. That preserved the final tensor shape but broke the original LEWM geometry.
Corrected path now is:
- **Training dataset**: keep stored per-view `256x256` images (`data.image_resize_shape=null` at launch; dataset instantiate override is `None` for LEWM)
- **Eval rollout input**: resize live MuJoCo `480x640` camera images to `256x256` per view
- **Backbone**: fuse `front, top, r_vis` on the LEWM axis, then resize fused short side to `224`
## Verification
- Local tests passed (`38 passed` across the focused suite)
- Remote check:
- dataset sample image shape: `(2, 3, 256, 256)`
- eval-prepared live frame shape: `(3, 256, 256)`
- Remote smoke passed with real checkpoint:
- `smoke-lewm-imf-rawpath-emb384-20260406-002002`
## Current runs
- `lewm-vit-imf-raw256fix-sim-transfer-emb384-l12-ph16-ex08-step50k-roll10-5880g0-20260406-002124`
- `lewm-vit-imf-raw256fix-sim-transfer-emb256-l12-ph16-ex08-step50k-roll10-5880g1-20260406-002124`
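The root-cause note says the early per-view `224x224` resize "preserved the final tensor shape". This shape-only sketch shows why a shape check alone could not catch the bug: both pipelines end at the same fused size.

Two assumptions apply. First, it treats fusion as concatenating the 3 views along one spatial axis; the note only says "the LEWM axis", so both axes are tried. Second, it models sizes only, not the pixel resampling where the actual geometric difference lives. The function names are hypothetical.

```python
def short_side_resize(h: int, w: int, target: int) -> tuple[int, int]:
    """(h, w) after scaling so the shorter side equals `target`, keeping aspect ratio."""
    scale = target / min(h, w)
    return round(h * scale), round(w * scale)


def fuse(view_shapes: list[tuple[int, int]], axis: int) -> tuple[int, int]:
    """Shape after concatenating equally sized views along height (axis=0) or width (axis=1)."""
    h, w = view_shapes[0]
    n = len(view_shapes)
    return (h * n, w) if axis == 0 else (h, w * n)


def corrected_path(num_views: int = 3, axis: int = 1) -> tuple[int, int]:
    """Keep stored 256x256 views, fuse, then resize the fused short side to 224."""
    return short_side_resize(*fuse([(256, 256)] * num_views, axis), 224)


def buggy_path(num_views: int = 3, axis: int = 1) -> tuple[int, int]:
    """Resize each view to 224x224 before fusion, then apply the same short-side resize."""
    return short_side_resize(*fuse([(224, 224)] * num_views, axis), 224)


for axis in (0, 1):
    print(f"axis={axis}: corrected={corrected_path(axis=axis)} buggy={buggy_path(axis=axis)}")
```

Since the shapes match, the fix had to be verified by inspecting the intermediate per-view sizes, as the remote check of `(2, 3, 256, 256)` dataset samples does.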

@@ -0,0 +1,19 @@
{
"status": "running",
"updated_at": "2026-04-06T00:22:10+08:00",
"remote_host": "100.73.14.65",
"runs": [
{
"run_name": "lewm-vit-imf-raw256fix-sim-transfer-emb384-l12-ph16-ex08-step50k-roll10-5880g0-20260406-002124",
"pid": 1058589,
"gpu": 0,
"state": "running"
},
{
"run_name": "lewm-vit-imf-raw256fix-sim-transfer-emb256-l12-ph16-ex08-step50k-roll10-5880g1-20260406-002124",
"pid": 1058590,
"gpu": 1,
"state": "running"
}
]
}

@@ -0,0 +1,7 @@
# CHECKLIST
- [x] Create run contract
- [x] Remote smoke test passes
- [x] Launch 50k main run
- [x] Record pid / log / SwanLab
- [x] Report status back to user

@@ -0,0 +1,12 @@
# PLAN
## Goal
Train a 50k-step IMF baseline with the original ResNet vision backbone, using `r_vis` as the only image conditioning.
## Fixed comparison contract
- same hyperparameters as the active top/front run
- cameras: ['r_vis']
- num_cams=1
- head.cond_dim=80
- host: 100.119.99.14
- gpu: 3

@@ -0,0 +1,6 @@
# Notes
- 2026-04-05 12:58:22: smoke passed for ['r_vis'] on 100.119.99.14 GPU3.
- 2026-04-05 12:59:24: launched main run `imf-resnet-rvis-1cam-ph16-ex08-emb384-l12-ms50k-l20g3-20260405-125844`.
- 2026-04-05 13:01:20: latest confirmed progress step=400, loss=0.1165.
- SwanLab: https://swanlab.cn/@game-loader/roboimi-vla/runs/qnuh7vln9mqomxxldyecq

@@ -0,0 +1,47 @@
{
"suite_name": "2026-04-05-rvis-only-resnet-1cam",
"updated_at": "2026-04-05 13:01:20",
"phase": "running",
"smoke_test": {
"status": "passed",
"host": "100.119.99.14",
"gpu": 3,
"run_dir": "/home/droid/roboimi_suite_20260404/runs/smoke-rvisonly-resnet-ph16-ex08-20260405-125812",
"batch_size": 80,
"max_steps": 2,
"note": "2-step remote CUDA smoke passed without OOM."
},
"main_run": {
"status": "running",
"host": "100.119.99.14",
"gpu": 3,
"launch_pid": 164812,
"pid": 164816,
"run_name": "imf-resnet-rvis-1cam-ph16-ex08-emb384-l12-ms50k-l20g3-20260405-125844",
"run_dir": "/home/droid/roboimi_suite_20260404/runs/imf-resnet-rvis-1cam-ph16-ex08-emb384-l12-ms50k-l20g3-20260405-125844",
"log_path": "/home/droid/roboimi_suite_20260404/runs/imf-resnet-rvis-1cam-ph16-ex08-emb384-l12-ms50k-l20g3-20260405-125844/train_vla.log",
"launch_log": "/home/droid/roboimi_suite_20260404/experiment_suite_launch_logs/imf-resnet-rvis-1cam-ph16-ex08-emb384-l12-ms50k-l20g3-20260405-125844.launch.log",
"dataset_dir": "/home/droid/sim_dataset/sim_transfer",
"camera_names": [
"r_vis"
],
"pred_horizon": 16,
"num_action_steps": 8,
"head_cond_dim": 80,
"head_n_emb": 384,
"head_n_layer": 12,
"vision_backbone_mode": "resnet",
"pretrained_backbone_weights": null,
"freeze_backbone": false,
"batch_size": 80,
"lr": 0.00025,
"num_workers": 12,
"max_steps": 50000,
"rollout_val_freq_epochs": 5,
"rollout_num_episodes": 5,
"swanlab_url": "https://swanlab.cn/@game-loader/roboimi-vla/runs/qnuh7vln9mqomxxldyecq",
"latest_step": 400,
"latest_loss": 0.1165,
"process_running": true
}
}

@@ -0,0 +1,7 @@
# CHECKLIST
- [x] Create run contract
- [x] Remote smoke test passes
- [x] Launch 50k main run
- [x] Record pid / log / SwanLab
- [x] Report status back to user

@@ -0,0 +1,12 @@
# PLAN
## Goal
Train a 50k-step IMF baseline with the original ResNet vision backbone, using `r_vis` + `top` as the only image conditioning.
## Fixed comparison contract
- same hyperparameters as the active top/front run
- cameras: ['r_vis', 'top']
- num_cams=2
- head.cond_dim=144
- host: 100.119.99.14
- gpu: 2

@@ -0,0 +1,6 @@
# Notes
- 2026-04-05 12:58:22: smoke passed for ['r_vis', 'top'] on 100.119.99.14 GPU2.
- 2026-04-05 12:59:24: launched main run `imf-resnet-rvistop-2cam-ph16-ex08-emb384-l12-ms50k-l20g2-20260405-125844`.
- 2026-04-05 13:01:20: latest confirmed progress step=200, loss=0.2845.
- SwanLab: https://swanlab.cn/@game-loader/roboimi-vla/runs/umsm6402eb81et7wx7z4a

@@ -0,0 +1,48 @@
{
"suite_name": "2026-04-05-rvistop-resnet-2cam",
"updated_at": "2026-04-05 13:01:20",
"phase": "running",
"smoke_test": {
"status": "passed",
"host": "100.119.99.14",
"gpu": 2,
"run_dir": "/home/droid/roboimi_suite_20260404/runs/smoke-rvistop-resnet-ph16-ex08-20260405-125812",
"batch_size": 80,
"max_steps": 2,
"note": "2-step remote CUDA smoke passed without OOM."
},
"main_run": {
"status": "running",
"host": "100.119.99.14",
"gpu": 2,
"launch_pid": 164745,
"pid": 164749,
"run_name": "imf-resnet-rvistop-2cam-ph16-ex08-emb384-l12-ms50k-l20g2-20260405-125844",
"run_dir": "/home/droid/roboimi_suite_20260404/runs/imf-resnet-rvistop-2cam-ph16-ex08-emb384-l12-ms50k-l20g2-20260405-125844",
"log_path": "/home/droid/roboimi_suite_20260404/runs/imf-resnet-rvistop-2cam-ph16-ex08-emb384-l12-ms50k-l20g2-20260405-125844/train_vla.log",
"launch_log": "/home/droid/roboimi_suite_20260404/experiment_suite_launch_logs/imf-resnet-rvistop-2cam-ph16-ex08-emb384-l12-ms50k-l20g2-20260405-125844.launch.log",
"dataset_dir": "/home/droid/sim_dataset/sim_transfer",
"camera_names": [
"r_vis",
"top"
],
"pred_horizon": 16,
"num_action_steps": 8,
"head_cond_dim": 144,
"head_n_emb": 384,
"head_n_layer": 12,
"vision_backbone_mode": "resnet",
"pretrained_backbone_weights": null,
"freeze_backbone": false,
"batch_size": 80,
"lr": 0.00025,
"num_workers": 12,
"max_steps": 50000,
"rollout_val_freq_epochs": 5,
"rollout_num_episodes": 5,
"swanlab_url": "https://swanlab.cn/@game-loader/roboimi-vla/runs/umsm6402eb81et7wx7z4a",
"latest_step": 200,
"latest_loss": 0.2845,
"process_running": true
}
}

@@ -0,0 +1,8 @@
# CHECKLIST
- [x] Confirm baseline hyperparameters from trusted prior run
- [x] Confirm local GPU availability
- [x] Smoke test with `top/front` cameras only
- [x] Launch 50k run
- [x] Record pid / run dir / log path / SwanLab URL
- [x] Report status back to user

@@ -0,0 +1,30 @@
# PLAN
## Goal
Train a 50k-step IMF baseline with the original ResNet vision backbone (no full-AttnRes vision replacement), using only `top` and `front` cameras as image conditioning.
## Fixed comparison contract
- Agent: `resnet_imf_attnres`
- Vision backbone mode: `resnet`
- `pred_horizon=16`
- `num_action_steps=8`
- `n_emb=384`, `n_layer=12`, `n_head=1`, `n_kv_head=1`
- `inference_steps=1`
- `batch_size=80`, `lr=2.5e-4`, cosine scheduler, warmup 2000
- dataset: `/home/droid/project/diana_sim/sim_transfer`
- cameras: `[top, front]` only
- training budget: `max_steps=50000`
- rollout validation: every 5 epochs, 5 episodes, headless
## Resource plan
- Host: local
- GPU: RTX 5090 (GPU 0)
## Execution path
1. Run a short 2-step smoke test on GPU with the exact 2-camera config.
2. If smoke passes, launch the 50k main run with durable log redirection.
3. Record run name, pid, log path, and SwanLab URL into suite status.
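The execution path above gates the main run on a passing smoke test and redirects the main run's output to a durable log. Below is a minimal sketch of that pattern with the standard `subprocess` module.

The commands are placeholders. The actual training entry point and its arguments are not shown in this commit, and the helper names are hypothetical. `start_new_session=True` detaches the child into its own session on POSIX, so it keeps running independently of the launching shell session.

```python
import subprocess
from pathlib import Path


def run_smoke(cmd: list[str]) -> bool:
    """Run the short smoke command to completion; True if it exited with status 0."""
    return subprocess.run(cmd).returncode == 0


def launch_main(cmd: list[str], log_path: str) -> subprocess.Popen:
    """Start the main run with stdout and stderr appended to `log_path`."""
    path = Path(log_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, "ab") as log:
        # The child gets its own copy of the file descriptor, so closing ours is safe.
        return subprocess.Popen(
            cmd, stdout=log, stderr=subprocess.STDOUT, start_new_session=True
        )


def smoke_then_launch(smoke_cmd: list[str], main_cmd: list[str], log_path: str):
    """Launch the main run only if the smoke run passes; return the process or None."""
    if not run_smoke(smoke_cmd):
        return None
    return launch_main(main_cmd, log_path)
```

The returned process's `.pid` is what would go into the suite status as `pid`, alongside the log path and SwanLab URL.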
## Fallbacks
- If batch 80 OOMs, fall back to batch 64 with scaled lr 2.0e-4.
- If dataloader startup is unstable, reduce num_workers from 12 to 8.
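The OOM fallback's learning rate matches the linear scaling rule (keep `lr / batch_size` constant): `2.5e-4 * 64 / 80 = 2.0e-4`. The plan does not name the rule, so reading it this way is an inference.

This sketch applies both fallbacks to a config. `TrainCfg` and `apply_fallbacks` are hypothetical names; the field values come from the plan.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TrainCfg:
    batch_size: int = 80
    lr: float = 2.5e-4
    num_workers: int = 12


def apply_fallbacks(cfg: TrainCfg, oom: bool = False, loader_unstable: bool = False) -> TrainCfg:
    """Return a new config with the plan's fallbacks applied."""
    if oom:
        new_batch = 64
        # Linear scaling: lr shrinks in proportion to the batch size (80 -> 64).
        cfg = replace(cfg, batch_size=new_batch, lr=cfg.lr * new_batch / cfg.batch_size)
    if loader_unstable:
        cfg = replace(cfg, num_workers=8)
    return cfg


print(apply_fallbacks(TrainCfg(), oom=True))
```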

@@ -0,0 +1,5 @@
# Notes
- 2026-04-05 08:50:04: 2-step smoke test passed locally on RTX 5090 with `top/front` cameras, batch=80, no OOM.
- 2026-04-05 08:50:42: launched main run `imf-resnet-topfront-2cam-ph16-ex08-emb384-l12-ms50k-5090-20260405-085023` on local GPU0.
- SwanLab: https://swanlab.cn/@game-loader/roboimi-vla/runs/vi77mn5dwd19z4nttxab8

@@ -0,0 +1,51 @@
{
"suite_name": "2026-04-05-top-front-resnet-2cam",
"updated_at": "2026-04-05 08:52:12",
"phase": "running",
"baseline_reference": {
"source_run": "imf-p1-ph16-ex08-emb384-l12-ms50k-5880g1-20260404-131223",
"best_rollout_avg_reward": 610.8,
"best_step": 21874,
"notes": "Same IMF baseline as Phase-1 best, but switch cameras from [r_vis, top, front] to [top, front] and keep the original ResNet vision backbone."
},
"smoke_test": {
"status": "passed",
"run_dir": "/home/droid/project/roboimi/.worktrees/feat-imf-attnres-policy/runs/smoke-topfront-resnet-ph16-ex08-20260405-085000",
"batch_size": 80,
"num_workers": 4,
"max_steps": 2,
"note": "2-step local CUDA smoke passed without OOM using top/front only."
},
"main_run": {
"status": "running",
"host": "local",
"gpu": 0,
"pid": 1693348,
"run_name": "imf-resnet-topfront-2cam-ph16-ex08-emb384-l12-ms50k-5090-20260405-085023",
"run_dir": "/home/droid/project/roboimi/.worktrees/feat-imf-attnres-policy/runs/imf-resnet-topfront-2cam-ph16-ex08-emb384-l12-ms50k-5090-20260405-085023",
"log_path": "/home/droid/project/roboimi/.worktrees/feat-imf-attnres-policy/runs/imf-resnet-topfront-2cam-ph16-ex08-emb384-l12-ms50k-5090-20260405-085023/train_vla.log",
"launch_log": "/home/droid/project/roboimi/.worktrees/feat-imf-attnres-policy/experiment_suites/2026-04-05-top-front-resnet-2cam/launch_logs/imf-resnet-topfront-2cam-ph16-ex08-emb384-l12-ms50k-5090-20260405-085023.launch.log",
"dataset_dir": "/home/droid/project/diana_sim/sim_transfer",
"camera_names": [
"top",
"front"
],
"pred_horizon": 16,
"num_action_steps": 8,
"head_n_emb": 384,
"head_n_layer": 12,
"vision_backbone_mode": "resnet",
"pretrained_backbone_weights": null,
"freeze_backbone": false,
"batch_size": 80,
"lr": 0.00025,
"num_workers": 12,
"max_steps": 50000,
"rollout_val_freq_epochs": 5,
"rollout_num_episodes": 5,
"swanlab_url": "https://swanlab.cn/@game-loader/roboimi-vla/runs/vi77mn5dwd19z4nttxab8",
"latest_step": 500,
"latest_loss": 0.0978,
"process_running": true
}
}

@@ -0,0 +1,7 @@
# CHECKLIST
- [x] Create run contract
- [x] Remote smoke test passes
- [x] Launch 50k main run
- [x] Record pid / log / SwanLab
- [x] Report status back to user

@@ -0,0 +1,12 @@
# PLAN
## Goal
Train a 50k-step IMF baseline with the original ResNet vision backbone, using `top` as the only image conditioning.
## Fixed comparison contract
- same hyperparameters as the active top/front run
- cameras: ['top']
- num_cams=1
- head.cond_dim=80
- host: 100.119.99.14
- gpu: 4

@@ -0,0 +1,6 @@
# Notes
- 2026-04-05 12:58:22: smoke passed for ['top'] on 100.119.99.14 GPU4.
- 2026-04-05 12:59:24: launched main run `imf-resnet-top-1cam-ph16-ex08-emb384-l12-ms50k-l20g4-20260405-125844`.
- 2026-04-05 13:01:20: latest confirmed progress step=400, loss=0.1233.
- SwanLab: https://swanlab.cn/@game-loader/roboimi-vla/runs/egzo29l3z9ftsaunhf025

@@ -0,0 +1,47 @@
{
"suite_name": "2026-04-05-top-only-resnet-1cam",
"updated_at": "2026-04-05 13:01:20",
"phase": "running",
"smoke_test": {
"status": "passed",
"host": "100.119.99.14",
"gpu": 4,
"run_dir": "/home/droid/roboimi_suite_20260404/runs/smoke-toponly-resnet-ph16-ex08-20260405-125812",
"batch_size": 80,
"max_steps": 2,
"note": "2-step remote CUDA smoke passed without OOM."
},
"main_run": {
"status": "running",
"host": "100.119.99.14",
"gpu": 4,
"launch_pid": 164808,
"pid": 164813,
"run_name": "imf-resnet-top-1cam-ph16-ex08-emb384-l12-ms50k-l20g4-20260405-125844",
"run_dir": "/home/droid/roboimi_suite_20260404/runs/imf-resnet-top-1cam-ph16-ex08-emb384-l12-ms50k-l20g4-20260405-125844",
"log_path": "/home/droid/roboimi_suite_20260404/runs/imf-resnet-top-1cam-ph16-ex08-emb384-l12-ms50k-l20g4-20260405-125844/train_vla.log",
"launch_log": "/home/droid/roboimi_suite_20260404/experiment_suite_launch_logs/imf-resnet-top-1cam-ph16-ex08-emb384-l12-ms50k-l20g4-20260405-125844.launch.log",
"dataset_dir": "/home/droid/sim_dataset/sim_transfer",
"camera_names": [
"top"
],
"pred_horizon": 16,
"num_action_steps": 8,
"head_cond_dim": 80,
"head_n_emb": 384,
"head_n_layer": 12,
"vision_backbone_mode": "resnet",
"pretrained_backbone_weights": null,
"freeze_backbone": false,
"batch_size": 80,
"lr": 0.00025,
"num_workers": 12,
"max_steps": 50000,
"rollout_val_freq_epochs": 5,
"rollout_num_episodes": 5,
"swanlab_url": "https://swanlab.cn/@game-loader/roboimi-vla/runs/egzo29l3z9ftsaunhf025",
"latest_step": 400,
"latest_loss": 0.1233,
"process_running": true
}
}