feat: add vision transfer backbones and IMF variants
69
experiment_suites/2026-04-05-camera-ablation-summary.md
Normal file
@@ -0,0 +1,69 @@
# Camera Ablation Summary (`pred_horizon=16`, `num_action_steps=8`, ResNet IMF)

- Generated: 2026-04-05
- Common setup: original ResNet vision backbone, `n_emb=384`, `n_layer=12`, `batch_size=80`, `lr=2.5e-4`, `max_steps=50k`, rollout every 5 epochs with 5 episodes, headless eval.
- Metric for comparison: `checkpoints/vla_model_best.pt -> rollout_avg_reward`.

## Leaderboard

| Rank | Cameras | Best avg_reward | Best step | Final loss | Run name |
|---:|---|---:|---:|---:|---|
| 1 | `top + front` | **274.8** | 48124 | 0.0056 | `imf-resnet-topfront-2cam-ph16-ex08-emb384-l12-ms50k-5090-20260405-085023` |
| 2 | `top` | **271.2** | 43749 | 0.0052 | `imf-resnet-top-1cam-ph16-ex08-emb384-l12-ms50k-l20g4-20260405-125844` |
| 3 | `r_vis + front` | **244.0** | 21874 | 0.0043 | `imf-resnet-frontrvis-2cam-ph16-ex08-emb384-l12-ms50k-l20g1-20260405-102029` |
| 4 | `r_vis` | **6.4** | 17499 | 0.0047 | `imf-resnet-rvis-1cam-ph16-ex08-emb384-l12-ms50k-l20g3-20260405-125844` |
| 5 | `r_vis + top` | **1.2** | 4374 | 0.0047 | `imf-resnet-rvistop-2cam-ph16-ex08-emb384-l12-ms50k-l20g2-20260405-125844` |
| 6 | `front` | **0.0** | 4374 | 0.0074 | `imf-resnet-front-1cam-ph16-ex08-emb384-l12-ms50k-l20g0-20260405-095607` |
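
The run names in the leaderboard appear to encode the configuration of each run. As a minimal sketch, the parser below splits a run name into fields. The field meanings are assumptions inferred from the summary title and common setup (for example, that `ex08` corresponds to `num_action_steps=8` and that the token before the date is a machine or GPU tag); `parse_run_name` is a hypothetical helper, not part of the repository.

```python
import re

# Pattern inferred from the six run names in the leaderboard (assumption).
RUN_NAME = re.compile(
    r"^imf-resnet-(?P<cam_tag>[a-z]+)-(?P<num_cams>\d+)cam"
    r"-ph(?P<pred_horizon>\d+)-ex(?P<num_action_steps>\d+)"
    r"-emb(?P<n_emb>\d+)-l(?P<n_layer>\d+)-ms(?P<max_steps_k>\d+)k"
    r"-(?P<host>[a-z0-9]+)-(?P<date>\d{8})-(?P<time>\d{6})$"
)

INT_FIELDS = (
    "num_cams", "pred_horizon", "num_action_steps",
    "n_emb", "n_layer", "max_steps_k",
)


def parse_run_name(name):
    """Split a run name into a dict of fields; numeric fields become ints."""
    match = RUN_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognized run name: {name!r}")
    fields = match.groupdict()
    for key in INT_FIELDS:
        fields[key] = int(fields[key])
    return fields


info = parse_run_name(
    "imf-resnet-top-1cam-ph16-ex08-emb384-l12-ms50k-l20g4-20260405-125844"
)
print(info["cam_tag"], info["num_cams"], info["host"])
```

Parsing the names this way makes it easy to check that all six runs share the same `ph`, `ex`, `emb`, `l`, and `ms` values and differ only in camera set and host tag.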
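
## Main takeaways

1. **`top` is the most important single-camera view**: `top only = 271.2`, nearly level with `top + front = 274.8`.
2. **`front` alone is essentially useless**: `front only = 0.0`.
3. **`r_vis` alone is also largely ineffective**: `r_vis only = 6.4`.
4. **`r_vis + front` clearly outperforms either `front` or `r_vis` alone** (`244.0` vs. `0.0` and `6.4`), which suggests the two views are somewhat complementary. It is still clearly below every normally performing configuration that includes `top` (`271.2` and `274.8`).
5. **`r_vis + top` performs abnormally poorly**: only `1.2`, far below `top only = 271.2`. Simply adding `r_vis` does not guarantee a gain and may even disrupt learning under the current setup.
6. **Training loss and rollout reward clearly disagree**: for example, `r_vis + top` and `r_vis only` both end with a low final loss (`0.0047`) yet very poor reward, so model selection in this set of experiments must rely on rollout reward rather than loss.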
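
One way to quantify the mismatch between loss and reward is a rank correlation over the six leaderboard rows. The sketch below computes a Spearman coefficient in plain Python (average ranks for ties, then Pearson correlation on the ranks). The numbers are copied from the leaderboard; the helper names are illustrative, and six data points are too few for anything more than a rough indication.

```python
import math

# Values copied from the leaderboard, in rank order 1..6.
FINAL_LOSS = [0.0056, 0.0052, 0.0043, 0.0047, 0.0047, 0.0074]
BEST_REWARD = [274.8, 271.2, 244.0, 6.4, 1.2, 0.0]


def average_ranks(values):
    """Return 1-based ascending ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


rho = spearman(FINAL_LOSS, BEST_REWARD)
print(f"Spearman rho (final loss vs. best reward): {rho:.3f}")
```

If final loss were a good proxy for rollout quality, the coefficient would be strongly negative (lower loss, higher reward). On these six rows it comes out close to zero (about -0.06), which is consistent with the takeaway that loss should not be used for model selection here.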

## Horizontal comparison views

### Single-camera comparison

- `top`: **271.2**
- `r_vis`: **6.4**
- `front`: **0.0**

Conclusion: **`top >>> r_vis > front`**.

### Two-camera comparison

- `top + front`: **274.8**
- `r_vis + front`: **244.0**
- `r_vis + top`: **1.2**

Conclusions:

- **The safest two-camera combination is `top + front`.**
- `r_vis + front` works, but not as well as `top + front`.
- `r_vis + top` almost completely fails under the current setup.

### Incremental effect of adding a second view

- Adding `front` to `top`: `271.2 -> 274.8`, **a very small gain**.
- Adding `r_vis` to `front`: `0.0 -> 244.0`, **a very large gain**.
- Adding `r_vis` to `top`: `271.2 -> 1.2`, **a severe regression**.
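
The same arithmetic can be applied to every (base view, added view) pair, not only the three listed. The sketch below does that from the leaderboard rewards; the dictionary layout and the `incremental_effects` helper are illustrative assumptions, and the rewards are copied from the table.

```python
# Best avg_reward per camera set, copied from the leaderboard.
REWARDS = {
    frozenset({"top"}): 271.2,
    frozenset({"r_vis"}): 6.4,
    frozenset({"front"}): 0.0,
    frozenset({"top", "front"}): 274.8,
    frozenset({"r_vis", "front"}): 244.0,
    frozenset({"r_vis", "top"}): 1.2,
}


def incremental_effects(rewards):
    """Map (base_view, added_view) to reward(pair) - reward(base_view alone)."""
    effects = {}
    for cams, reward in rewards.items():
        if len(cams) != 2:
            continue
        for added in cams:
            (base,) = cams - {added}  # the single remaining view
            delta = reward - rewards[frozenset({base})]
            effects[(base, added)] = round(delta, 1)
    return effects


for (base, added), delta in sorted(incremental_effects(REWARDS).items()):
    print(f"{base} + {added}: {delta:+.1f}")
```

Reading the pairs in both directions shows the asymmetry: adding `top` to `front` gains as much as adding `r_vis` to `front` and more, while adding anything to `top` either barely helps (`front`) or collapses the run (`r_vis`).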

## Practical recommendation

If choosing only among these 6 experiments:

- **First choice**: `top + front`
- **Second choice**: `top only`
- If `top` cannot be used: `r_vis + front` is clearly better than `front only` / `r_vis only`
- **Not recommended**: `r_vis + top`

## Note relative to previous 3-camera baseline

The previous 3-camera run with `[r_vis, top, front]` reached a best reward of **610.8**.
Compared with that, the best result across these 6 camera ablations (`top + front = 274.8`) shows that:

- In this batch of training runs, **removing any view lands well below the previous 3-camera best**;
- But when a view has to be dropped, **`top` remains the most important one to keep**.
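
To put the gap to the 3-camera baseline in proportion, the sketch below expresses the three strongest ablation results as a fraction of `610.8`. The values come from this summary; the helper name is illustrative, and the comparison assumes the earlier baseline run is otherwise comparable to this batch.

```python
BASELINE_3CAM = 610.8  # previous best reward with [r_vis, top, front]

# Three strongest ablation results from the leaderboard.
TOP_ABLATIONS = {
    "top + front": 274.8,
    "top": 271.2,
    "r_vis + front": 244.0,
}


def fraction_of_baseline(reward, baseline=BASELINE_3CAM):
    """Return reward as a fraction of the 3-camera baseline reward."""
    return reward / baseline


for cameras, reward in TOP_ABLATIONS.items():
    print(f"{cameras}: {fraction_of_baseline(reward):.1%} of baseline")
```

Even the best ablation (`top + front`) reaches well under half of the baseline reward.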