`roboimi/docs/superpowers/plans/2026-04-06-resnet-multitoken-imf.md`

# ResNet Multitoken IMF Implementation Plan

**For agentic workers:** REQUIRED SUB-SKILL: use `superpowers:subagent-driven-development` (recommended) or `superpowers:executing-plans` to implement this plan task by task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Implement a standard-ResNet-18 multiview IMF variant that emits three condition tokens per obs step, and launch four L20 experiments sweeping `n_emb` in {256, 384} and `n_layer` in {12, 16}.

**Architecture:** The ResNet backbone will optionally return one token per camera instead of concatenating all cameras into one token. `VLAAgent` will pair each camera token with the current state, project each pair into a condition token, flatten the per-step camera tokens into one cond sequence, and feed that sequence into the existing IMF/AttnRes head.

**Tech Stack:** PyTorch, torchvision ResNet-18, Hydra, pytest, SwanLab, SSH/Tailscale.
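The pair-project-flatten path described above can be sketched as follows. This is a minimal illustration, not the repo's actual API: `build_cond_multitoken` and the concrete dimensions are assumptions.

```python
import torch

def build_cond_multitoken(vis, state, projector):
    """Illustrative sketch of the planned conditioning path (hypothetical name).

    vis:   (B, T, K, D_vis)  one token per camera per obs step
    state: (B, T, D_state)   robot state at each obs step
    Returns a cond sequence of shape (B, T*K, D_cond).
    """
    B, T, K, _ = vis.shape
    # Pair every camera token with the same step's state.
    state_per_cam = state.unsqueeze(2).expand(B, T, K, state.shape[-1])
    pairs = torch.cat([vis, state_per_cam], dim=-1)  # (B, T, K, D_vis + D_state)
    cond = projector(pairs)                          # (B, T, K, D_cond)
    return cond.flatten(1, 2)                        # (B, T*K, D_cond)

# Example: 3 cameras, obs_horizon 2 -> 6 condition tokens per sample.
proj = torch.nn.Linear(8 + 5, 16)
cond = build_cond_multitoken(torch.randn(2, 2, 3, 8), torch.randn(2, 2, 5), proj)
assert cond.shape == (2, 6, 16)
```

The key design point is that the projector is applied per camera token, so each token carries one camera view plus the state, rather than all cameras fused into a single vector first.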


## Task 1: Add failing tests for multi-token conditioning

**Files:**

- Modify: `tests/test_imf_vla_agent.py`
- Modify: `tests/test_resnet_transformer_agent_wiring.py`

- [ ] Step 1: Add a direct agent test
  - Stub a vision backbone returning `(B, T, 3, D)` and assert `_build_cond()` yields `(B, T*3, D_cond)`.
  - Assert state is paired with each camera token, not concatenated across cameras first.
- [ ] Step 2: Add a Hydra wiring test
  - Instantiate a new `agent=resnet_imf_attnres_multitoken` config with small dims.
  - Assert `condition_tokens_per_step == 3`, `condition_sequence_length == obs_horizon * 3`, and that the head's `n_obs_steps` receives that sequence length.
- [ ] Step 3: Run the focused tests and verify they fail (RED)
  - `python -m pytest tests/test_imf_vla_agent.py tests/test_resnet_transformer_agent_wiring.py -q`
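The stub in Step 1 could look roughly like this; `StubBackbone` is a hypothetical name, and the real test would feed its output through the agent's `_build_cond`:

```python
import torch

class StubBackbone(torch.nn.Module):
    """Hypothetical stub: emits one random token per camera, shape (B, T, 3, D)."""
    def __init__(self, num_cams=3, d=8):
        super().__init__()
        self.num_cams, self.d = num_cams, d

    def forward(self, imgs):
        B, T = imgs.shape[:2]
        return torch.randn(B, T, self.num_cams, self.d)

def test_stub_backbone_emits_per_camera_tokens():
    feats = StubBackbone()(torch.zeros(2, 4, 3, 3, 64, 64))  # B=2, obs_horizon=4
    assert feats.shape == (2, 4, 3, 8)
    # The real test then asserts the agent turns this into a (B, T*3, D_cond)
    # cond sequence; that assertion is what makes the test RED before Task 2.
```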

## Task 2: Implement multi-token ResNet conditioning path

**Files:**

- Modify: `roboimi/vla/models/backbones/resnet_diffusion.py`
- Modify: `roboimi/vla/agent.py`
- Create: `roboimi/vla/conf/agent/resnet_imf_attnres_multitoken.yaml`

- [ ] Step 1: Extend the ResNet backbone
  - Add an opt-in flag that returns per-camera tokens `(B, T, num_cams, D)` instead of one concatenated `(B, T, num_cams*D)` token.
  - Keep the standard ResNet-18 vision mode; do not switch to AttnRes vision.
- [ ] Step 2: Extend `VLAAgent` condition building
  - Support rank-4 visual features `(B, T, K, D)`.
  - Broadcast state to `(B, T, K, D_state)`, concatenate per camera, apply the projector per token, then flatten to `(B, T*K, D_cond)`.
  - Track `condition_tokens_per_step` and `condition_sequence_length`.
- [ ] Step 3: Update transformer-head instantiation
  - Pass `n_obs_steps=condition_sequence_length` when building transformer heads.
- [ ] Step 4: Add the Hydra config
  - The new agent config uses:
    - a separate ResNet-18 per camera
    - the standard residual vision trunk (`vision_backbone_mode=resnet`)
    - a condition projector whose output dim is tied to `${agent.head.n_emb}`
    - rollout episodes 10, `pred_horizon=16`, `num_action_steps=8`

## Task 3: Verify locally

**Files:**

- Modify only if verification reveals issues

- [ ] Step 1: Run the focused tests and make them pass
  - `python -m pytest tests/test_imf_vla_agent.py tests/test_resnet_transformer_agent_wiring.py -q`
- [ ] Step 2: Run the regression subset
  - `python -m pytest tests/test_eval_vla_headless.py tests/test_train_vla_rollout_validation.py tests/test_simple_robot_dataset_image_loading.py -q`
- [ ] Step 3: Run a local smoke instantiation
  - Instantiate the new Hydra config and verify the cond shape and sequence length.

## Task 4: Launch 4 L20 experiments

**Files:**

- Remote repo copy under `/home/droid/roboimi_suite_20260404`

- [ ] Step 1: Sync code to 100.119.99.14
- [ ] Step 2: Smoke-test the new config on the remote machine
- [ ] Step 3: Launch runs
  - (`n_emb=256`, `n_layer=12`)
  - (`n_emb=256`, `n_layer=16`)
  - (`n_emb=384`, `n_layer=12`)
  - (`n_emb=384`, `n_layer=16`)
- [ ] Step 4: Keep fixed across runs
  - rollout episodes 10
  - `pred_horizon=16`
  - `num_action_steps=8`
  - the standard ResNet-18 vision trunk
  - three separate per-camera weights
- [ ] Step 5: Record PIDs, GPU assignments, log paths, and SwanLab URLs
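The four launches in Step 3 can be generated with a small loop. The training entrypoint (`python -m roboimi.train`) and the override paths are assumptions to verify against the repo before running:

```shell
# Print the four launch commands (entrypoint and override names are hypothetical).
for ne in 256 384; do
  for nl in 12 16; do
    echo "python -m roboimi.train agent=resnet_imf_attnres_multitoken" \
         "agent.head.n_emb=${ne} agent.head.n_layer=${nl}"
  done
done
```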