Logic 06ac6c6d18 docs(plan): cover rollout entrypoint and eval regressions

2026-04-23 17:14:49 +08:00

sim_air_insert_ring_bar Implementation Plan

For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.

Goal: Add an independent dual-Diana MuJoCo task sim_air_insert_ring_bar with a square ring block, a square bar block, staged rewards, strict finite-geometry in-air insertion success detection, and a task-specific scripted policy.

Architecture: Reuse the current dual-Diana EE-control stack and environment factory, but add a task-specific scene XML, robot asset entrypoint, sampling helpers, and a new task-specific environment module. Keep sim_transfer untouched while introducing pure-Python geometry helpers and focused tests so reward/success behavior can be regression tested without requiring a full MuJoCo rollout in every test.

Tech Stack: Python, unittest, MuJoCo XML assets, existing dual-Diana environment classes, Hydra-compatible task naming/config patterns.

File Structure / Responsibilities

Create: roboimi/assets/models/manipulators/DianaMed/ring_bar_objects.xml
- Defines the rigid ring body and bar body, each with a free joint and stable box-based geoms.
Create: roboimi/assets/models/manipulators/DianaMed/bi_diana_ring_bar_ee.xml
- Scene entrypoint that includes the shared world/table/robot assets plus the new object XML.
Modify: roboimi/assets/robots/diana_med.py
- Add a task-specific robot asset class for the new scene XML without changing existing BiDianaMed behavior.
Modify: roboimi/utils/act_ex_utils.py
- Add deterministic helpers to sample left/right planar placement regions for ring and bar objects.
Modify: roboimi/utils/constants.py
- Register the new task name and default metadata.
Create: roboimi/envs/double_air_insert_env.py
- New task-specific environment, finite-geometry success helpers, reset logic, reward logic, and task factory branch.
Modify: roboimi/envs/double_pos_ctrl_env.py
- Route make_sim_env() to the new task-specific environment while keeping current sim_transfer logic unchanged.
Create: roboimi/demos/diana_air_insert_policy.py
- Task-specific waypoint/open-loop scripted policy for grasp-lift-align-insert.
Modify: roboimi/demos/diana_record_sim_episodes.py
- Let the standard scripted rollout entrypoint select the new task sampler and policy instead of remaining sim_transfer-only.
Modify: roboimi/demos/vla_scripts/eval_vla.py
- Reset the new task with the correct sampled task state instead of assuming a single transfer box pose.
Create: tests/test_air_insert_env.py
- Focused unit tests for sampling, reset helpers, reward progression, and strict success detection.
Modify: tests/test_eval_vla_headless.py
- Add coverage that headless evaluation dispatches the correct reset sampler for the new task.
Modify: tests/test_robot_asset_paths.py
- Verify the new robot asset class resolves its XML path correctly independent of cwd.

Task 1: Add failing tests for task registration, samplers, and asset wiring

Files:

Create: tests/test_air_insert_env.py
Modify: tests/test_eval_vla_headless.py
Modify: tests/test_robot_asset_paths.py
Modify: roboimi/utils/act_ex_utils.py (later in implementation)
Modify: roboimi/utils/constants.py (later in implementation)
Modify: roboimi/assets/robots/diana_med.py (later in implementation)
Modify: roboimi/envs/double_pos_ctrl_env.py (later in implementation)
Modify: roboimi/demos/vla_scripts/eval_vla.py (later in implementation)
Create: roboimi/envs/double_air_insert_env.py (minimal stub in this task)

Step 1: Write failing tests for task config and sampling helpers

Add tests in tests/test_air_insert_env.py covering:

SIM_TASK_CONFIGS['sim_air_insert_ring_bar'] exists
sample_air_insert_ring_bar_pose() (or equivalent helper) returns ring/bar positions with fixed z and correct left/right planar ranges
output structure is explicit and easy for reset/eval code to consume
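
As a rough illustration of the sampler contract described above, here is a minimal sketch. The numeric ranges, the fixed height, the dictionary keys, and the choice of ring on the left and bar on the right are all assumptions made for this example; the plan does not specify them. Only the function name comes from the plan.

```python
import numpy as np

# Hypothetical placement ranges in metres; the plan does not fix these values.
RING_X_RANGE = (-0.20, -0.05)  # assumed: ring starts on the left side
BAR_X_RANGE = (0.05, 0.20)     # assumed: bar starts on the right side
Y_RANGE = (0.75, 0.95)
OBJECT_Z = 0.47                # assumed fixed resting height


def sample_air_insert_ring_bar_pose(rng=None):
    """Return named ring/bar positions; pass a seeded Generator for repeatability."""
    rng = np.random.default_rng() if rng is None else rng
    ring_pos = np.array([rng.uniform(*RING_X_RANGE), rng.uniform(*Y_RANGE), OBJECT_Z])
    bar_pos = np.array([rng.uniform(*BAR_X_RANGE), rng.uniform(*Y_RANGE), OBJECT_Z])
    # A named mapping keeps the output explicit for reset and eval code.
    return {"ring_pos": ring_pos, "bar_pos": bar_pos}
```

Passing a seeded `np.random.default_rng(seed)` makes the sample reproducible, which is one way to keep the range assertions in the unit tests deterministic.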

Step 2: Write failing tests for environment factory dispatch and robot asset resolution

Add tests covering:

make_sim_env('sim_air_insert_ring_bar', headless=True) dispatches to the new environment with rendering disabled
a new robot asset class resolves the new XML path independent of cwd, similar to the existing BiDianaMed test pattern

Step 3: Write failing tests for eval reset helper dispatch

Extend tests/test_eval_vla_headless.py so headless eval can reset the new task using the new sampler instead of hard-coding sample_transfer_pose().
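
One possible shape for this dispatch is a small registry keyed by task name, sketched below. The helper name `sample_reset_state`, the `RESET_SAMPLERS` table, and the two stand-in samplers are hypothetical; the real eval script may organise this differently.

```python
# Stand-ins for the real samplers, used only to keep this sketch self-contained.
def _fake_transfer_sampler():
    return "transfer-box-pose"


def _fake_ring_bar_sampler():
    return {"ring_pos": (0.0, 0.0, 0.0), "bar_pos": (0.0, 0.0, 0.0)}


# Hypothetical registry: task name -> zero-argument reset sampler.
RESET_SAMPLERS = {
    "sim_transfer": _fake_transfer_sampler,
    "sim_air_insert_ring_bar": _fake_ring_bar_sampler,
}


def sample_reset_state(task_name, samplers=RESET_SAMPLERS):
    """Pick the reset sampler for a task instead of hard-coding one sampler."""
    try:
        sampler = samplers[task_name]
    except KeyError:
        raise ValueError(f"no reset sampler registered for task {task_name!r}") from None
    return sampler()
```

A headless eval test can then inject mock samplers through the `samplers` argument and assert that only the one matching the task name is called.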

Step 4: Run the targeted tests to verify they fail for the expected missing-feature reasons

Run: /home/droid/.conda/envs/roboimi/bin/python -m unittest tests.test_air_insert_env tests.test_eval_vla_headless tests.test_robot_asset_paths -v

Expected:

FAIL because the new task config/helper/class/dispatch branch does not exist yet

Step 5: Implement the minimal production code to satisfy the new task registration and helper tests

Implement only enough to make the new tests pass:

add new task config entry
add the new placement sampler
add the new robot asset class
create a minimal importable double_air_insert_env.py stub and class/function surface needed for factory dispatch tests
add the factory dispatch branch / headless wiring
update eval reset dispatch for the new task

Step 6: Re-run the targeted tests to verify they pass

Run: /home/droid/.conda/envs/roboimi/bin/python -m unittest tests.test_air_insert_env tests.test_eval_vla_headless tests.test_robot_asset_paths -v

Expected:

PASS for the new registration/sampler/dispatch/asset tests

Step 7: Commit Task 1

Run: git add tests/test_air_insert_env.py tests/test_eval_vla_headless.py tests/test_robot_asset_paths.py roboimi/utils/act_ex_utils.py roboimi/utils/constants.py roboimi/assets/robots/diana_med.py roboimi/envs/double_pos_ctrl_env.py roboimi/envs/double_air_insert_env.py roboimi/demos/vla_scripts/eval_vla.py && git commit -m "feat(env): register sim air insert ring bar task"

Task 2: Add the MuJoCo ring+bar scene assets and reset helpers

Files:

Create: roboimi/assets/models/manipulators/DianaMed/ring_bar_objects.xml
Create: roboimi/assets/models/manipulators/DianaMed/bi_diana_ring_bar_ee.xml
Create or Modify: roboimi/envs/double_air_insert_env.py
Modify: tests/test_air_insert_env.py

Step 1: Write failing tests for object reset helpers and scene-specific joint naming assumptions

In tests/test_air_insert_env.py, add unit tests for helper functions that:

write ring pose to ring_block_joint from the named task-state mapping
write bar pose to bar_block_joint from the named task-state mapping
read back env_state as a stable 14D vector [ring_pos, ring_quat, bar_pos, bar_quat]

Use fake mj_data objects so tests stay fast and deterministic.
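
The helpers and the fake can be sketched as follows. This assumes the data object exposes `joint(name).qpos` as a writable 7-element array (position followed by a `w, x, y, z` quaternion, as MuJoCo stores free-joint state) and that the task state uses `ring_pos` / `bar_pos` keys. `FakeMjData`, `set_free_joint_pose`, `reset_ring_bar`, and `get_env_state` are illustrative names, not the repository's API.

```python
from types import SimpleNamespace

import numpy as np

IDENTITY_QUAT = np.array([1.0, 0.0, 0.0, 0.0])  # MuJoCo order: w, x, y, z


class FakeMjData:
    """Tiny stand-in for mj_data that only supports data.joint(name).qpos."""

    def __init__(self, joint_names):
        self._joints = {name: SimpleNamespace(qpos=np.zeros(7)) for name in joint_names}

    def joint(self, name):
        return self._joints[name]


def set_free_joint_pose(mj_data, joint_name, pos, quat=IDENTITY_QUAT):
    """Write [x, y, z, qw, qx, qy, qz] into a free joint's qpos slice."""
    mj_data.joint(joint_name).qpos[:] = np.concatenate([pos, quat])


def reset_ring_bar(mj_data, task_state):
    """Place both objects from the named task-state mapping with fixed quaternions."""
    set_free_joint_pose(mj_data, "ring_block_joint", task_state["ring_pos"])
    set_free_joint_pose(mj_data, "bar_block_joint", task_state["bar_pos"])


def get_env_state(mj_data):
    """Return the 14D vector [ring_pos, ring_quat, bar_pos, bar_quat]."""
    return np.concatenate([
        mj_data.joint("ring_block_joint").qpos,
        mj_data.joint("bar_block_joint").qpos,
    ])
```

Because the helpers touch only `joint(name).qpos`, the same functions can be exercised against the fake in unit tests and against real simulation data later, provided the real data object offers the same accessor.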

Step 2: Run the focused test slice and verify it fails

Run: /home/droid/.conda/envs/roboimi/bin/python -m unittest tests.test_air_insert_env -v

Expected:

FAIL because reset/state helper functions and joint conventions are not implemented yet

Step 3: Implement the scene XML files and reset/state helper code

Implement:

the object XML with one rigid ring body and one rigid bar body
the task scene XML entrypoint using the shared world/table/robot includes
reset helper(s) in double_air_insert_env.py that set qpos for both free joints with fixed quaternions
task-state accessor(s) returning both object poses in a stable structure

Step 4: Re-run the focused test slice and verify it passes

Run: /home/droid/.conda/envs/roboimi/bin/python -m unittest tests.test_air_insert_env -v

Expected:

PASS for reset/state helper tests

Step 5: Commit Task 2

Run: git add roboimi/assets/models/manipulators/DianaMed/ring_bar_objects.xml roboimi/assets/models/manipulators/DianaMed/bi_diana_ring_bar_ee.xml roboimi/envs/double_air_insert_env.py tests/test_air_insert_env.py && git commit -m "feat(scene): add ring and bar insertion scene assets"

Task 3: Implement strict reward and finite-geometry success detection

Files:

Modify: roboimi/envs/double_air_insert_env.py
Modify: tests/test_air_insert_env.py

Step 1: Write failing tests for reward stages and strict success detection

Add tests in tests/test_air_insert_env.py for:

left contact stage reward
right contact stage reward
ring lifted off table stage
bar lifted off table stage
positive success case where a finite bar truly passes through the aperture
negative case where the centerline would pass but the finite square body would clip
negative case where the bar has not crossed the ring thickness direction enough
negative case where one/both objects are still on the table

Structure the tests around pure helper functions and light fake contact/state objects so the geometry logic is directly regression tested.
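
To make the difference between a centerline check and a finite-geometry check concrete, here is one possible pure-Python predicate. It is a sketch under stated assumptions: the ring-local z axis is taken to be the hole axis, the aperture is a square of half-width `aperture_half`, the ring occupies the slab `|z| <= ring_half_thickness`, and the bar is a box with half-extents `bar_half`. The function names and parameters are hypothetical. It clips each bar edge to the ring slab and requires every clipped endpoint to lie strictly inside the aperture, which covers the whole slab portion of the bar because both shapes are convex.

```python
import itertools

import numpy as np


def quat_to_mat(q):
    """Rotation matrix for a unit quaternion in (w, x, y, z) order."""
    w, x, y, z = q
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])


def bar_passes_through_ring(ring_pos, ring_quat, bar_pos, bar_quat,
                            bar_half, aperture_half, ring_half_thickness):
    """True if the finite bar crosses the ring slab entirely inside the aperture."""
    signs = np.array(list(itertools.product((-1.0, 1.0), repeat=3)))
    corners_world = np.asarray(bar_pos) + (signs * np.asarray(bar_half)) @ quat_to_mat(bar_quat).T
    # Express the eight bar corners in ring-local coordinates.
    corners = (corners_world - np.asarray(ring_pos)) @ quat_to_mat(ring_quat)

    # The bar must stick out on both sides of the ring thickness.
    z = corners[:, 2]
    if not (z.min() < -ring_half_thickness and z.max() > ring_half_thickness):
        return False

    for i, j in itertools.combinations(range(8), 2):
        if np.count_nonzero(signs[i] != signs[j]) != 1:
            continue  # not a box edge
        p, q = corners[i], corners[j]
        dz = q[2] - p[2]
        if abs(dz) < 1e-12:
            if abs(p[2]) > ring_half_thickness:
                continue  # edge lies entirely outside the slab
            t0, t1 = 0.0, 1.0
        else:
            ta = (-ring_half_thickness - p[2]) / dz
            tb = (ring_half_thickness - p[2]) / dz
            t0, t1 = max(0.0, min(ta, tb)), min(1.0, max(ta, tb))
            if t0 > t1:
                continue  # edge never enters the slab
        for t in (t0, t1):
            point = p + t * (q - p)
            if abs(point[0]) >= aperture_half or abs(point[1]) >= aperture_half:
                return False  # the finite body would clip the ring
    return True
```

With an aperture half-width of 0.02 and a bar half-width of 0.01, a bar offset by 0.015 along the ring-local x axis still has its centerline inside the aperture but is rejected, which is the negative case the tests above call for. The airborne checks are deliberately left out of this predicate so they can be tested separately.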

Step 2: Run the focused tests and verify they fail for missing reward/success logic

Run: /home/droid/.conda/envs/roboimi/bin/python -m unittest tests.test_air_insert_env -v

Expected:

FAIL because the staged reward and finite-geometry insertion logic are not implemented yet

Step 3: Implement minimal strict success helpers and reward logic

Implement in roboimi/envs/double_air_insert_env.py:

pure helper(s) for transforming bar geometry into ring-local coordinates
finite-geometry insertion predicate (not centerline-only)
table-contact / airborne checks
staged reward function returning the highest achieved stage with max_reward = 5
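
A staged reward of this kind might look like the sketch below. The stage ordering, the pairing of the left gripper with the ring and the right gripper with the bar, and the boolean inputs are assumptions for illustration; the plan only fixes the five stages and `max_reward = 5`.

```python
MAX_REWARD = 5


def staged_reward(left_contact, right_contact, ring_airborne, bar_airborne, inserted):
    """Return the highest achieved stage; later checks override earlier ones."""
    reward = 0
    if left_contact:
        reward = 1  # left gripper touches its object (assumed: the ring)
    if right_contact:
        reward = 2  # right gripper touches its object (assumed: the bar)
    if left_contact and ring_airborne:
        reward = 3  # ring lifted off the table
    if right_contact and bar_airborne:
        reward = 4  # bar lifted off the table
    if ring_airborne and bar_airborne and inserted:
        reward = MAX_REWARD  # strict in-air insertion
    return reward
```

Keeping the function free of simulator types means each stage can be asserted directly from booleans, with the contact and geometry predicates tested on their own.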

Step 4: Re-run the focused tests to verify the logic passes

Run: /home/droid/.conda/envs/roboimi/bin/python -m unittest tests.test_air_insert_env -v

Expected:

PASS for reward and success-detection regression tests

Step 5: Commit Task 3

Run: git add roboimi/envs/double_air_insert_env.py tests/test_air_insert_env.py && git commit -m "feat(env): add strict air insertion reward and success logic"

Task 4: Add the scripted policy and integration smoke coverage

Files:

Create: roboimi/demos/diana_air_insert_policy.py
Modify: roboimi/demos/diana_record_sim_episodes.py
Modify: tests/test_air_insert_env.py
Optionally Modify: roboimi/demos/vla_scripts/eval_vla.py (only if integration gaps remain after Task 1)

Step 1: Write failing tests for scripted-policy action shape and basic generation

Add tests covering:

the new policy produces a 16D action
trajectory generation accepts sampled named task state without error
the first action is a valid open-gripper safe pose command
a deterministic nominal smoke path (with canonical sampled state or fake env shim) reaches the intended terminal interface contract without shape/reward mismatches

Keep the tests unit-level; do not require a full MuJoCo rollout for every assertion.

Step 2: Write failing tests for the scripted rollout entrypoint and a real headless smoke path

Add coverage for both:

the standard scripted rollout entrypoint (roboimi/demos/diana_record_sim_episodes.py) can select the new task sampler/policy instead of remaining sim_transfer-only
a deterministic integration/smoke test that instantiates make_sim_env('sim_air_insert_ring_bar', headless=True), resets with sampled named task state, and steps a few actions or scripted-policy outputs using the real task XML and task-specific wiring

Step 3: Run the scripted-policy tests and verify they fail

Run: /home/droid/.conda/envs/roboimi/bin/python -m unittest tests.test_air_insert_env -v

Expected:

FAIL because the new scripted policy and the rollout-entrypoint task selection do not exist yet

Step 4: Implement the waypoint-based scripted policy

Implement a conservative open-loop policy with phases:

safe wait pose
above-target approach
descend + grasp
dual lift
airborne meeting alignment
bar push-through insertion

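Use fixed orientations for version 1 and follow the existing repository style from diana_policy.py. Also update roboimi/demos/diana_record_sim_episodes.py so the rollout entrypoint can select the new task sampler and policy.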
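
The waypoint phases above can be turned into per-step actions by linear interpolation, roughly as sketched here. The waypoint dictionary keys (`t`, `xyz`, `quat`, `gripper`), the helper names, and the 16D layout `[left xyz, left quat, left gripper, right xyz, right quat, right gripper]` are assumptions for this example; the existing diana_policy.py may use a different layout.

```python
import numpy as np


def arm_command(trajectory, t):
    """Interpolate one arm's 8D command (xyz, quat, gripper) at time step t.

    Waypoints must have strictly increasing "t" values.
    """
    t = min(max(t, trajectory[0]["t"]), trajectory[-1]["t"])  # hold at both ends
    for prev_wp, next_wp in zip(trajectory, trajectory[1:]):
        if prev_wp["t"] <= t <= next_wp["t"]:
            frac = (t - prev_wp["t"]) / (next_wp["t"] - prev_wp["t"])
            xyz = prev_wp["xyz"] + frac * (next_wp["xyz"] - prev_wp["xyz"])
            grip = prev_wp["gripper"] + frac * (next_wp["gripper"] - prev_wp["gripper"])
            # Version 1 keeps the orientation fixed, so the quaternion is not blended.
            return np.concatenate([xyz, prev_wp["quat"], [grip]])
    raise ValueError("trajectory needs at least two waypoints")


def action_at(left_trajectory, right_trajectory, t):
    """Stack both arm commands into a single 16D action."""
    return np.concatenate([arm_command(left_trajectory, t),
                           arm_command(right_trajectory, t)])
```

Each phase (safe wait, approach, descend and grasp, lift, align, push-through) then becomes one or more waypoints per arm, and the action-shape test only needs to check that `action_at(...)` has shape `(16,)`.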

Step 5: Re-run the scripted-policy tests to verify they pass

Run: /home/droid/.conda/envs/roboimi/bin/python -m unittest tests.test_air_insert_env -v

Expected:

PASS for scripted-policy tests

Step 6: Run the combined verification suite for this feature

Run: /home/droid/.conda/envs/roboimi/bin/python -m unittest tests.test_air_insert_env tests.test_eval_vla_headless tests.test_eval_vla_rollout_artifacts tests.test_train_vla_rollout_validation tests.test_robot_asset_paths -v

Expected:

PASS with 0 failures

Step 6b: Run the mandatory real headless smoke check

Run a focused smoke command that instantiates the real task, resets with sampled state, and steps a few actions using the new scripted policy or a deterministic action sequence.

Example command (adjust module/test helper if needed): /home/droid/.conda/envs/roboimi/bin/python -m unittest tests.test_air_insert_env.AirInsertEnvSmokeTest -v

Expected:

PASS, proving the real XML/assets/env wiring instantiate and step correctly in headless mode

Step 7: Commit Task 4

Run: git add roboimi/demos/diana_air_insert_policy.py roboimi/demos/diana_record_sim_episodes.py tests/test_air_insert_env.py tests/test_eval_vla_headless.py tests/test_robot_asset_paths.py roboimi/demos/vla_scripts/eval_vla.py && git commit -m "feat(policy): add scripted air insertion policy"

Task 5: Final verification and implementation review

Files:

Review all files touched above

Step 1: Run fresh end-to-end verification before claiming completion

Run: /home/droid/.conda/envs/roboimi/bin/python -m unittest tests.test_air_insert_env tests.test_eval_vla_headless tests.test_eval_vla_rollout_artifacts tests.test_train_vla_rollout_validation tests.test_robot_asset_paths -v

Expected:

PASS with 0 failures

Step 2: Inspect git status and recent commits

Run: git status --short && git log --oneline --decorate -n 8

Expected:

only intended feature files modified / committed

Step 3: Request final code review for the completed feature

Use the requesting-code-review skill against the full diff from the feature branch starting point to current HEAD.

Step 4: Address any review findings and re-run verification if code changes

If fixes are made, repeat the unittest command from Step 1.

Step 5: Hand off using finishing-a-development-branch

After verification and review, use the finishing-a-development-branch skill to decide merge / PR / cleanup.
