diff --git a/docs/superpowers/specs/2026-04-23-sim-air-insert-ring-bar-design.md b/docs/superpowers/specs/2026-04-23-sim-air-insert-ring-bar-design.md
new file mode 100644
index 0000000..52d6cda
--- /dev/null
+++ b/docs/superpowers/specs/2026-04-23-sim-air-insert-ring-bar-design.md
@@ -0,0 +1,306 @@
+# sim_air_insert_ring_bar Design
+
+## Summary
+
+Add a new independent MuJoCo simulation task named `sim_air_insert_ring_bar` that keeps the existing dual-Diana tabletop setup but replaces the single transfer box with two randomized objects:
+
+- a square ring block grasped by the left arm
+- a square bar block grasped by the right arm
+
+The task is to pick both objects off the table and complete an in-air insertion where the bar truly passes through the ring aperture. The existing `sim_transfer` task must remain unchanged.
+
+## Goals
+
+- Reuse the current dual-Diana EE-control simulation stack
+- Keep the same table/base robot arrangement as the existing transfer task
+- Add an independent task entrypoint and scene definition
+- Randomize planar placement of both objects within left/right task-specific regions
+- Implement reward staging for contact, lift, and successful in-air insertion
+- Add a scripted policy that performs pick, lift, align, and in-air insertion
+- Preserve compatibility with existing environment creation, evaluation, and rollout patterns
+
+## Non-Goals
+
+- No random yaw in the first version
+- No visual servoing or closed-loop insertion controller
+- No general multi-task environment framework refactor
+- No guarantee that the VLA training stack is immediately tuned for this new task
+- No replacement or behavior change for `sim_transfer`
+
+## Task Name
+
+Use a new task name:
+
+- `sim_air_insert_ring_bar`
+
+This task should be exposed alongside `sim_transfer`, not as a replacement.
+
+## Scene Geometry
+
+### Shared Base Scene
+
+Keep the dual Diana robot, the table, and the existing camera layout conceptually unchanged.
+
+### Ring Block
+
+Represent the square ring as a rigid free body composed from simple MuJoCo box geoms rather than an external mesh.
+
+Dimensions:
+
+- outer side length: 68 mm
+- inner aperture side length: 32 mm
+- thickness: 18 mm
+- ring wall width: 18 mm
+
+The ring should behave as a single object body with a single free joint.
+
+### Bar Block
+
+Represent the bar as a rigid free body with a single box geom.
+
+Dimensions:
+
+- length: 90 mm
+- cross-section: 18 mm x 18 mm
+
+The bar should also be a single free-joint body.
+
+## Initial Placement / Reset
+
+The first version uses position-only randomization with fixed orientation.
+
+- ring block: randomized only in a left-side planar sampling region
+- bar block: randomized only in a right-side planar sampling region
+- both objects start flat on the table
+- both objects use fixed orientation at reset
+- no random yaw, tilt, or flip in this version
+
+The sampling regions should be chosen conservatively so that:
+
+- the left arm can comfortably reach and grasp the ring
+- the right arm can comfortably reach and grasp the bar
+- scripted open-loop pick trajectories remain feasible
+
+## Control / Action Interface
+
+Reuse the current 16D EE-space action convention already used by the dual-Diana position-control environment:
+
+- left arm EE pose: 7D (`xyz + quat`)
+- right arm EE pose: 7D (`xyz + quat`)
+- left gripper command: 1D
+- right gripper command: 1D
+
+The new task should continue using EE targets transformed through the existing IK-based control path.
+
+## Environment Structure
+
+Implement this as a new task-specific environment path while reusing the existing dual-Diana simulation base where possible.
+
+Expected responsibilities:
+
+- scene instantiation for the ring+bar setup
+- task reset for randomized object placement
+- environment-state accessors for both objects
+- reward computation
+- in-air insertion success detection
+
+The environment factory must dispatch by task name and leave the `sim_transfer` branch unchanged.
+
+## Observation / Environment State
+
+The task should retain the current observation structure style used by the dual-Diana environment:
+
+- `qpos`
+- multi-camera images
+
+For task state access, the environment should expose at least the pose information needed to reason about both objects:
+
+- ring position
+- ring orientation if needed for insertion checks / debugging
+- bar position
+- bar orientation if needed for insertion checks / debugging
+
+This state should be sufficient for scripted-policy debugging and future rollout analysis.
+
+## Reward Design
+
+Use staged rewards in the same spirit as the current task, returning the highest achieved stage rather than accumulating one-time sparse bonuses per event.
+
+Maximum reward:
+
+- `max_reward = 5`
+
+Reward stages:
+
+1. left gripper touches the ring block
+2. right gripper touches the bar block
+3. ring block is lifted off the table
+4. bar block is lifted off the table
+5. while both objects are off the table, the bar truly passes through the ring aperture
+
+Notes:
+
+- contact rewards are intended as grasp-progress stages
+- lift rewards require the object to be off the table, not merely touched
+- final success reward only applies when both objects are airborne
+
+## Success Detection
+
+Success must **not** be based on a centerline-only check.
+
+A centerline-only test is insufficient because:
+
+- the bar has thickness, so a centerline can pass through while the body cannot
+- a square bar with imperfect orientation can have its centerline inside the aperture while its corners still collide with the ring
+
+### Required Success Semantics
+
+A successful insertion requires all of the following:
+
+1. the ring is off the table
+2. the bar is off the table
+3. the bar has actually crossed through the ring thickness direction
+4. the bar’s finite square cross-section fits through the square aperture during that crossing
+
+### Recommended Detection Approach
+
+Use a task-level geometric check in Python rather than relying on contact alone.
+
+Implementation intent:
+
+- transform the bar geometry into the ring’s local frame
+- reason about the bar as a finite oriented box (not a line)
+- verify that the bar has crossed the ring thickness direction
+- verify that the portion of the bar passing the aperture fits within the inner square opening, accounting for the bar’s cross-section and orientation
+
+This geometric check is the primary success test.
+
+### Role of Contacts
+
+Contacts may still be used for:
+
+- grasp-stage rewards
+- debugging / diagnostics
+
+But contact alone should **not** be the sole criterion for insertion success, since:
+
+- a true clean insertion may have limited aperture-wall contact
+- persistent contact can also happen while the bar is jammed and not actually inserted
+
+## Scripted Policy
+
+Add a new task-specific scripted policy for `sim_air_insert_ring_bar`.
+
+### Policy Intent
+
+The first version prioritizes a conservative, reliable open-loop demonstration rather than an optimized trajectory.
+
+### Action Phases
+
+The scripted policy should follow these phases:
+
+1. move both arms to safe initial / waiting poses with grippers open
+2. move left arm above the ring and right arm above the bar
+3. descend and grasp the assigned objects
+4. lift both objects clear of the table
+5. move both objects to an airborne meeting region above the table
+6. hold the ring stably while aligning the bar with the aperture
+7. push the bar along the intended insertion direction until the geometric success condition is met
+
+### Grasp Assignment
+
+- left arm: ring only
+- right arm: bar only
+
+### Motion Style
+
+Keep the current repository style:
+
+- waypoint-based trajectory definition
+- open-loop interpolation between waypoints
+- fixed grasp orientation in the first version
+
+No adaptive replanning is required for the first version.
+
+## Files / Integration Scope
+
+The implementation is expected to add task-specific files rather than broadly refactoring the codebase.
+
+Likely additions / changes:
+
+- a new MuJoCo scene XML for the ring+bar task
+- one or more XML fragments defining the two new objects
+- a new task-specific dual-Diana environment file
+- robot asset wiring for the new scene XML
+- reset sampling helpers for the new task
+- task registration in constants / environment factory paths
+- a new scripted policy file
+- focused tests for task creation, reset, rewards, success detection, and scripted policy shape/smoke behavior
+
+## Testing Requirements
+
+At minimum, add regression coverage for:
+
+### Environment Creation
+
+- the new task can be created via the task factory
+- the existing `sim_transfer` task remains unchanged
+
+### Reset / Sampling
+
+- ring reset positions are inside the left sampling region
+- bar reset positions are inside the right sampling region
+- reset orientation is fixed as intended
+
+### Environment State
+
+- environment-state access returns both object poses in the expected structure
+
+### Success Detection
+
+Must include both positive and negative cases.
+
+Positive case:
+
+- a configuration where the finite bar truly passes through the ring aperture is detected as success
+
+Negative cases:
+
+- centerline-inside but finite body would clip the aperture
+- not enough depth / not actually crossing the ring thickness direction
+- one or both objects still on the table
+
+### Reward Logic
+
+- left contact stage
+- right contact stage
+- ring lift stage
+- bar lift stage
+- final success stage with `max_reward = 5`
+
+### Scripted Policy
+
+At minimum:
+
+- policy emits valid 16D actions
+- trajectory generation does not error
+- rollout smoke path can step through the new environment
+
+## Risks / Constraints
+
+- MuJoCo contact naming must remain stable enough for stage rewards
+- geometric insertion checks must be strict enough to avoid false positives but not so brittle that numerically valid insertions are missed
+- scripted open-loop insertion may require conservative alignment and lift heights to keep the first version reliable
+
+## Acceptance Criteria
+
+The feature is complete when all of the following are true:
+
+- `sim_air_insert_ring_bar` is creatable as an independent task
+- the scene contains the dual Diana, table, ring block, and bar block
+- reset randomizes ring and bar positions in left/right planar regions with fixed orientation
+- the environment exposes task state for both objects
+- staged rewards progress to `max_reward = 5`
+- final success is based on finite-geometry insertion semantics, not a centerline-only shortcut
+- a new scripted policy can execute the intended pick-lift-align-insert behavior in the new environment
+- existing `sim_transfer` behavior is preserved
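
The finite-geometry success check described under "Recommended Detection Approach" could be sketched roughly as below. This is a minimal NumPy illustration and not the repository's implementation. The function name `bar_passes_through_ring`, the parameters `table_z`, `min_clearance`, and `tol`, and the frame conventions (aperture normal along the ring's local z axis, bar length along the bar's local x axis) are assumptions made for the example; the dimensions come from the spec.

```python
import numpy as np

# Dimensions from the spec, converted to metres.
APERTURE_HALF = 0.032 / 2            # half side of the inner square opening
RING_HALF_THICKNESS = 0.018 / 2      # half of the ring thickness
BAR_HALF_EXTENTS = (0.090 / 2, 0.018 / 2, 0.018 / 2)  # half length, half cross-section


def quat_to_mat(quat):
    """Convert a (w, x, y, z) quaternion (MuJoCo's ordering) to a 3x3 rotation matrix."""
    w, x, y, z = np.asarray(quat, dtype=float) / np.linalg.norm(quat)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])


def bar_passes_through_ring(ring_pos, ring_quat, bar_pos, bar_quat,
                            table_z=0.0, min_clearance=0.02, tol=1e-4):
    """Hypothetical success test: True if the finite bar spans the ring aperture in the air."""
    ring_pos = np.asarray(ring_pos, dtype=float)
    bar_pos = np.asarray(bar_pos, dtype=float)

    # Conditions 1 and 2: both objects off the table.
    # Assumption: a crude check on body centres; a real check might query contacts instead.
    if min(ring_pos[2], bar_pos[2]) < table_z + min_clearance:
        return False

    r_ring = quat_to_mat(ring_quat)
    r_bar = quat_to_mat(bar_quat)
    hx, hy, hz = BAR_HALF_EXTENTS

    # Walk the four long edges of the bar (they run between its two end faces).
    for sy in (-1.0, 1.0):
        for sz in (-1.0, 1.0):
            ends_local = np.array([[-hx, sy * hy, sz * hz], [hx, sy * hy, sz * hz]])
            ends_world = bar_pos + ends_local @ r_bar.T
            a, b = (ends_world - ring_pos) @ r_ring  # edge endpoints in the ring frame

            # Condition 3: the two end faces lie on opposite sides of the ring slab.
            if not (min(a[2], b[2]) < -RING_HALF_THICKNESS - tol
                    and max(a[2], b[2]) > RING_HALF_THICKNESS + tol):
                return False

            # Condition 4: where this edge crosses each ring face, it must be inside the opening.
            for z_plane in (-RING_HALF_THICKNESS, RING_HALF_THICKNESS):
                s = (z_plane - a[2]) / (b[2] - a[2])
                p = a + s * (b - a)
                if max(abs(p[0]), abs(p[1])) > APERTURE_HALF - tol:
                    return False
    return True
```

Checking only the eight edge crossings is enough under these assumptions: once both end faces are outside the ring slab, the part of the box between the two ring faces is a convex solid whose vertices are exactly those crossings, so its lateral extremes occur there. The negative cases listed under "Testing Requirements" map directly onto the early returns, for example a bar offset sideways by 10 mm keeps its centerline inside the 32 mm opening but is rejected because one pair of edges crosses at 19 mm from the aperture center.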