roboimi/docs/superpowers/specs/2026-04-23-sim-air-insert-ring-bar-design.md

# sim_air_insert_ring_bar Design

## Summary

Add a new independent MuJoCo simulation task named `sim_air_insert_ring_bar` that keeps the existing dual-Diana tabletop setup but replaces the single transfer box with two randomized objects:

- a square ring block grasped by the left arm
- a square bar block grasped by the right arm

The task is to pick both objects off the table and complete an in-air insertion where the bar truly passes through the ring aperture. The existing `sim_transfer` task must remain unchanged.

## Goals

- Reuse the current dual-Diana EE-control simulation stack
- Keep the same table and robot-base arrangement as the existing transfer task
- Add an independent task entrypoint and scene definition
- Randomize planar placement of both objects within left/right task-specific regions
- Implement reward staging for contact, lift, and successful in-air insertion
- Add a scripted policy that performs pick, lift, align, and in-air insertion
- Preserve compatibility with existing environment creation, evaluation, and rollout patterns

## Non-Goals

- No random yaw in the first version
- No visual servoing or closed-loop insertion controller
- No general multi-task environment framework refactor
- No guarantee that the VLA training stack is immediately tuned for this new task
- No replacement or behavior change for `sim_transfer`

## Task Name

Use a new task name:

- `sim_air_insert_ring_bar`

This task should be exposed alongside `sim_transfer`, not as a replacement.

## Scene Geometry

### Shared Base Scene

Keep the dual Diana robot, the table, and the existing camera layout conceptually unchanged.

### Ring Block

Represent the square ring as a rigid free body composed of simple MuJoCo box geoms rather than an external mesh.

Dimensions:

- outer side length: 68 mm
- inner aperture side length: 32 mm
- thickness: 18 mm
- ring wall width: 18 mm

The ring should behave as a single object body with a single free joint.
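
As a minimal sketch of how the stated dimensions could decompose into box geoms, the snippet below derives four wall boxes (two full-length, two short) and prints an MJCF body fragment. The body, joint, and geom names, and the choice of the ring's local z axis as the aperture axis, are assumptions made for illustration and are not taken from the repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoxGeom:
    name: str
    pos: tuple   # geom centre in the ring body frame, metres
    half: tuple  # MuJoCo box "size" is given as half-extents, metres


def ring_wall_geoms(outer=0.068, inner=0.032, thickness=0.018):
    """Four boxes forming a square ring whose aperture axis is local z (assumed)."""
    wall = (outer - inner) / 2.0   # 18 mm, consistent with the stated wall width
    off = (inner + wall) / 2.0     # offset of each wall centre from the ring centre
    hz = thickness / 2.0
    return [
        # Two full-length walls running along x, placed at +/- y.
        BoxGeom("ring_wall_py", (0.0, +off, 0.0), (outer / 2, wall / 2, hz)),
        BoxGeom("ring_wall_ny", (0.0, -off, 0.0), (outer / 2, wall / 2, hz)),
        # Two short walls running along y, filling the gap at +/- x.
        BoxGeom("ring_wall_px", (+off, 0.0, 0.0), (wall / 2, inner / 2, hz)),
        BoxGeom("ring_wall_nx", (-off, 0.0, 0.0), (wall / 2, inner / 2, hz)),
    ]


def to_mjcf(geoms):
    """Render one body with a single free joint (names are hypothetical)."""
    lines = ['<body name="ring_block">', '  <freejoint name="ring_joint"/>']
    for g in geoms:
        lines.append(
            '  <geom name="%s" type="box" pos="%g %g %g" size="%g %g %g"/>'
            % (g.name, *g.pos, *g.half)
        )
    lines.append("</body>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_mjcf(ring_wall_geoms()))
```

The stated numbers are self-consistent: 32 mm of aperture plus two 18 mm walls gives the 68 mm outer side.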

### Bar Block

Represent the bar as a rigid free body with a single box geom.

Dimensions:

- length: 90 mm
- cross-section: 18 mm x 18 mm

The bar should also be a single free-joint body.

## Initial Placement / Reset

The first version uses position-only randomization with fixed orientation.

- ring block: randomized only in a left-side planar sampling region
- bar block: randomized only in a right-side planar sampling region
- both objects start flat on the table
- both objects use fixed orientation at reset
- no random yaw, tilt, or flip in this version

The sampling regions should be chosen conservatively so that:

- the left arm can comfortably reach and grasp the ring
- the right arm can comfortably reach and grasp the bar
- scripted open-loop pick trajectories remain feasible
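
A position-only sampler for this reset could look like the sketch below. The region bounds, resting heights, and function name are placeholders chosen for illustration; the real values must come from reachability checks in the actual scene.

```python
import random

# Placeholder bounds in metres (world frame). These are NOT the real regions.
RING_REGION = {"x": (-0.20, -0.10), "y": (0.75, 0.90)}
BAR_REGION = {"x": (0.10, 0.20), "y": (0.75, 0.90)}
REST_Z = 0.48                          # placeholder resting height on the table
FIXED_QUAT = (1.0, 0.0, 0.0, 0.0)      # (w, x, y, z), no yaw, tilt, or flip


def sample_pose(region, z=REST_Z, quat=FIXED_QUAT, rng=random):
    """Return a 7D pose [x, y, z, qw, qx, qy, qz] with only x and y randomized."""
    x = rng.uniform(*region["x"])
    y = rng.uniform(*region["y"])
    return [x, y, z, *quat]


def sample_ring_bar_poses(rng=random):
    """Sample the ring on the left and the bar on the right, independently."""
    return sample_pose(RING_REGION, rng=rng), sample_pose(BAR_REGION, rng=rng)


if __name__ == "__main__":
    ring, bar = sample_ring_bar_poses(random.Random(0))
    print("ring:", ring)
    print("bar: ", bar)
```

Passing an explicit `random.Random` instance keeps resets reproducible in tests.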

## Control / Action Interface

Reuse the current 16D EE-space action convention already used by the dual-Diana position-control environment:

- left arm EE pose: 7D (`xyz + quat`)
- right arm EE pose: 7D (`xyz + quat`)
- left gripper command: 1D
- right gripper command: 1D

The new task should continue using EE targets transformed through the existing IK-based control path.
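
For illustration, the helpers below pack and split a 16D action. The index layout (left pose, right pose, left gripper, right gripper, in the order listed above) is an assumption; the existing environment's actual layout should be checked before relying on it.

```python
def pack_action(left_pose, right_pose, left_gripper, right_gripper):
    """Build a 16D action; the index order is an assumed layout, not a confirmed one."""
    if len(left_pose) != 7 or len(right_pose) != 7:
        raise ValueError("each EE pose must be 7D: xyz + quat")
    return [*left_pose, *right_pose, float(left_gripper), float(right_gripper)]


def split_action(action):
    """Inverse of pack_action under the same assumed layout."""
    if len(action) != 16:
        raise ValueError("expected a 16D action, got %d values" % len(action))
    return {
        "left_pose": list(action[0:7]),
        "right_pose": list(action[7:14]),
        "left_gripper": action[14],
        "right_gripper": action[15],
    }


if __name__ == "__main__":
    identity = [1.0, 0.0, 0.0, 0.0]
    a = pack_action([0.1, 0.8, 0.6, *identity], [-0.1, 0.8, 0.6, *identity], 1.0, 0.0)
    print(split_action(a))
```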

## Environment Structure

Implement this as a new task-specific environment path while reusing the existing dual-Diana simulation base where possible.

Expected responsibilities:

- scene instantiation for the ring+bar setup
- task reset for randomized object placement
- environment-state accessors for both objects
- reward computation
- in-air insertion success detection

The environment factory must dispatch by task name and leave the `sim_transfer` branch unchanged.
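
One way to keep the `sim_transfer` branch untouched is a name-to-builder table, sketched below. The class names and the `make_sim_env` signature are hypothetical stand-ins; the repository's real factory and environment classes may differ.

```python
class TransferEnvStub:
    """Stand-in for the existing sim_transfer environment (hypothetical)."""
    task_name = "sim_transfer"


class AirInsertRingBarEnvStub:
    """Stand-in for the new ring+bar environment (hypothetical)."""
    task_name = "sim_air_insert_ring_bar"


# Adding a task means adding one entry; existing entries are not modified.
_TASK_BUILDERS = {
    "sim_transfer": TransferEnvStub,
    "sim_air_insert_ring_bar": AirInsertRingBarEnvStub,
}


def make_sim_env(task_name):
    """Dispatch by task name and fail loudly on unknown names."""
    try:
        builder = _TASK_BUILDERS[task_name]
    except KeyError:
        raise ValueError("unknown task name: %r" % task_name) from None
    return builder()


if __name__ == "__main__":
    print(type(make_sim_env("sim_air_insert_ring_bar")).__name__)
```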

## Observation / Environment State

The task should retain the observation structure currently used by the dual-Diana environment:

- `qpos`
- multi-camera images

For task state access, the environment should expose at least the pose information needed to reason about both objects:

- ring position
- ring orientation if needed for insertion checks / debugging
- bar position
- bar orientation if needed for insertion checks / debugging

This state should be sufficient for scripted-policy debugging and future rollout analysis.
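
A possible shape for this state is sketched below. A MuJoCo free joint contributes 7 `qpos` values (position followed by a `w, x, y, z` quaternion); the assumption here is that the ring's and bar's free-joint values are available as one contiguous 14-value slice with the ring first. The `ObjectPose` type and function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectPose:
    pos: tuple   # (x, y, z) in metres
    quat: tuple  # (w, x, y, z), MuJoCo convention


def split_env_state(obj_qpos):
    """Split 14 free-joint values into ring and bar poses (ring first, assumed)."""
    vals = [float(v) for v in obj_qpos]
    if len(vals) != 14:
        raise ValueError("expected 14 values (2 free joints), got %d" % len(vals))
    return {
        "ring": ObjectPose(tuple(vals[0:3]), tuple(vals[3:7])),
        "bar": ObjectPose(tuple(vals[7:10]), tuple(vals[10:14])),
    }


if __name__ == "__main__":
    state = split_env_state([-0.15, 0.8, 0.48, 1, 0, 0, 0,
                             0.15, 0.8, 0.48, 1, 0, 0, 0])
    print(state["ring"], state["bar"])
```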

## Reward Design

Use staged rewards in the same spirit as the current task: each step returns the highest stage achieved rather than accumulating one-time sparse bonuses per event.

Maximum reward:

- `max_reward = 5`

Reward stages:

1. left gripper touches the ring block
2. right gripper touches the bar block
3. ring block is lifted off the table
4. bar block is lifted off the table
5. while both objects are off the table, the bar truly passes through the ring aperture

Notes:

- contact rewards are intended as grasp-progress stages
- lift rewards require the object to be off the table, not merely touched
- final success reward only applies when both objects are airborne
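
The staging can be sketched as a chain of overwriting checks, as below. This is one reading of "highest achieved stage": a later stage's condition overrides earlier ones even if an earlier stage is not currently satisfied. The helper names and the 1 cm lift margin are assumptions, and the contact and insertion flags are expected to be computed elsewhere.

```python
MAX_REWARD = 5


def is_lifted(obj_z, rest_z, margin=0.01):
    """True when the object is clearly above its resting height (margin is a guess)."""
    return obj_z > rest_z + margin


def staged_reward(left_touches_ring, right_touches_bar,
                  ring_lifted, bar_lifted, inserted):
    """Return the highest stage whose condition currently holds (0 if none)."""
    reward = 0
    if left_touches_ring:
        reward = 1
    if right_touches_bar:
        reward = 2
    if ring_lifted:
        reward = 3
    if bar_lifted:
        reward = 4
    # The final stage only counts while both objects are off the table.
    if ring_lifted and bar_lifted and inserted:
        reward = MAX_REWARD
    return reward


if __name__ == "__main__":
    print(staged_reward(True, True, True, True, True))
```

If stages should instead be strictly sequential (stage k only after stage k-1), each check would additionally require the previous conditions; the specification above leaves that choice open.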

## Success Detection

Success must **not** be based on a centerline-only check.

A centerline-only test is insufficient because:

- the bar has thickness, so a centerline can pass through while the body cannot
- a square bar with imperfect orientation can have its centerline inside the aperture while its corners still collide with the ring

### Required Success Semantics

A successful insertion requires all of the following:

1. the ring is off the table
2. the bar is off the table
3. the bar has actually passed through the ring along its thickness direction
4. the bar’s finite square cross-section fits through the square aperture during that crossing

### Recommended Detection Approach

Use a task-level geometric check in Python rather than relying on contact alone.

Implementation intent:

- transform the bar geometry into the ring’s local frame
- reason about the bar as a finite oriented box (not a line)
- verify that the bar has passed through the ring along its thickness direction, extending beyond both ring faces
- verify that the portion of the bar passing the aperture fits within the inner square opening, accounting for the bar’s cross-section and orientation

This geometric check is the primary success test.
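
The following is a minimal sketch of such a check, not the definitive implementation. It assumes the ring's aperture axis is its local z axis, the bar's long axis is its local x axis, and quaternions use MuJoCo's `(w, x, y, z)` order; all function names are hypothetical. The bar's eight corners are expressed in the ring frame, the bar must extend beyond both ring faces, and every box edge is clipped to the ring's thickness slab so that the extreme points of the bar inside the slab can be tested against the aperture. The off-table conditions are checked separately.

```python
import itertools
import math


def quat_to_mat(q):
    """Rotation matrix from a quaternion in (w, x, y, z) order."""
    w, x, y, z = q
    n = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def bar_corners_in_ring_frame(ring_pos, ring_quat, bar_pos, bar_quat, bar_half):
    """Return (sign triple, corner position in the ring frame) for all 8 corners."""
    rr, rb = quat_to_mat(ring_quat), quat_to_mat(bar_quat)
    out = []
    for signs in itertools.product((-1, 1), repeat=3):
        local = [s * h for s, h in zip(signs, bar_half)]
        world = [bar_pos[i] + sum(rb[i][j] * local[j] for j in range(3))
                 for i in range(3)]
        d = [world[i] - ring_pos[i] for i in range(3)]
        # Multiply by the transpose of the ring rotation: world -> ring frame.
        out.append((signs, [sum(rr[j][i] * d[j] for j in range(3))
                            for i in range(3)]))
    return out


def bar_passes_through_ring(ring_pos, ring_quat, bar_pos, bar_quat,
                            bar_half=(0.045, 0.009, 0.009),
                            aperture=0.032, thickness=0.018, tol=1e-6):
    half_a, half_t = aperture / 2.0, thickness / 2.0
    corners = bar_corners_in_ring_frame(ring_pos, ring_quat,
                                        bar_pos, bar_quat, bar_half)
    zs = [p[2] for _, p in corners]
    # The bar must stick out of both ring faces.
    if not (min(zs) < -half_t and max(zs) > half_t):
        return False
    for (sa, a), (sb, b) in itertools.combinations(corners, 2):
        if sum(u != v for u, v in zip(sa, sb)) != 1:
            continue  # not a box edge
        dz = b[2] - a[2]
        if abs(dz) < 1e-12:
            if abs(a[2]) > half_t:
                continue  # edge parallel to the slab and outside it
            t0, t1 = 0.0, 1.0
        else:
            ta, tb = (-half_t - a[2]) / dz, (half_t - a[2]) / dz
            t0, t1 = max(0.0, min(ta, tb)), min(1.0, max(ta, tb))
            if t0 > t1:
                continue  # edge does not enter the slab
        # Endpoints of the clipped edge are vertices of (bar ∩ slab).
        for t in (t0, t1):
            x = a[0] + t * (b[0] - a[0])
            y = a[1] + t * (b[1] - a[1])
            if abs(x) > half_a + tol or abs(y) > half_a + tol:
                return False
    return True
```

Because the bar and the slab are both convex, testing only the clipped edge endpoints is enough: they are the vertices of the intersection, and the aperture bounds are linear constraints.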

### Role of Contacts

Contacts may still be used for:

- grasp-stage rewards
- debugging / diagnostics

But contact alone should **not** be the sole criterion for insertion success, since:

- a clean insertion may produce little or no contact with the aperture walls
- persistent contact can also happen while the bar is jammed and not actually inserted

## Scripted Policy

Add a new task-specific scripted policy for `sim_air_insert_ring_bar`.

### Policy Intent

The first version prioritizes a conservative, reliable open-loop demonstration rather than an optimized trajectory.

### Action Phases

The scripted policy should follow these phases:

1. move both arms to safe initial / waiting poses with grippers open
2. move left arm above the ring and right arm above the bar
3. descend and grasp the assigned objects
4. lift both objects clear of the table
5. move both objects to an airborne meeting region above the table
6. hold the ring stably while aligning the bar with the aperture
7. push the bar along the intended insertion direction until the geometric success condition is met

### Grasp Assignment

- left arm: ring only
- right arm: bar only

### Motion Style

Keep the current repository style:

- waypoint-based trajectory definition
- open-loop interpolation between waypoints
- fixed grasp orientation in the first version

No adaptive replanning is required for the first version.
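
A waypoint-and-interpolation scheme of this kind can be sketched as follows. The waypoint dictionary keys (`t`, `xyz`, `quat`, `gripper`) and the function names are assumptions for illustration, not the repository's confirmed format. Quaternions are blended component-wise and renormalized, which is adequate here because the grasp orientation is fixed in the first version.

```python
import math


def interpolate(prev_wp, next_wp, t):
    """Linearly blend two waypoints at step t; returns (7D pose, gripper)."""
    frac = (t - prev_wp["t"]) / (next_wp["t"] - prev_wp["t"])

    def lerp(a, b):
        return [ai + frac * (bi - ai) for ai, bi in zip(a, b)]

    xyz = lerp(prev_wp["xyz"], next_wp["xyz"])
    quat = lerp(prev_wp["quat"], next_wp["quat"])
    norm = math.sqrt(sum(c * c for c in quat))
    quat = [c / norm for c in quat]
    grip = prev_wp["gripper"] + frac * (next_wp["gripper"] - prev_wp["gripper"])
    return xyz + quat, grip


def arm_trajectory(waypoints, num_steps):
    """Open-loop targets for one arm; needs >= 2 waypoints with increasing t."""
    out, i = [], 0
    for t in range(num_steps):
        # Advance to the segment containing t, stopping at the last segment.
        while i + 2 < len(waypoints) and t >= waypoints[i + 1]["t"]:
            i += 1
        t_clamped = min(t, waypoints[i + 1]["t"])  # hold the final waypoint
        out.append(interpolate(waypoints[i], waypoints[i + 1], t_clamped))
    return out


if __name__ == "__main__":
    q = [1.0, 0.0, 0.0, 0.0]
    wps = [
        {"t": 0, "xyz": [0.0, 0.8, 0.60], "quat": q, "gripper": 1.0},   # above object
        {"t": 10, "xyz": [0.0, 0.8, 0.50], "quat": q, "gripper": 1.0},  # descend
        {"t": 20, "xyz": [0.0, 0.8, 0.50], "quat": q, "gripper": 0.0},  # close gripper
    ]
    for pose, grip in arm_trajectory(wps, 25)[::5]:
        print([round(v, 3) for v in pose], round(grip, 2))
```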

## Files / Integration Scope

The implementation is expected to add task-specific files rather than broadly refactoring the codebase.

Likely additions / changes:

- a new MuJoCo scene XML for the ring+bar task
- one or more XML fragments defining the two new objects
- a new task-specific dual-Diana environment file
- robot asset wiring for the new scene XML
- reset sampling helpers for the new task
- task registration in constants / environment factory paths
- a new scripted policy file
- focused tests for task creation, reset, rewards, success detection, and scripted policy shape/smoke behavior

## Testing Requirements

At minimum, add regression coverage for:

### Environment Creation

- the new task can be created via the task factory
- the existing `sim_transfer` task remains unchanged

### Reset / Sampling

- ring reset positions are inside the left sampling region
- bar reset positions are inside the right sampling region
- reset orientation is fixed as intended

### Environment State

- environment-state access returns both object poses in the expected structure

### Success Detection

Must include both positive and negative cases.

Positive case:

- a configuration where the finite bar truly passes through the ring aperture is detected as success

Negative cases:

- centerline-inside but finite body would clip the aperture
- insufficient depth: the bar does not actually cross the ring along its thickness direction
- one or both objects still on the table

### Reward Logic

- left contact stage
- right contact stage
- ring lift stage
- bar lift stage
- final success stage with `max_reward = 5`

### Scripted Policy

At minimum:

- policy emits valid 16D actions
- trajectory generation does not error
- rollout smoke path can step through the new environment

## Risks / Constraints

- MuJoCo geom names used for contact detection must remain stable, since the stage rewards depend on them
- geometric insertion checks must be strict enough to avoid false positives but not so brittle that numerically valid insertions are missed
- scripted open-loop insertion may require conservative alignment and lift heights to keep the first version reliable

## Acceptance Criteria

The feature is complete when all of the following are true:

- `sim_air_insert_ring_bar` is creatable as an independent task
- the scene contains the dual Diana, table, ring block, and bar block
- reset randomizes ring and bar positions in left/right planar regions with fixed orientation
- the environment exposes task state for both objects
- staged rewards progress to `max_reward = 5`
- final success is based on finite-geometry insertion semantics, not a centerline-only shortcut
- a new scripted policy can execute the intended pick-lift-align-insert behavior in the new environment
- existing `sim_transfer` behavior is preserved