roboimi/docs/superpowers/specs/2026-04-23-sim-air-insert-ring-bar-design.md
2026-04-23 16:47:05 +08:00

sim_air_insert_ring_bar Design

Summary

Add a new independent MuJoCo simulation task named sim_air_insert_ring_bar that keeps the existing dual-Diana tabletop setup but replaces the single transfer box with two randomized objects:

  • a square ring block grasped by the left arm
  • a square bar block grasped by the right arm

The task is to pick both objects off the table and complete an in-air insertion where the bar truly passes through the ring aperture. The existing sim_transfer task must remain unchanged.

Goals

  • Reuse the current dual-Diana EE-control simulation stack
  • Keep the same table/base robot arrangement as the existing transfer task
  • Add an independent task entrypoint and scene definition
  • Randomize planar placement of both objects within left/right task-specific regions
  • Implement reward staging for contact, lift, and successful in-air insertion
  • Add a scripted policy that performs pick, lift, align, and in-air insertion
  • Preserve compatibility with existing environment creation, evaluation, and rollout patterns

Non-Goals

  • No random yaw in the first version
  • No visual servoing or closed-loop insertion controller
  • No general multi-task environment framework refactor
  • No guarantee that the VLA training stack is immediately tuned for this new task
  • No replacement or behavior change for sim_transfer

Task Name

Use a new task name:

  • sim_air_insert_ring_bar

This task should be exposed alongside sim_transfer, not as a replacement.

Scene Geometry

Shared Base Scene

Keep the dual Diana robot, the table, and the existing camera layout conceptually unchanged.

Ring Block

Represent the square ring as a rigid free body composed from simple MuJoCo box geoms rather than an external mesh.

Dimensions:

  • outer side length: 68 mm
  • inner aperture side length: 32 mm
  • thickness: 18 mm
  • ring wall width: 18 mm

The ring should behave as a single object body with a single free joint.
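
The box layout can be derived directly from the dimensions above. The following is a minimal sketch, not the repository's implementation: the function name `ring_box_geoms`, the choice of local z as the aperture normal, and the split into two long and two short walls are all assumptions. MuJoCo box geoms take half-extents as `size`, so the values returned are half sizes in meters.

```python
def ring_box_geoms(outer=0.068, inner=0.032, thickness=0.018):
    """Return (pos, half_size) pairs for four boxes forming a square ring.

    Assumed layout: ring centered at the origin, aperture normal along local z.
    All values are in meters; half_size matches MuJoCo's box `size` convention.
    """
    wall = (outer - inner) / 2.0       # wall width, 18 mm for the spec's numbers
    offset = (inner + wall) / 2.0      # distance from ring center to wall center
    hz = thickness / 2.0
    return [
        # two long walls at +/-y spanning the full outer side in x
        ((0.0, +offset, 0.0), (outer / 2.0, wall / 2.0, hz)),
        ((0.0, -offset, 0.0), (outer / 2.0, wall / 2.0, hz)),
        # two short walls at +/-x spanning only the aperture side in y,
        # so the four boxes do not overlap at the corners
        ((+offset, 0.0, 0.0), (wall / 2.0, inner / 2.0, hz)),
        ((-offset, 0.0, 0.0), (wall / 2.0, inner / 2.0, hz)),
    ]
```

With the default values the wall width comes out to 18 mm, which matches the listed ring wall width (32 mm + 2 x 18 mm = 68 mm).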

Bar Block

Represent the bar as a rigid free body with a single box geom.

Dimensions:

  • length: 90 mm
  • cross-section: 18 mm x 18 mm

The bar should also be a single free-joint body.

Initial Placement / Reset

The first version uses position-only randomization with fixed orientation. Reset sampling stays caller-driven, matching the existing sim_transfer usage pattern in rollout/eval code: a helper samples task state, then callers pass that state into env.reset(...).

Use an explicit sampled task-state structure with named fields:

  • ring_pos: 3D position
  • ring_quat: fixed 4D quaternion for version 1
  • bar_pos: 3D position
  • bar_quat: fixed 4D quaternion for version 1

Behavior:

  • ring block: randomized only in a left-side planar sampling region
  • bar block: randomized only in a right-side planar sampling region
  • both objects start flat on the table
  • both objects use fixed orientation at reset
  • no random yaw, tilt, or flip in this version

The sampling regions should be chosen conservatively so that:

  • the left arm can comfortably reach and grasp the ring
  • the right arm can comfortably reach and grasp the bar
  • scripted open-loop pick trajectories remain feasible
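
A caller-driven sampling helper along these lines could look as follows. This is a sketch under stated assumptions: the class name `RingBarTaskState`, the function name `sample_ring_bar_state`, and especially the region bounds and table height are hypothetical placeholders, not values taken from the repository. The quaternion is the identity in MuJoCo's (w, x, y, z) order.

```python
from dataclasses import dataclass

import numpy as np

# Placeholder sampling regions in meters (assumed values, to be tuned for reach).
RING_REGION = {"x": (-0.20, -0.10), "y": (0.80, 0.95), "z": 0.47}
BAR_REGION = {"x": (0.10, 0.20), "y": (0.80, 0.95), "z": 0.47}
FIXED_QUAT = np.array([1.0, 0.0, 0.0, 0.0])  # identity, (w, x, y, z)


@dataclass
class RingBarTaskState:
    ring_pos: np.ndarray   # 3D position
    ring_quat: np.ndarray  # fixed 4D quaternion in version 1
    bar_pos: np.ndarray    # 3D position
    bar_quat: np.ndarray   # fixed 4D quaternion in version 1


def _sample_planar(rng, region):
    """Sample x and y uniformly inside the region; z stays at table height."""
    x = rng.uniform(*region["x"])
    y = rng.uniform(*region["y"])
    return np.array([x, y, region["z"]])


def sample_ring_bar_state(rng=None):
    """Sample positions only; orientation is fixed (no yaw, tilt, or flip)."""
    rng = np.random.default_rng() if rng is None else rng
    return RingBarTaskState(
        ring_pos=_sample_planar(rng, RING_REGION),
        ring_quat=FIXED_QUAT.copy(),
        bar_pos=_sample_planar(rng, BAR_REGION),
        bar_quat=FIXED_QUAT.copy(),
    )
```

Passing a seeded `np.random.default_rng(seed)` keeps evaluation rollouts reproducible; the caller then hands the sampled state to `env.reset(...)` as described above.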

Control / Action Interface

Reuse the current 16D EE-space action convention already used by the dual-Diana position-control environment:

  • left arm EE pose: 7D (xyz + quat)
  • right arm EE pose: 7D (xyz + quat)
  • left gripper command: 1D
  • right gripper command: 1D

The new task should continue using EE targets transformed through the existing IK-based control path.
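
For illustration, a 16D action could be assembled like this. The helper name `pack_action` is hypothetical, and the slot order simply follows the bullet order above; the existing dual-Diana environment defines the authoritative layout, so treat the ordering here as an assumption to verify.

```python
import numpy as np


def pack_action(left_xyz, left_quat, right_xyz, right_quat, left_grip, right_grip):
    """Concatenate both EE poses and gripper commands into one 16D vector.

    Assumed order: left pose (7), right pose (7), left gripper, right gripper.
    """
    action = np.concatenate(
        [left_xyz, left_quat, right_xyz, right_quat, [left_grip], [right_grip]]
    ).astype(float)
    if action.shape != (16,):
        raise ValueError(f"expected a 16D action, got shape {action.shape}")
    return action
```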

Environment Structure

Implement this as a new task-specific environment path while reusing the existing dual-Diana simulation base where possible.

Expected responsibilities:

  • scene instantiation for the ring+bar setup
  • task reset for randomized object placement
  • environment-state accessors for both objects
  • reward computation
  • in-air insertion success detection

The environment factory must dispatch by task name and leave the sim_transfer branch unchanged.
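
The dispatch requirement can be sketched as a plain branch per task name. Everything here is a stand-in: `make_sim_env`, `_make_transfer_env`, and `_make_air_insert_env` are hypothetical names, and the builders return placeholder dicts instead of real environments.

```python
def _make_transfer_env():
    """Stand-in for the existing sim_transfer construction path."""
    return {"task": "sim_transfer"}


def _make_air_insert_env():
    """Stand-in for the new ring+bar construction path."""
    return {"task": "sim_air_insert_ring_bar"}


def make_sim_env(task_name):
    """Dispatch to a task-specific builder by name."""
    if task_name == "sim_transfer":
        return _make_transfer_env()      # existing branch, left untouched
    if task_name == "sim_air_insert_ring_bar":
        return _make_air_insert_env()    # new, independent branch
    raise ValueError(f"unknown task name: {task_name!r}")
```

Adding the new task as its own branch keeps the `sim_transfer` path byte-for-byte identical, which makes the "remains unchanged" requirement easy to check in review.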

Observation / Environment State

The task should retain the current observation structure style used by the dual-Diana environment:

  • qpos
  • multi-camera images

For task state access, the environment should expose a stable env_state vector with this exact order:

  • ring_pos[0:3]
  • ring_quat[3:7]
  • bar_pos[7:10]
  • bar_quat[10:14]

This 14D state should be sufficient for scripted-policy debugging and future rollout analysis, while reset itself remains caller-driven via the named task-state helper structure above.
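
The slice layout above maps onto a pair of small helpers. This is a sketch; `pack_env_state` and `unpack_env_state` are hypothetical names, and only the index ranges come from the spec.

```python
import numpy as np


def pack_env_state(ring_pos, ring_quat, bar_pos, bar_quat):
    """Build the 14D env_state vector in the order the spec fixes."""
    state = np.concatenate([ring_pos, ring_quat, bar_pos, bar_quat]).astype(float)
    if state.shape != (14,):
        raise ValueError(f"expected a 14D env_state, got shape {state.shape}")
    return state


def unpack_env_state(state):
    """Split a 14D env_state vector back into named fields."""
    state = np.asarray(state, dtype=float)
    if state.shape != (14,):
        raise ValueError(f"expected a 14D env_state, got shape {state.shape}")
    return {
        "ring_pos": state[0:3],
        "ring_quat": state[3:7],
        "bar_pos": state[7:10],
        "bar_quat": state[10:14],
    }
```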

Reward Design

Use staged rewards in the same spirit as the current task, returning the highest achieved stage rather than accumulating one-time sparse bonuses per event.

Maximum reward:

  • max_reward = 5

Reward stages:

  1. left gripper touches the ring block
  2. right gripper touches the bar block
  3. ring block is lifted off the table
  4. bar block is lifted off the table
  5. while both objects are off the table, the bar truly passes through the ring aperture

Notes:

  • contact rewards are intended as grasp-progress stages
  • lift rewards require the object to be off the table, not merely touched
  • final success reward only applies when both objects are airborne
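
One literal reading of "highest achieved stage" is sketched below. The function name `staged_reward` and its boolean inputs are hypothetical; how each flag is derived (contact pairs, height thresholds, the geometric check) is left to the environment. Note the assumption that a higher stage counts even when a lower-numbered stage is not currently satisfied.

```python
def staged_reward(left_touches_ring, right_touches_bar, ring_lifted, bar_lifted, inserted):
    """Return the index (1..5) of the highest satisfied stage, or 0 if none."""
    stages = [
        left_touches_ring,                          # stage 1
        right_touches_bar,                          # stage 2
        ring_lifted,                                # stage 3
        bar_lifted,                                 # stage 4
        inserted and ring_lifted and bar_lifted,    # stage 5: both objects airborne
    ]
    reward = 0
    for index, achieved in enumerate(stages, start=1):
        if achieved:
            reward = index  # later stages override earlier ones
    return reward
```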

Success Detection

Success must not be based on a centerline-only check.

A centerline-only test is insufficient because:

  • the bar has thickness, so a centerline can pass through while the body cannot
  • a square bar with imperfect orientation can have its centerline inside the aperture while its corners still collide with the ring

Required Success Semantics

A successful insertion requires all of the following:

  1. the ring is off the table
  2. the bar is off the table
  3. the bar has actually crossed through the ring thickness direction
  4. the bar's finite square cross-section fits through the square aperture during that crossing

Use a task-level geometric check in Python rather than relying on contact alone.

Implementation intent:

  • transform the bar geometry into the ring's local frame
  • reason about the bar as a finite oriented box (not a line)
  • verify that the bar has crossed the ring thickness direction
  • verify that the portion of the bar passing the aperture fits within the inner square opening, accounting for the bar's cross-section and orientation

This geometric check is the primary success test.
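
The intent above can be sketched as a finite-box test in the ring's local frame. Several assumptions are baked in and should be checked against the actual scene: the aperture normal is the ring's local z axis, the bar's long axis is its local x axis, quaternions use MuJoCo's (w, x, y, z) order, and "crossed" is read as the bar spanning both faces of the ring at the current instant. The function names are hypothetical, and the off-table conditions are assumed to be checked separately.

```python
import itertools

import numpy as np


def quat_to_mat(quat):
    """Rotation matrix from a quaternion in (w, x, y, z) order."""
    w, x, y, z = np.asarray(quat, dtype=float) / np.linalg.norm(quat)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])


def bar_passes_through_ring(ring_pos, ring_quat, bar_pos, bar_quat,
                            bar_half=(0.045, 0.009, 0.009),
                            aperture_half=0.016,
                            ring_half_thickness=0.009,
                            tol=1e-9):
    """True if the bar spans the ring slab and its in-slab part fits the aperture."""
    signs = np.array(list(itertools.product((-1.0, 1.0), repeat=3)))
    # Eight bar corners in world coordinates, then in the ring's local frame.
    corners_world = np.asarray(bar_pos, float) + (signs * np.asarray(bar_half)) @ quat_to_mat(bar_quat).T
    corners = (corners_world - np.asarray(ring_pos, float)) @ quat_to_mat(ring_quat)

    h = ring_half_thickness
    # Crossing: the bar must stick out beyond both faces of the ring.
    if not (corners[:, 2].min() < -h and corners[:, 2].max() > h):
        return False

    # Fit: clip each of the 12 box edges to the slab |z| <= h. The clipped
    # endpoints are the vertices of the (convex) in-slab portion of the bar,
    # so checking them against the square opening is sufficient.
    for i, j in itertools.combinations(range(8), 2):
        if np.count_nonzero(signs[i] != signs[j]) != 1:
            continue  # not an edge of the box
        a, b = corners[i], corners[j]
        dz = b[2] - a[2]
        if abs(dz) < 1e-12:
            if abs(a[2]) > h:
                continue  # edge parallel to the slab and outside it
            t_lo, t_hi = 0.0, 1.0
        else:
            t1, t2 = (-h - a[2]) / dz, (h - a[2]) / dz
            t_lo, t_hi = max(0.0, min(t1, t2)), min(1.0, max(t1, t2))
            if t_lo > t_hi:
                continue  # edge does not enter the slab
        for t in (t_lo, t_hi):
            p = a + t * (b - a)
            if abs(p[0]) > aperture_half + tol or abs(p[1]) > aperture_half + tol:
                return False
    return True
```

With the spec's dimensions, a bar centered in the aperture has 7 mm of clearance per side; shifting it 10 mm sideways keeps the centerline inside the 32 mm opening but fails this check, which is exactly the case a centerline-only test would miss.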

Role of Contacts

Contacts may still be used for:

  • grasp-stage rewards
  • debugging / diagnostics

But contact alone should not be the sole criterion for insertion success, since:

  • a true clean insertion may have limited aperture-wall contact
  • persistent contact can also happen while the bar is jammed and not actually inserted

Scripted Policy

Add a new task-specific scripted policy for sim_air_insert_ring_bar.

Policy Intent

The first version prioritizes a conservative, reliable open-loop demonstration rather than an optimized trajectory.

Action Phases

The scripted policy should follow these phases:

  1. move both arms to safe initial / waiting poses with grippers open
  2. move left arm above the ring and right arm above the bar
  3. descend and grasp the assigned objects
  4. lift both objects clear of the table
  5. move both objects to an airborne meeting region above the table
  6. hold the ring stably while aligning the bar with the aperture
  7. push the bar along the intended insertion direction until the geometric success condition is met

Grasp Assignment

  • left arm: ring only
  • right arm: bar only

Motion Style

Keep the current repository style:

  • waypoint-based trajectory definition
  • open-loop interpolation between waypoints
  • fixed grasp orientation in the first version

No adaptive replanning is required for the first version.
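
Open-loop interpolation between waypoints can be sketched for one arm as follows. The waypoint schema (dicts with `t`, `xyz`, `quat`, `gripper`) and the function name are assumptions for illustration; the repository's existing scripted policy defines the actual format. Quaternions are blended linearly and renormalized, which is adequate here because the first version keeps the grasp orientation fixed.

```python
import numpy as np


def interpolate_waypoints(waypoints, t):
    """Return an 8D target (xyz, quat, gripper) for one arm at time step t.

    `waypoints` must hold at least two dicts sorted by strictly increasing 't'.
    Times outside the covered range are clamped to the first/last waypoint.
    """
    t = min(max(t, waypoints[0]["t"]), waypoints[-1]["t"])
    # Find the first segment whose end time is not before t.
    for prev, nxt in zip(waypoints, waypoints[1:]):
        if t <= nxt["t"]:
            break
    frac = (t - prev["t"]) / (nxt["t"] - prev["t"])

    xyz = (1 - frac) * np.asarray(prev["xyz"], float) + frac * np.asarray(nxt["xyz"], float)
    quat = (1 - frac) * np.asarray(prev["quat"], float) + frac * np.asarray(nxt["quat"], float)
    quat = quat / np.linalg.norm(quat)  # keep the blended quaternion unit length
    gripper = (1 - frac) * prev["gripper"] + frac * nxt["gripper"]
    return np.concatenate([xyz, quat, [gripper]])
```

Two such 8D targets, one per arm, carry the information needed for a single 16D action.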

Files / Integration Scope

The implementation is expected to add task-specific files rather than broadly refactoring the codebase.

Likely additions / changes:

  • a new MuJoCo scene XML for the ring+bar task
  • one or more XML fragments defining the two new objects
  • a new task-specific dual-Diana environment file
  • robot asset wiring for the new scene XML
  • reset sampling helpers for the new task
  • task registration in constants / environment factory paths
  • a new scripted policy file
  • focused tests for task creation, reset, rewards, success detection, and scripted policy shape/smoke behavior

Testing Requirements

At minimum, add regression coverage for:

Environment Creation

  • the new task can be created via the task factory
  • the existing sim_transfer task remains unchanged

Reset / Sampling

  • ring reset positions are inside the left sampling region
  • bar reset positions are inside the right sampling region
  • reset orientation is fixed as intended

Environment State

  • environment-state access returns both object poses in the expected structure

Success Detection

Tests must include both positive and negative cases.

Positive case:

  • a configuration where the finite bar truly passes through the ring aperture is detected as success

Negative cases:

  • the centerline is inside the aperture but the finite body would clip it
  • insufficient depth: the bar has not actually crossed the ring thickness direction
  • one or both objects still on the table

Reward Logic

  • left contact stage
  • right contact stage
  • ring lift stage
  • bar lift stage
  • final success stage with max_reward = 5

Scripted Policy

At minimum:

  • policy emits valid 16D actions
  • trajectory generation does not error
  • rollout smoke path can step through the new environment

Risks / Constraints

  • MuJoCo contact naming must remain stable enough for stage rewards
  • geometric insertion checks must be strict enough to avoid false positives but not so brittle that numerically valid insertions are missed
  • scripted open-loop insertion may require conservative alignment and lift heights to keep the first version reliable

Acceptance Criteria

The feature is complete when all of the following are true:

  • sim_air_insert_ring_bar is creatable as an independent task
  • the scene contains the dual Diana, table, ring block, and bar block
  • reset randomizes ring and bar positions in left/right planar regions with fixed orientation
  • the environment exposes task state for both objects
  • staged rewards progress to max_reward = 5
  • final success is based on finite-geometry insertion semantics, not a centerline-only shortcut
  • a new scripted policy can execute the intended pick-lift-align-insert behavior in the new environment
  • a canonical nominal smoke path (unit-level or deterministic integration-level) exists for the new scripted-policy interface, so that success is verified by a reproducible check rather than by subjective judgment
  • existing sim_transfer behavior is preserved