Logic 636290d36a docs: clarify ring bar task state contracts

2026-04-23 16:47:05 +08:00

sim_air_insert_ring_bar Design

Summary

Add a new independent MuJoCo simulation task named sim_air_insert_ring_bar that keeps the existing dual-Diana tabletop setup but replaces the single transfer box with two objects whose initial positions are randomized:

a square ring block grasped by the left arm
a square bar block grasped by the right arm

The task is to pick both objects off the table and complete an in-air insertion in which the bar's full cross-section, not just its centerline, passes through the ring aperture. The existing sim_transfer task must remain unchanged.

Goals

Reuse the current dual-Diana EE-control simulation stack
Keep the same table/base robot arrangement as the existing transfer task
Add an independent task entrypoint and scene definition
Randomize planar placement of both objects within left/right task-specific regions
Implement reward staging for contact, lift, and successful in-air insertion
Add a scripted policy that performs pick, lift, align, and in-air insertion
Preserve compatibility with existing environment creation, evaluation, and rollout patterns

Non-Goals

No random yaw in the first version
No visual servoing or closed-loop insertion controller
No general multi-task environment framework refactor
No guarantee that the VLA training stack is immediately tuned for this new task
No replacement or behavior change for sim_transfer

Task Name

Use a new task name:

sim_air_insert_ring_bar

This task should be exposed alongside sim_transfer, not as a replacement.

Scene Geometry

Shared Base Scene

Keep the dual Diana robot, the table, and the existing camera layout conceptually unchanged.

Ring Block

Represent the square ring as a rigid free body composed from simple MuJoCo box geoms rather than an external mesh.

Dimensions:

outer side length: 68 mm
inner aperture side length: 32 mm
thickness: 18 mm
ring wall width: 18 mm (that is, (68 mm - 32 mm) / 2)

The ring should behave as a single object body with a single free joint.
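As a minimal sketch of how the ring could be composed from four box geoms, the snippet below derives each wall's center and half-extents from the dimensions above. MuJoCo box geoms take half-sizes, so every extent is halved. The geom names, the BoxGeom container, and the choice of local z as the aperture axis are illustrative assumptions, not part of the existing codebase.

```python
from dataclasses import dataclass

# Dimensions from this design, converted to meters.
OUTER = 0.068      # outer side length
INNER = 0.032      # inner aperture side length
THICKNESS = 0.018  # extent along the aperture axis (assumed to be local z)
WALL = (OUTER - INNER) / 2  # ring wall width


@dataclass(frozen=True)
class BoxGeom:
    """Hypothetical container for one MuJoCo box geom (pos and half-sizes)."""
    name: str
    pos: tuple
    half_size: tuple


def ring_wall_geoms():
    """Return four boxes that tile the square ring without overlapping."""
    offset = INNER / 2 + WALL / 2  # distance from ring center to wall center
    hz = THICKNESS / 2
    return [
        # Two full-length walls along x, placed at +/- offset in y.
        BoxGeom("ring_wall_pos_y", (0.0, offset, 0.0), (OUTER / 2, WALL / 2, hz)),
        BoxGeom("ring_wall_neg_y", (0.0, -offset, 0.0), (OUTER / 2, WALL / 2, hz)),
        # Two shorter walls along y that fill the gap between the long walls.
        BoxGeom("ring_wall_pos_x", (offset, 0.0, 0.0), (WALL / 2, INNER / 2, hz)),
        BoxGeom("ring_wall_neg_x", (-offset, 0.0, 0.0), (WALL / 2, INNER / 2, hz)),
    ]
```

Each entry maps directly onto the pos and size attributes of a box geom inside the single ring body, so the four walls share the ring's one free joint.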

Bar Block

Represent the bar as a rigid free body with a single box geom.

Dimensions:

length: 90 mm
cross-section: 18 mm x 18 mm

The bar should also be a single free-joint body.

Initial Placement / Reset

The first version uses position-only randomization with fixed orientation. Reset sampling stays caller-driven, matching the existing sim_transfer usage pattern in rollout/eval code: a helper samples task state, then callers pass that state into env.reset(...).

Use an explicit sampled task-state structure with named fields:

ring_pos: 3D position
ring_quat: fixed 4D quaternion for version 1
bar_pos: 3D position
bar_quat: fixed 4D quaternion for version 1

Behavior:

ring block: randomized only in a left-side planar sampling region
bar block: randomized only in a right-side planar sampling region
both objects start flat on the table
both objects use fixed orientation at reset
no random yaw, tilt, or flip in this version

The sampling regions should be chosen conservatively so that:

the left arm can comfortably reach and grasp the ring
the right arm can comfortably reach and grasp the bar
scripted open-loop pick trajectories remain feasible
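The caller-driven sampling described above could look roughly like the following sketch. The table height, the planar ranges, the convention that negative x is the left side, and the names RingBarTaskState and sample_ring_bar_state are all hypothetical placeholders; real values would have to come from the existing scene and reachability checks.

```python
from dataclasses import dataclass

import numpy as np

# Hypothetical placeholder values; the real scene defines the actual numbers.
TABLE_Z = 0.0                       # table-top height in world frame
HALF_HEIGHT = 0.009                 # both objects are 18 mm tall when lying flat
RING_XY_RANGE = ((-0.25, -0.15), (0.45, 0.55))  # (x range, y range), left side
BAR_XY_RANGE = ((0.15, 0.25), (0.45, 0.55))     # (x range, y range), right side
FIXED_QUAT = (1.0, 0.0, 0.0, 0.0)   # identity in MuJoCo's (w, x, y, z) order


@dataclass(frozen=True)
class RingBarTaskState:
    ring_pos: np.ndarray   # 3D position
    ring_quat: np.ndarray  # fixed 4D quaternion in version 1
    bar_pos: np.ndarray    # 3D position
    bar_quat: np.ndarray   # fixed 4D quaternion in version 1


def sample_ring_bar_state(rng):
    """Sample planar positions only; orientation stays fixed."""
    def sample_pos(xy_range):
        (x_lo, x_hi), (y_lo, y_hi) = xy_range
        return np.array([rng.uniform(x_lo, x_hi),
                         rng.uniform(y_lo, y_hi),
                         TABLE_Z + HALF_HEIGHT])

    return RingBarTaskState(
        ring_pos=sample_pos(RING_XY_RANGE),
        ring_quat=np.array(FIXED_QUAT),
        bar_pos=sample_pos(BAR_XY_RANGE),
        bar_quat=np.array(FIXED_QUAT),
    )
```

A rollout script would call sample_ring_bar_state(np.random.default_rng(seed)) and hand the result to env.reset(...), which keeps sampling reproducible and outside the environment.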

Control / Action Interface

Reuse the current 16D EE-space action convention already used by the dual-Diana position-control environment:

left arm EE pose: 7D (xyz + quat)
right arm EE pose: 7D (xyz + quat)
left gripper command: 1D
right gripper command: 1D

The new task should continue using EE targets transformed through the existing IK-based control path.
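For illustration, a 16D action could be assembled as below. The sketch assumes the components are concatenated in the order listed above (left pose, right pose, left gripper, right gripper); the existing environment's actual index layout should be checked before relying on it, and pack_action is a hypothetical helper name.

```python
import numpy as np


def pack_action(left_xyz, left_quat, right_xyz, right_quat, left_grip, right_grip):
    """Concatenate both EE poses and both gripper commands into one 16D vector.

    The ordering here is an assumption taken from the list in this section.
    """
    action = np.concatenate([
        np.asarray(left_xyz, dtype=float),    # 3
        np.asarray(left_quat, dtype=float),   # 4
        np.asarray(right_xyz, dtype=float),   # 3
        np.asarray(right_quat, dtype=float),  # 4
        [float(left_grip)],                   # 1
        [float(right_grip)],                  # 1
    ])
    if action.shape != (16,):
        raise ValueError(f"expected a 16D action, got shape {action.shape}")
    return action
```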

Environment Structure

Implement this as a new task-specific environment path while reusing the existing dual-Diana simulation base where possible.

Expected responsibilities:

scene instantiation for the ring+bar setup
task reset for randomized object placement
environment-state accessors for both objects
reward computation
in-air insertion success detection

The environment factory must dispatch by task name and leave the sim_transfer branch unchanged.
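One way to keep the sim_transfer branch untouched is a name-to-builder lookup, sketched below. The builder functions are stand-ins that only return a description; make_env_for_task and both builder names are hypothetical and do not refer to existing repository functions.

```python
def _build_transfer_env(**kwargs):
    # Stand-in for the existing sim_transfer construction path (unchanged).
    return {"task": "sim_transfer", "kwargs": kwargs}


def _build_ring_bar_env(**kwargs):
    # Stand-in for the new ring+bar environment construction.
    return {"task": "sim_air_insert_ring_bar", "kwargs": kwargs}


_TASK_BUILDERS = {
    "sim_transfer": _build_transfer_env,
    "sim_air_insert_ring_bar": _build_ring_bar_env,
}


def make_env_for_task(task_name, **kwargs):
    """Dispatch by task name; unknown names fail loudly instead of falling back."""
    try:
        builder = _TASK_BUILDERS[task_name]
    except KeyError:
        raise ValueError(f"unknown task name: {task_name!r}") from None
    return builder(**kwargs)
```

Adding the new task is then a one-line registry change, and a regression test can assert that the sim_transfer entry still resolves to the original builder.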

Observation / Environment State

The task should retain the current observation structure style used by the dual-Diana environment:

qpos
multi-camera images

For task state access, the environment should expose a stable env_state vector with this exact order:

env_state[0:3]: ring_pos
env_state[3:7]: ring_quat
env_state[7:10]: bar_pos
env_state[10:14]: bar_quat

This 14D state should be sufficient for scripted-policy debugging and future rollout analysis, while reset itself remains caller-driven via the named task-state helper structure above.
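The fixed ordering can be captured in one table of slices so that packing and unpacking cannot drift apart. This is a sketch; the helper names pack_env_state and unpack_env_state are hypothetical.

```python
import numpy as np

# Index layout of the 14D env_state vector, exactly as specified above.
ENV_STATE_SLICES = {
    "ring_pos": slice(0, 3),
    "ring_quat": slice(3, 7),
    "bar_pos": slice(7, 10),
    "bar_quat": slice(10, 14),
}


def pack_env_state(ring_pos, ring_quat, bar_pos, bar_quat):
    """Build the 14D vector from the four named fields."""
    state = np.zeros(14)
    fields = {"ring_pos": ring_pos, "ring_quat": ring_quat,
              "bar_pos": bar_pos, "bar_quat": bar_quat}
    for name, value in fields.items():
        state[ENV_STATE_SLICES[name]] = value
    return state


def unpack_env_state(env_state):
    """Split a 14D vector back into a dict of named arrays."""
    env_state = np.asarray(env_state, dtype=float)
    if env_state.shape != (14,):
        raise ValueError(f"expected a 14D env_state, got shape {env_state.shape}")
    return {name: env_state[s].copy() for name, s in ENV_STATE_SLICES.items()}
```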

Reward Design

Use staged rewards in the same spirit as the existing sim_transfer task: each step returns the highest stage currently achieved, rather than accumulating one-time sparse bonuses per event.

Maximum reward:

max_reward = 5

Reward stages:

left gripper touches the ring block
right gripper touches the bar block
ring block is lifted off the table
bar block is lifted off the table
while both objects are off the table, the bar truly passes through the ring aperture

Notes:

contact rewards are intended as grasp-progress stages
lift rewards require the object to be off the table, not merely touched
final success reward only applies when both objects are airborne
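The staging can be sketched as a cascade in which each later stage overwrites the earlier value. The boolean inputs are assumed to come from contact queries and the geometric insertion check; treating "lifted" as "held by the assigned gripper and not touching the table" is an assumption of this sketch, as is the function name.

```python
def staged_reward(left_touches_ring, right_touches_bar,
                  ring_on_table, bar_on_table, inserted):
    """Return the highest achieved stage, from 0 up to max_reward = 5."""
    # Assumption: an object counts as lifted only while its gripper holds it.
    ring_lifted = left_touches_ring and not ring_on_table
    bar_lifted = right_touches_bar and not bar_on_table

    reward = 0
    if left_touches_ring:
        reward = 1
    if right_touches_bar:
        reward = 2
    if ring_lifted:
        reward = 3
    if bar_lifted:
        reward = 4
    # The final stage only applies while both objects are airborne.
    if ring_lifted and bar_lifted and inserted:
        reward = 5
    return reward
```

Because later checks overwrite earlier ones, the function reports the latest satisfied stage even if an earlier one is not met, so a stricter variant might require each stage to imply all previous ones.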

Success Detection

Success must not be based on a centerline-only check.

A centerline-only test is insufficient because:

the bar has thickness, so a centerline can pass through while the body cannot
a square bar with imperfect orientation can have its centerline inside the aperture while its corners still collide with the ring

Required Success Semantics

A successful insertion requires all of the following:

the ring is off the table
the bar is off the table
the bar has crossed the full ring thickness along the aperture axis, extending beyond both faces of the ring
the bar’s finite square cross-section fits through the square aperture during that crossing

Recommended Detection Approach

Use a task-level geometric check in Python rather than relying on contact alone.

Implementation intent:

transform the bar geometry into the ring’s local frame
reason about the bar as a finite oriented box (not a line)
verify that the bar has crossed the ring thickness direction
verify that the portion of the bar passing the aperture fits within the inner square opening, accounting for the bar’s cross-section and orientation

This geometric check is the primary success test.
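A minimal sketch of such a check is shown below, under several assumptions: quaternions use MuJoCo's (w, x, y, z) order, the ring's aperture axis is its local z, the bar's long axis is its local x, and the test is evaluated on a single pose rather than tracked over time. It clips the bar's oriented box against the slab occupied by the ring and requires every point of the clipped solid to lie inside the inner square. The function and constant names are hypothetical, and the off-table conditions are assumed to be checked separately.

```python
import itertools

import numpy as np

RING_INNER = 0.032      # aperture side length (m)
RING_THICKNESS = 0.018  # ring extent along its local z (assumed aperture axis)
BAR_HALF = np.array([0.045, 0.009, 0.009])  # bar half-extents, long axis = local x


def quat_to_mat(q):
    """Rotation matrix from a (w, x, y, z) quaternion."""
    w, x, y, z = np.asarray(q, dtype=float) / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])


def bar_passes_through_ring(ring_pos, ring_quat, bar_pos, bar_quat, tol=1e-6):
    """True if the finite bar spans the ring thickness and clears the aperture."""
    signs = np.array(list(itertools.product((-1.0, 1.0), repeat=3)))
    # Bar corners in world frame, then expressed in the ring's local frame.
    corners_world = np.asarray(bar_pos, float) + (signs * BAR_HALF) @ quat_to_mat(bar_quat).T
    corners = (corners_world - np.asarray(ring_pos, float)) @ quat_to_mat(ring_quat)

    half_t = RING_THICKNESS / 2
    z = corners[:, 2]
    # The bar must stick out of both faces of the ring.
    if z.min() > -half_t or z.max() < half_t:
        return False

    # Vertices of the bar clipped to the slab |z| <= half_t.
    points = [c for c in corners if abs(c[2]) <= half_t]
    for i, j in itertools.combinations(range(8), 2):
        if np.count_nonzero(signs[i] != signs[j]) != 1:
            continue  # not a box edge
        a, b = corners[i], corners[j]
        for plane in (-half_t, half_t):
            da, db = a[2] - plane, b[2] - plane
            if da * db < 0:  # edge crosses this face of the slab
                points.append(a + (da / (da - db)) * (b - a))

    points = np.array(points)
    # Every point of the clipped (convex) solid must sit inside the inner square.
    return bool(np.all(np.abs(points[:, :2]) <= RING_INNER / 2 + tol))
```

Checking only the vertices of the clipped solid is sufficient because both the clipped bar and the aperture prism are convex. A bar whose centerline is 10 mm off-center would fail this test even though the centerline itself lies inside the 32 mm opening.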

Role of Contacts

Contacts may still be used for:

grasp-stage rewards
debugging / diagnostics

But contact alone should not be the sole criterion for insertion success, since:

a true clean insertion may have limited aperture-wall contact
persistent contact can also happen while the bar is jammed and not actually inserted

Scripted Policy

Add a new task-specific scripted policy for sim_air_insert_ring_bar.

Policy Intent

The first version prioritizes a conservative, reliable open-loop demonstration rather than an optimized trajectory.

Action Phases

The scripted policy should follow these phases:

move both arms to safe initial / waiting poses with grippers open
move left arm above the ring and right arm above the bar
descend and grasp the assigned objects
lift both objects clear of the table
move both objects to an airborne meeting region above the table
hold the ring stably while aligning the bar with the aperture
push the bar along the intended insertion direction until the geometric success condition is met

Grasp Assignment

left arm: ring only
right arm: bar only

Motion Style

Keep the current repository style:

waypoint-based trajectory definition
open-loop interpolation between waypoints
fixed grasp orientation in the first version

No adaptive replanning is required for the first version.
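Open-loop interpolation between two waypoints for a single arm might look like the sketch below. The waypoint dictionary keys (t, xyz, quat, gripper) and the function name are assumptions for illustration. Position and gripper are interpolated linearly, and the quaternion uses a normalized linear blend, which reduces to a constant when the grasp orientation is fixed.

```python
import numpy as np


def interpolate_waypoints(prev_wp, next_wp, t):
    """Return an 8D per-arm command (xyz, quat, gripper) at timestep t."""
    span = next_wp["t"] - prev_wp["t"]
    frac = 1.0 if span <= 0 else float(np.clip((t - prev_wp["t"]) / span, 0.0, 1.0))

    xyz = (1 - frac) * np.asarray(prev_wp["xyz"], float) + frac * np.asarray(next_wp["xyz"], float)

    q0 = np.asarray(prev_wp["quat"], float)
    q1 = np.asarray(next_wp["quat"], float)
    if np.dot(q0, q1) < 0:
        q1 = -q1  # take the shorter arc between the two orientations
    quat = (1 - frac) * q0 + frac * q1
    quat = quat / np.linalg.norm(quat)

    gripper = (1 - frac) * prev_wp["gripper"] + frac * next_wp["gripper"]
    return np.concatenate([xyz, quat, [gripper]])
```

The left-arm and right-arm commands produced this way would then be rearranged into the environment's 16D action layout at every step.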

Files / Integration Scope

The implementation is expected to add task-specific files rather than broadly refactoring the codebase.

Likely additions / changes:

a new MuJoCo scene XML for the ring+bar task
one or more XML fragments defining the two new objects
a new task-specific dual-Diana environment file
robot asset wiring for the new scene XML
reset sampling helpers for the new task
task registration in constants / environment factory paths
a new scripted policy file
focused tests for task creation, reset, rewards, success detection, and scripted policy shape/smoke behavior

Testing Requirements

At minimum, add regression coverage for:

Environment Creation

the new task can be created via the task factory
the existing sim_transfer task remains unchanged

Reset / Sampling

ring reset positions are inside the left sampling region
bar reset positions are inside the right sampling region
reset orientation is fixed as intended

Environment State

environment-state access returns both object poses in the expected structure

Success Detection

Must include both positive and negative cases.

Positive case:

a configuration where the finite bar truly passes through the ring aperture is detected as success

Negative cases:

centerline-inside but finite body would clip the aperture
not enough depth / not actually crossing the ring thickness direction
one or both objects still on the table

Reward Logic

left contact stage
right contact stage
ring lift stage
bar lift stage
final success stage with max_reward = 5

Scripted Policy

At minimum:

policy emits valid 16D actions
trajectory generation does not error
rollout smoke path can step through the new environment

Risks / Constraints

MuJoCo geom names used in contact checks must remain stable, because the stage rewards depend on them
geometric insertion checks must be strict enough to avoid false positives but not so brittle that numerically valid insertions are missed
scripted open-loop insertion may require conservative alignment and lift heights to keep the first version reliable

Acceptance Criteria

The feature is complete when all of the following are true:

sim_air_insert_ring_bar is creatable as an independent task
the scene contains the dual Diana, table, ring block, and bar block
reset randomizes ring and bar positions in left/right planar regions with fixed orientation
the environment exposes task state for both objects
staged rewards progress to max_reward = 5
final success is based on finite-geometry insertion semantics, not a centerline-only shortcut
a new scripted policy can execute the intended pick-lift-align-insert behavior in the new environment
a deterministic nominal smoke test (unit-level or integration-level) exercises the new scripted-policy interface, so that completion is verified by a test rather than judged by interpretation
existing sim_transfer behavior is preserved
