docs(spec): add sim air insert ring bar design
@@ -0,0 +1,306 @@
# sim_air_insert_ring_bar Design

## Summary

Add a new independent MuJoCo simulation task named `sim_air_insert_ring_bar` that keeps the existing dual-Diana tabletop setup but replaces the single transfer box with two randomized objects:

- a square ring block grasped by the left arm
- a square bar block grasped by the right arm

The task is to pick both objects off the table and complete an in-air insertion where the bar truly passes through the ring aperture. The existing `sim_transfer` task must remain unchanged.

## Goals

- Reuse the current dual-Diana EE-control simulation stack
- Keep the same table/base robot arrangement as the existing transfer task
- Add an independent task entrypoint and scene definition
- Randomize planar placement of both objects within left/right task-specific regions
- Implement reward staging for contact, lift, and successful in-air insertion
- Add a scripted policy that performs pick, lift, align, and in-air insertion
- Preserve compatibility with existing environment creation, evaluation, and rollout patterns

## Non-Goals

- No random yaw in the first version
- No visual servoing or closed-loop insertion controller
- No general multi-task environment framework refactor
- No guarantee that the VLA training stack is immediately tuned for this new task
- No replacement or behavior change for `sim_transfer`

## Task Name

Use a new task name:

- `sim_air_insert_ring_bar`

This task should be exposed alongside `sim_transfer`, not as a replacement.

## Scene Geometry

### Shared Base Scene

Keep the dual-Diana robot, the table, and the existing camera layout conceptually unchanged.

### Ring Block

Represent the square ring as a rigid free body composed of simple MuJoCo box geoms rather than an external mesh.

Dimensions:

- outer side length: 68 mm
- inner aperture side length: 32 mm
- thickness: 18 mm
- ring wall width: 18 mm (so that 32 mm + 2 x 18 mm = 68 mm)

The ring should behave as a single object body with a single free joint.

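
As a minimal sketch of how the four box geoms could be laid out from the dimensions above, the helper below computes each wall's centre and MuJoCo box half-extents in the ring body frame. The names `BoxGeom` and `ring_wall_geoms`, the choice of local z as the thickness axis, and the split into two full-length and two short walls are illustrative assumptions, not part of this spec.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoxGeom:
    name: str
    pos: tuple   # geom centre in the ring body frame, metres
    size: tuple  # MuJoCo box half-extents, metres


def ring_wall_geoms(outer=0.068, inner=0.032, thickness=0.018):
    """Four non-overlapping box walls around a square aperture (hypothetical layout)."""
    wall = (outer - inner) / 2.0     # 18 mm for the spec dimensions
    offset = (inner + wall) / 2.0    # distance from ring centre to each wall centre
    half_t = thickness / 2.0
    return [
        # Two walls spanning the full outer side along x.
        BoxGeom("ring_wall_pos_y", (0.0, +offset, 0.0), (outer / 2, wall / 2, half_t)),
        BoxGeom("ring_wall_neg_y", (0.0, -offset, 0.0), (outer / 2, wall / 2, half_t)),
        # Two shorter walls that only span the aperture along y, to avoid overlap.
        BoxGeom("ring_wall_pos_x", (+offset, 0.0, 0.0), (wall / 2, inner / 2, half_t)),
        BoxGeom("ring_wall_neg_x", (-offset, 0.0, 0.0), (wall / 2, inner / 2, half_t)),
    ]


for geom in ring_wall_geoms():
    print(geom)
```

Deriving the wall width from the outer and inner sides, rather than hard-coding it, keeps the three dimensions consistent if any one of them is tuned later.
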
### Bar Block

Represent the bar as a rigid free body with a single box geom.

Dimensions:

- length: 90 mm
- cross-section: 18 mm x 18 mm

The bar should also be a single free-joint body.

## Initial Placement / Reset

The first version uses position-only randomization with fixed orientation.

- ring block: randomized only in a left-side planar sampling region
- bar block: randomized only in a right-side planar sampling region
- both objects start flat on the table
- both objects use fixed orientation at reset
- no random yaw, tilt, or flip in this version

The sampling regions should be chosen conservatively so that:

- the left arm can comfortably reach and grasp the ring
- the right arm can comfortably reach and grasp the bar
- scripted open-loop pick trajectories remain feasible

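
The reset rule above could be sketched as follows. The region bounds, `TABLE_TOP_Z`, and the function names are placeholder assumptions for illustration only; the real values have to come from reachability checks in the actual scene. The sketch returns each pose as a 7-value tuple (position plus a fixed quaternion in MuJoCo's `w, x, y, z` order).

```python
import random

# Hypothetical sampling regions in metres: ((x_min, x_max), (y_min, y_max)).
RING_REGION = ((-0.20, -0.10), (0.85, 0.95))   # left side, placeholder values
BAR_REGION = ((0.10, 0.20), (0.85, 0.95))      # right side, placeholder values
TABLE_TOP_Z = 0.47                             # placeholder table height
IDENTITY_QUAT = (1.0, 0.0, 0.0, 0.0)           # fixed orientation, (w, x, y, z)


def sample_pose(region, rest_height, rng):
    """Sample a planar position inside `region`; orientation stays fixed."""
    (x_min, x_max), (y_min, y_max) = region
    x = rng.uniform(x_min, x_max)
    y = rng.uniform(y_min, y_max)
    return (x, y, TABLE_TOP_Z + rest_height) + IDENTITY_QUAT


def sample_ring_bar_poses(rng=None):
    """Return (ring_pose, bar_pose); both objects start flat on the table."""
    if rng is None:
        rng = random.Random()
    ring = sample_pose(RING_REGION, 0.009, rng)  # half of the 18 mm ring thickness
    bar = sample_pose(BAR_REGION, 0.009, rng)    # half of the 18 mm bar cross-section
    return ring, bar


ring_pose, bar_pose = sample_ring_bar_poses(random.Random(0))
print(ring_pose)
print(bar_pose)
```

Passing an explicit `random.Random` instance makes resets reproducible in tests without touching global random state.
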
## Control / Action Interface

Reuse the current 16D EE-space action convention already used by the dual-Diana position-control environment:

- left arm EE pose: 7D (`xyz + quat`)
- right arm EE pose: 7D (`xyz + quat`)
- left gripper command: 1D
- right gripper command: 1D

The new task should continue using EE targets transformed through the existing IK-based control path.

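
To make the 7 + 7 + 1 + 1 = 16 breakdown concrete, here is a small packing helper. The index layout (left pose, right pose, left gripper, right gripper) simply follows the order of the list above and is an assumption; the existing environment's actual layout should be treated as authoritative. `pack_action` and `unpack_action` are hypothetical names.

```python
def pack_action(left_xyz, left_quat, left_grip, right_xyz, right_quat, right_grip):
    """Build a flat 16D action. Layout is an assumption, not the confirmed convention."""
    action = [*left_xyz, *left_quat, *right_xyz, *right_quat, left_grip, right_grip]
    if len(action) != 16:
        raise ValueError(f"expected 16 values, got {len(action)}")
    return [float(v) for v in action]


def unpack_action(action):
    """Split a flat 16D action back into named parts (same assumed layout)."""
    if len(action) != 16:
        raise ValueError(f"expected 16 values, got {len(action)}")
    return {
        "left_pose": list(action[0:7]),     # xyz + quat
        "right_pose": list(action[7:14]),   # xyz + quat
        "left_gripper": action[14],
        "right_gripper": action[15],
    }


example = pack_action(
    (0.1, 0.2, 0.3), (1, 0, 0, 0), 1.0,
    (0.4, 0.5, 0.6), (1, 0, 0, 0), 0.0,
)
print(len(example), unpack_action(example)["right_pose"])
```
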
## Environment Structure

Implement this as a new task-specific environment path while reusing the existing dual-Diana simulation base where possible.

Expected responsibilities:

- scene instantiation for the ring+bar setup
- task reset for randomized object placement
- environment-state accessors for both objects
- reward computation
- in-air insertion success detection

The environment factory must dispatch by task name and leave the `sim_transfer` branch unchanged.

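
One way to dispatch by task name without touching the existing branch is a lookup table. This is only a sketch: `TransferEnv`, `AirInsertRingBarEnv`, and `make_sim_env` are stand-in names invented for illustration and do not refer to the repository's real classes or factory function.

```python
class TransferEnv:
    """Stand-in for the existing sim_transfer environment (hypothetical)."""

    def __init__(self, **kwargs):
        self.kwargs = kwargs


class AirInsertRingBarEnv:
    """Stand-in for the new ring+bar environment (hypothetical)."""

    def __init__(self, **kwargs):
        self.kwargs = kwargs


# Adding a task means adding one entry; the sim_transfer entry is not modified.
_TASK_BUILDERS = {
    "sim_transfer": TransferEnv,
    "sim_air_insert_ring_bar": AirInsertRingBarEnv,
}


def make_sim_env(task_name, **kwargs):
    """Create the environment registered under `task_name`."""
    try:
        builder = _TASK_BUILDERS[task_name]
    except KeyError:
        raise NotImplementedError(f"unknown task: {task_name!r}") from None
    return builder(**kwargs)


env = make_sim_env("sim_air_insert_ring_bar")
print(type(env).__name__)
```
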
## Observation / Environment State

The task should retain the observation structure currently used by the dual-Diana environment:

- `qpos`
- multi-camera images

For task state access, the environment should expose at least the pose information needed to reason about both objects:

- ring position
- ring orientation if needed for insertion checks / debugging
- bar position
- bar orientation if needed for insertion checks / debugging

This state should be sufficient for scripted-policy debugging and future rollout analysis.

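
A MuJoCo free joint contributes 7 `qpos` values (3 for position, 4 for a `w, x, y, z` quaternion), so one possible task-state layout is a flat 14-value vector with the ring pose followed by the bar pose. That ordering, and the names `ObjectPose` and `split_env_state`, are assumptions made for this sketch rather than a structure fixed by the spec.

```python
from typing import NamedTuple, Sequence, Tuple


class ObjectPose(NamedTuple):
    pos: Tuple[float, ...]    # (x, y, z) in metres
    quat: Tuple[float, ...]   # (w, x, y, z), MuJoCo order


def split_env_state(env_state: Sequence[float]) -> dict:
    """Split a flat [ring pose, bar pose] vector (assumed layout) into named poses."""
    if len(env_state) != 14:
        raise ValueError(f"expected 14 values, got {len(env_state)}")
    values = [float(v) for v in env_state]
    return {
        "ring": ObjectPose(tuple(values[0:3]), tuple(values[3:7])),
        "bar": ObjectPose(tuple(values[7:10]), tuple(values[10:14])),
    }


state = split_env_state([-0.15, 0.9, 0.48, 1, 0, 0, 0, 0.15, 0.9, 0.48, 1, 0, 0, 0])
print(state["ring"].pos, state["bar"].quat)
```
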
## Reward Design

Use staged rewards in the same spirit as the current task, returning the highest achieved stage rather than accumulating one-time sparse bonuses per event.

Maximum reward:

- `max_reward = 5`

Reward stages:

1. left gripper touches the ring block
2. right gripper touches the bar block
3. ring block is lifted off the table
4. bar block is lifted off the table
5. while both objects are off the table, the bar truly passes through the ring aperture

Notes:

- contact rewards are intended as grasp-progress stages
- lift rewards require the object to be off the table, not merely touched
- final success reward only applies when both objects are airborne

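
The staging above can be sketched as a pure function of five boolean conditions. This sketch assumes "highest achieved stage" means the largest stage index whose condition currently holds, with no requirement that lower stages also hold; whether stages should chain is left open by the spec. How the booleans are derived (contacts, heights, geometric check) is outside this sketch, and `staged_reward` is a hypothetical name.

```python
MAX_REWARD = 5


def staged_reward(left_touches_ring, right_touches_bar, ring_lifted, bar_lifted, inserted):
    """Return the highest stage whose condition holds (assumed semantics)."""
    reward = 0
    if left_touches_ring:
        reward = 1
    if right_touches_bar:
        reward = 2
    if ring_lifted:
        reward = 3
    if bar_lifted:
        reward = 4
    # The final stage only counts while both objects are off the table.
    if ring_lifted and bar_lifted and inserted:
        reward = MAX_REWARD
    return reward


print(staged_reward(True, True, True, False, False))  # ring lifted, bar still on table
```

Because later checks overwrite earlier ones, the function returns a stage index rather than a running sum, which matches the "not accumulated" requirement.
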
## Success Detection

Success must **not** be based on a centerline-only check.

A centerline-only test is insufficient because:

- the bar has thickness, so a centerline can pass through while the body cannot
- a square bar with imperfect orientation can have its centerline inside the aperture while its corners still collide with the ring

### Required Success Semantics

A successful insertion requires all of the following:

1. the ring is off the table
2. the bar is off the table
3. the bar has actually crossed the ring along its thickness direction
4. the bar’s finite square cross-section fits through the square aperture during that crossing

### Recommended Detection Approach

Use a task-level geometric check in Python rather than relying on contact alone.

Implementation intent:

- transform the bar geometry into the ring’s local frame
- reason about the bar as a finite oriented box (not a line)
- verify that the bar has crossed the ring along its thickness direction
- verify that the portion of the bar passing the aperture fits within the inner square opening, accounting for the bar’s cross-section and orientation

This geometric check is the primary success test.

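
The four steps above could be realized roughly as follows. This is a sketch under stated assumptions, not the required implementation: the ring thickness is taken along ring-local z, the bar length along bar-local x, quaternions are in MuJoCo's `(w, x, y, z)` order, and "crossed" is interpreted statically as the bar protruding beyond both ring faces. All function names are hypothetical. The fit test clips each of the bar's 12 edges to the ring slab and requires the clipped endpoints to lie inside the aperture, which is sufficient because the slab-clipped box is convex.

```python
import itertools

# Spec dimensions in metres (half-sizes).
RING_HALF_THICKNESS = 0.009
APERTURE_HALF_SIDE = 0.016
BAR_HALF_EXTENTS = (0.045, 0.009, 0.009)  # long axis = bar-local x (assumption)


def quat_to_matrix(q):
    """Rotation matrix from a unit quaternion in (w, x, y, z) order."""
    w, x, y, z = q
    return (
        (1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)),
        (2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)),
        (2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)),
    )


def bar_corners_in_ring_frame(ring_pos, ring_quat, bar_pos, bar_quat):
    """Return the bar's 8 corners expressed in the ring's local frame."""
    r_ring, r_bar = quat_to_matrix(ring_quat), quat_to_matrix(bar_quat)
    corners = []
    for signs in itertools.product((-1.0, 1.0), repeat=3):
        local = [s * h for s, h in zip(signs, BAR_HALF_EXTENTS)]
        delta = [
            bar_pos[i] + sum(r_bar[i][j] * local[j] for j in range(3)) - ring_pos[i]
            for i in range(3)
        ]
        # Multiplying by the transpose of r_ring maps world offsets to the ring frame.
        corners.append(tuple(sum(r_ring[i][j] * delta[i] for i in range(3)) for j in range(3)))
    return corners


def clip_edge_to_slab(p, q, half_t):
    """Endpoints of the part of segment p-q with |z| <= half_t, or None if empty."""
    t0, t1 = 0.0, 1.0
    dz = q[2] - p[2]
    for sign in (1.0, -1.0):
        a, b = sign * p[2] - half_t, sign * dz  # constraint: a + b * t <= 0
        if b == 0.0:
            if a > 0.0:
                return None
        elif b > 0.0:
            t1 = min(t1, -a / b)
        else:
            t0 = max(t0, -a / b)
    if t0 > t1:
        return None
    return [tuple(p[i] + t * (q[i] - p[i]) for i in range(3)) for t in (t0, t1)]


def bar_passes_through_ring(ring_pos, ring_quat, bar_pos, bar_quat):
    corners = bar_corners_in_ring_frame(ring_pos, ring_quat, bar_pos, bar_quat)
    zs = [c[2] for c in corners]
    # Crossing: the bar must protrude beyond both faces of the ring.
    if not (min(zs) < -RING_HALF_THICKNESS and max(zs) > RING_HALF_THICKNESS):
        return False
    # Fit: every box edge, clipped to the ring slab, must stay inside the aperture.
    for i in range(8):
        for bit in (1, 2, 4):       # flipping one bit flips one corner sign
            j = i ^ bit
            if j < i:
                continue            # visit each edge once
            clipped = clip_edge_to_slab(corners[i], corners[j], RING_HALF_THICKNESS)
            for x, y, _ in clipped or ():
                if abs(x) > APERTURE_HALF_SIDE or abs(y) > APERTURE_HALF_SIDE:
                    return False
    return True


s = 2 ** -0.5  # 90-degree rotation about y puts the bar's long axis along ring z
print(bar_passes_through_ring((0, 0, 1), (1, 0, 0, 0), (0, 0, 1), (s, 0, s, 0)))
```

A full success test would additionally require both objects to be off the table, as listed under the required semantics; this sketch covers only the geometric part. A small tolerance on the aperture bound may be needed in practice to avoid rejecting numerically valid insertions.
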
### Role of Contacts

Contacts may still be used for:

- grasp-stage rewards
- debugging / diagnostics

But contact alone should **not** be the sole criterion for insertion success, since:

- a true clean insertion may have limited aperture-wall contact
- persistent contact can also happen while the bar is jammed and not actually inserted

## Scripted Policy

Add a new task-specific scripted policy for `sim_air_insert_ring_bar`.

### Policy Intent

The first version prioritizes a conservative, reliable open-loop demonstration rather than an optimized trajectory.

### Action Phases

The scripted policy should follow these phases:

1. move both arms to safe initial / waiting poses with grippers open
2. move left arm above the ring and right arm above the bar
3. descend and grasp the assigned objects
4. lift both objects clear of the table
5. move both objects to an airborne meeting region above the table
6. hold the ring stably while aligning the bar with the aperture
7. push the bar along the intended insertion direction until the geometric success condition is met

### Grasp Assignment

- left arm: ring only
- right arm: bar only

### Motion Style

Keep the current repository style:

- waypoint-based trajectory definition
- open-loop interpolation between waypoints
- fixed grasp orientation in the first version

No adaptive replanning is required for the first version.

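
For illustration, waypoint-based open-loop interpolation for one arm might look like the sketch below. The waypoint dictionary keys (`t`, `xyz`, `quat`, `gripper`) and the function name are assumptions, not the repository's actual format. Position and gripper command are interpolated linearly; the quaternion is held from the earlier waypoint, reflecting the fixed grasp orientation of the first version.

```python
def interpolate_waypoints(waypoints, t):
    """Return [x, y, z, qw, qx, qy, qz, gripper] for one arm at timestep t.

    `waypoints` must be sorted by strictly increasing 't' (hypothetical format).
    """
    first, last = waypoints[0], waypoints[-1]
    if t <= first["t"]:
        return list(first["xyz"]) + list(first["quat"]) + [first["gripper"]]
    if t >= last["t"]:
        return list(last["xyz"]) + list(last["quat"]) + [last["gripper"]]
    for prev, nxt in zip(waypoints, waypoints[1:]):
        if prev["t"] <= t <= nxt["t"]:
            frac = (t - prev["t"]) / (nxt["t"] - prev["t"])
            xyz = [a + frac * (b - a) for a, b in zip(prev["xyz"], nxt["xyz"])]
            grip = prev["gripper"] + frac * (nxt["gripper"] - prev["gripper"])
            # Orientation is held constant between waypoints (fixed grasp orientation).
            return xyz + list(prev["quat"]) + [grip]
    raise ValueError("waypoints are not sorted by time")


# Hypothetical two-waypoint lift: rise 10 cm while closing the gripper.
lift = [
    {"t": 0, "xyz": (0.0, 0.9, 0.50), "quat": (1, 0, 0, 0), "gripper": 1.0},
    {"t": 10, "xyz": (0.0, 0.9, 0.60), "quat": (1, 0, 0, 0), "gripper": 0.0},
]
print(interpolate_waypoints(lift, 5))
```

Concatenating the left-arm and right-arm results (7D pose each) with the two gripper values yields one 16D action per timestep.
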
## Files / Integration Scope

The implementation is expected to add task-specific files rather than broadly refactoring the codebase.

Likely additions / changes:

- a new MuJoCo scene XML for the ring+bar task
- one or more XML fragments defining the two new objects
- a new task-specific dual-Diana environment file
- robot asset wiring for the new scene XML
- reset sampling helpers for the new task
- task registration in constants / environment factory paths
- a new scripted policy file
- focused tests for task creation, reset, rewards, success detection, and scripted policy shape/smoke behavior

## Testing Requirements

At minimum, add regression coverage for:

### Environment Creation

- the new task can be created via the task factory
- the existing `sim_transfer` task remains unchanged

### Reset / Sampling

- ring reset positions are inside the left sampling region
- bar reset positions are inside the right sampling region
- reset orientation is fixed as intended

### Environment State

- environment-state access returns both object poses in the expected structure

### Success Detection

Tests must include both positive and negative cases.

Positive case:

- a configuration where the finite bar truly passes through the ring aperture is detected as success

Negative cases:

- the bar centerline is inside the aperture, but the finite body would clip the aperture walls
- the bar is not deep enough, i.e. it has not actually crossed the ring along its thickness direction
- one or both objects are still on the table

### Reward Logic

- left contact stage
- right contact stage
- ring lift stage
- bar lift stage
- final success stage with `max_reward = 5`

### Scripted Policy

At minimum:

- the policy emits valid 16D actions
- trajectory generation does not error
- a rollout smoke test can step through the new environment

## Risks / Constraints

- MuJoCo contact naming must remain stable enough for stage rewards
- geometric insertion checks must be strict enough to avoid false positives but not so brittle that numerically valid insertions are missed
- scripted open-loop insertion may require conservative alignment and lift heights to keep the first version reliable

## Acceptance Criteria

The feature is complete when all of the following are true:

- `sim_air_insert_ring_bar` is creatable as an independent task
- the scene contains the dual-Diana robot, table, ring block, and bar block
- reset randomizes ring and bar positions in left/right planar regions with fixed orientation
- the environment exposes task state for both objects
- staged rewards progress to `max_reward = 5`
- final success is based on finite-geometry insertion semantics, not a centerline-only shortcut
- a new scripted policy can execute the intended pick-lift-align-insert behavior in the new environment
- existing `sim_transfer` behavior is preserved