docs: clarify ring bar task state contracts
@@ -110,9 +110,9 @@ Run:
 - [ ] **Step 1: Write failing tests for object reset helpers and scene-specific joint naming assumptions**
 
 In `tests/test_air_insert_env.py`, add unit tests for helper functions that:
-- write ring pose to `ring_block_joint`
-- write bar pose to `bar_block_joint`
-- read back task state in a stable structure
+- write ring pose to `ring_block_joint` from the named task-state mapping
+- write bar pose to `bar_block_joint` from the named task-state mapping
+- read back `env_state` as a stable 14D vector `[ring_pos, ring_quat, bar_pos, bar_quat]`
 
 Use fake `mj_data` objects so tests stay fast and deterministic.
 
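The helper tests described in this hunk could look roughly like the sketch below. Only the joint names `ring_block_joint` and `bar_block_joint` and the 14D ordering come from the diff. `FakeMjData`, `write_free_joint_pose`, `apply_task_state`, and `read_env_state` are hypothetical names, and the fake's name-to-index table is an assumption about how a test double might stand in for real MuJoCo data.

```python
import numpy as np

RING_JOINT = "ring_block_joint"
BAR_JOINT = "bar_block_joint"


class FakeMjData:
    """Hypothetical test double: a flat qpos array plus a joint-name -> start-index table."""

    def __init__(self, joint_names):
        # A MuJoCo free joint occupies 7 qpos slots: xyz position, then wxyz quaternion.
        self.qpos_adr = {name: 7 * i for i, name in enumerate(joint_names)}
        self.qpos = np.zeros(7 * len(joint_names))


def write_free_joint_pose(mj_data, joint_name, pos, quat):
    """Write position (3) and quaternion (4) into the joint's qpos slots."""
    start = mj_data.qpos_adr[joint_name]
    mj_data.qpos[start:start + 3] = pos
    mj_data.qpos[start + 3:start + 7] = quat


def apply_task_state(mj_data, task_state):
    """Write ring and bar poses from the named task-state mapping."""
    write_free_joint_pose(mj_data, RING_JOINT, task_state["ring_pos"], task_state["ring_quat"])
    write_free_joint_pose(mj_data, BAR_JOINT, task_state["bar_pos"], task_state["bar_quat"])


def read_env_state(mj_data):
    """Return the 14D vector [ring_pos, ring_quat, bar_pos, bar_quat]."""
    parts = []
    for joint_name in (RING_JOINT, BAR_JOINT):
        start = mj_data.qpos_adr[joint_name]
        parts.append(mj_data.qpos[start:start + 7])
    return np.concatenate(parts)


def test_task_state_round_trip():
    mj_data = FakeMjData([RING_JOINT, BAR_JOINT])
    task_state = {
        "ring_pos": [-0.1, 0.5, 0.05],
        "ring_quat": [1.0, 0.0, 0.0, 0.0],
        "bar_pos": [0.1, 0.5, 0.05],
        "bar_quat": [1.0, 0.0, 0.0, 0.0],
    }
    apply_task_state(mj_data, task_state)
    env_state = read_env_state(mj_data)
    assert env_state.shape == (14,)
    np.testing.assert_allclose(env_state[0:3], task_state["ring_pos"])
    np.testing.assert_allclose(env_state[3:7], task_state["ring_quat"])
    np.testing.assert_allclose(env_state[7:10], task_state["bar_pos"])
    np.testing.assert_allclose(env_state[10:14], task_state["bar_quat"])
```

Because the read-back helper looks joints up by name, the ring always comes first in `env_state` regardless of where each joint sits in `qpos`, which is the property the "stable" wording appears to be asking for.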
@@ -209,8 +209,9 @@ Run:
 
 Add tests covering:
 - the new policy produces a 16D action
-- trajectory generation accepts sampled ring/bar state without error
+- trajectory generation accepts sampled named task state without error
 - the first action is a valid open-gripper safe pose command
+- a deterministic nominal smoke path (with canonical sampled state or fake env shim) reaches the intended terminal interface contract without shape/reward mismatches
 
 Keep the tests unit-level; do not require a full MuJoCo rollout for every assertion.
 
@@ -67,7 +67,16 @@ The bar should also be a single free-joint body.
 
 ## Initial Placement / Reset
 
-The first version uses position-only randomization with fixed orientation.
+The first version uses position-only randomization with fixed orientation. Reset sampling stays **caller-driven**, matching the existing `sim_transfer` usage pattern in rollout/eval code: a helper samples task state, then callers pass that state into `env.reset(...)`.
 
+Use an explicit sampled task-state structure with named fields:
+
+- `ring_pos`: 3D position
+- `ring_quat`: fixed 4D quaternion for version 1
+- `bar_pos`: 3D position
+- `bar_quat`: fixed 4D quaternion for version 1
+
+Behavior:
+
 - ring block: randomized only in a left-side planar sampling region
 - bar block: randomized only in a right-side planar sampling region
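A minimal sketch of the caller-driven sampling helper this hunk describes. The four field names come from the diff. The function name `sample_task_state`, every numeric bound, the object height, and the identity quaternion are hypothetical placeholders, not values from the spec.

```python
import numpy as np

# Hypothetical bounds in metres; the spec only says "left-side" and "right-side" planar regions.
RING_X_RANGE = (-0.2, -0.1)  # left side of the workspace (assumed)
BAR_X_RANGE = (0.1, 0.2)     # right side of the workspace (assumed)
Y_RANGE = (0.4, 0.6)         # shared depth range (assumed)
OBJECT_Z = 0.05              # fixed height, since sampling is planar (assumed)
FIXED_QUAT = np.array([1.0, 0.0, 0.0, 0.0])  # identity in wxyz order (assumed)


def sample_task_state(rng):
    """Sample position-only randomized ring/bar poses as a named mapping."""
    ring_xy = rng.uniform([RING_X_RANGE[0], Y_RANGE[0]], [RING_X_RANGE[1], Y_RANGE[1]])
    bar_xy = rng.uniform([BAR_X_RANGE[0], Y_RANGE[0]], [BAR_X_RANGE[1], Y_RANGE[1]])
    return {
        "ring_pos": np.array([ring_xy[0], ring_xy[1], OBJECT_Z]),
        "ring_quat": FIXED_QUAT.copy(),
        "bar_pos": np.array([bar_xy[0], bar_xy[1], OBJECT_Z]),
        "bar_quat": FIXED_QUAT.copy(),
    }


# The caller owns the random generator and the sampled state.
rng = np.random.default_rng(0)
task_state = sample_task_state(rng)
# A caller would then hand this to env.reset(...); the exact signature is not shown in the diff.
```

Keeping the generator on the caller side means rollout and eval code can seed it for reproducible episodes without the environment holding any sampling state.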
@@ -113,14 +122,14 @@ The task should retain the current observation structure style used by the dual-
 - `qpos`
 - multi-camera images
 
-For task state access, the environment should expose at least the pose information needed to reason about both objects:
+For task state access, the environment should expose a stable `env_state` vector with this exact order:
 
-- ring position
-- ring orientation if needed for insertion checks / debugging
-- bar position
-- bar orientation if needed for insertion checks / debugging
+- `ring_pos[0:3]`
+- `ring_quat[3:7]`
+- `bar_pos[7:10]`
+- `bar_quat[10:14]`
 
-This state should be sufficient for scripted-policy debugging and future rollout analysis.
+This 14D state should be sufficient for scripted-policy debugging and future rollout analysis, while reset itself remains caller-driven via the named task-state helper structure above.
 
 ## Reward Design
 
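One way to pin the `env_state` layout from this hunk is a single slice table shared by the packing and unpacking helpers, sketched below. The index ranges are taken from the diff. `ENV_STATE_SLICES`, `pack_env_state`, and `unpack_env_state` are hypothetical names.

```python
import numpy as np

# Index ranges as listed in the spec; dict order defines the packing order.
ENV_STATE_SLICES = {
    "ring_pos": slice(0, 3),
    "ring_quat": slice(3, 7),
    "bar_pos": slice(7, 10),
    "bar_quat": slice(10, 14),
}


def pack_env_state(task_state):
    """Flatten a named task-state mapping into the 14D env_state vector."""
    return np.concatenate(
        [np.asarray(task_state[name], dtype=float) for name in ENV_STATE_SLICES]
    )


def unpack_env_state(env_state):
    """Split a 14D env_state vector back into named fields."""
    env_state = np.asarray(env_state, dtype=float)
    if env_state.shape != (14,):
        raise ValueError(f"expected shape (14,), got {env_state.shape}")
    return {name: env_state[s].copy() for name, s in ENV_STATE_SLICES.items()}
```

With one table driving both directions, a change to the ordering would have to be made in exactly one place, and a round-trip test would catch any field whose length does not match its slice.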
@@ -303,4 +312,5 @@ The feature is complete when all of the following are true:
 - staged rewards progress to `max_reward = 5`
 - final success is based on finite-geometry insertion semantics, not a centerline-only shortcut
 - a new scripted policy can execute the intended pick-lift-align-insert behavior in the new environment
+- a canonical nominal smoke path (unit-level or deterministic integration-level) exists for the new scripted-policy interface so success is not judged purely by interpretation
 - existing `sim_transfer` behavior is preserved