diff --git a/docs/superpowers/plans/2026-04-23-sim-air-insert-ring-bar.md b/docs/superpowers/plans/2026-04-23-sim-air-insert-ring-bar.md
index 02b5e6d..f3925e6 100644
--- a/docs/superpowers/plans/2026-04-23-sim-air-insert-ring-bar.md
+++ b/docs/superpowers/plans/2026-04-23-sim-air-insert-ring-bar.md
@@ -110,9 +110,9 @@ Run:
 - [ ] **Step 1: Write failing tests for object reset helpers and scene-specific joint naming assumptions**

 In `tests/test_air_insert_env.py`, add unit tests for helper functions that:
-- write ring pose to `ring_block_joint`
-- write bar pose to `bar_block_joint`
-- read back task state in a stable structure
+- write ring pose to `ring_block_joint` from the named task-state mapping
+- write bar pose to `bar_block_joint` from the named task-state mapping
+- read back `env_state` as a stable 14D vector `[ring_pos, ring_quat, bar_pos, bar_quat]`

 Use fake `mj_data` objects so tests stay fast and deterministic.

@@ -209,8 +209,9 @@ Run:
 Add tests covering:

 - the new policy produces a 16D action
-- trajectory generation accepts sampled ring/bar state without error
+- trajectory generation accepts sampled named task state without error
 - the first action is a valid open-gripper safe pose command
+- a deterministic nominal smoke path (with canonical sampled state or fake env shim) reaches the intended terminal interface contract without shape/reward mismatches

 Keep the tests unit-level; do not require a full MuJoCo rollout for every assertion.

diff --git a/docs/superpowers/specs/2026-04-23-sim-air-insert-ring-bar-design.md b/docs/superpowers/specs/2026-04-23-sim-air-insert-ring-bar-design.md
index 52d6cda..feb54b6 100644
--- a/docs/superpowers/specs/2026-04-23-sim-air-insert-ring-bar-design.md
+++ b/docs/superpowers/specs/2026-04-23-sim-air-insert-ring-bar-design.md
@@ -67,7 +67,16 @@ The bar should also be a single free-joint body.

 ## Initial Placement / Reset

-The first version uses position-only randomization with fixed orientation.
+The first version uses position-only randomization with fixed orientation. Reset sampling stays **caller-driven**, matching the existing `sim_transfer` usage pattern in rollout/eval code: a helper samples task state, then callers pass that state into `env.reset(...)`.
+
+Use an explicit sampled task-state structure with named fields:
+
+- `ring_pos`: 3D position
+- `ring_quat`: fixed 4D quaternion for version 1
+- `bar_pos`: 3D position
+- `bar_quat`: fixed 4D quaternion for version 1
+
+Behavior:

 - ring block: randomized only in a left-side planar sampling region
 - bar block: randomized only in a right-side planar sampling region
@@ -113,14 +122,14 @@ The task should retain the current observation structure style used by the dual-
 - `qpos`
 - multi-camera images

-For task state access, the environment should expose at least the pose information needed to reason about both objects:
+For task state access, the environment should expose a stable `env_state` vector with this exact order:

-- ring position
-- ring orientation if needed for insertion checks / debugging
-- bar position
-- bar orientation if needed for insertion checks / debugging
+- `ring_pos[0:3]`
+- `ring_quat[3:7]`
+- `bar_pos[7:10]`
+- `bar_quat[10:14]`

-This state should be sufficient for scripted-policy debugging and future rollout analysis.
+This 14D state should be sufficient for scripted-policy debugging and future rollout analysis, while reset itself remains caller-driven via the named task-state helper structure above.

 ## Reward Design

@@ -303,4 +312,5 @@ The feature is complete when all of the following are true:
 - staged rewards progress to `max_reward = 5`
 - final success is based on finite-geometry insertion semantics, not a centerline-only shortcut
 - a new scripted policy can execute the intended pick-lift-align-insert behavior in the new environment
+- a canonical nominal smoke path (unit-level or deterministic integration-level) exists for the new scripted-policy interface so success is not judged purely by interpretation
 - existing `sim_transfer` behavior is preserved
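
Review note: the `env_state` contract introduced in the spec hunk (named fields flattened to a 14D vector in the order `ring_pos[0:3]`, `ring_quat[3:7]`, `bar_pos[7:10]`, `bar_quat[10:14]`) could be pinned down with a small round-trip helper. The sketch below is only an illustration under stated assumptions: `pack_env_state`, `unpack_env_state`, and `ENV_STATE_LAYOUT` are hypothetical names that do not appear in the patch, the task state is assumed to be a plain mapping, and the sample poses are placeholders. Only the field names and the slice order come from the diff.

```python
from typing import Dict, List, Sequence

# Hypothetical layout table; the order and slice bounds mirror the spec diff:
# ring_pos[0:3], ring_quat[3:7], bar_pos[7:10], bar_quat[10:14].
ENV_STATE_LAYOUT = (
    ("ring_pos", 0, 3),
    ("ring_quat", 3, 7),
    ("bar_pos", 7, 10),
    ("bar_quat", 10, 14),
)
ENV_STATE_DIM = 14


def pack_env_state(task_state: Dict[str, Sequence[float]]) -> List[float]:
    """Flatten the named task-state mapping into the 14D env_state order."""
    flat: List[float] = []
    for name, start, stop in ENV_STATE_LAYOUT:
        values = [float(v) for v in task_state[name]]
        if len(values) != stop - start:
            raise ValueError(
                f"{name} must have {stop - start} values, got {len(values)}"
            )
        flat.extend(values)
    return flat


def unpack_env_state(env_state: Sequence[float]) -> Dict[str, List[float]]:
    """Split a 14D env_state vector back into its named fields."""
    if len(env_state) != ENV_STATE_DIM:
        raise ValueError(
            f"env_state must be {ENV_STATE_DIM}D, got {len(env_state)}"
        )
    return {
        name: [float(v) for v in env_state[start:stop]]
        for name, start, stop in ENV_STATE_LAYOUT
    }


# Placeholder poses (not from the spec). The identity quaternion is written
# in MuJoCo's (w, x, y, z) ordering.
sample_task_state = {
    "ring_pos": [-0.1, 0.5, 0.05],
    "ring_quat": [1.0, 0.0, 0.0, 0.0],
    "bar_pos": [0.1, 0.5, 0.05],
    "bar_quat": [1.0, 0.0, 0.0, 0.0],
}

env_state = pack_env_state(sample_task_state)
round_trip = unpack_env_state(env_state)
```

A unit test along these lines would keep the caller-driven reset path (named mapping in) and the observation path (14D vector out) from drifting apart, without needing a MuJoCo rollout.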