docs: clarify ring bar task state contracts

2026-04-23 16:47:05 +08:00
parent 4ea75966ee
commit 636290d36a
2 changed files with 22 additions and 11 deletions
@@ -67,7 +67,16 @@ The bar should also be a single free-joint body.

 ## Initial Placement / Reset

-The first version uses position-only randomization with fixed orientation.
+The first version uses position-only randomization with fixed orientation. Reset sampling stays **caller-driven**, matching the existing `sim_transfer` usage pattern in rollout/eval code: a helper samples task state, then callers pass that state into `env.reset(...)`.
+
+Use an explicit sampled task-state structure with named fields:
+
+- `ring_pos`: 3D position
+- `ring_quat`: quaternion (4 values), fixed in version 1
+- `bar_pos`: 3D position
+- `bar_quat`: quaternion (4 values), fixed in version 1
+
+Behavior:

 - ring block: randomized only in a left-side planar sampling region
 - bar block: randomized only in a right-side planar sampling region
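A minimal Python sketch of the caller-driven flow, assuming NumPy. `RingBarTaskState`, `sample_ring_bar_task_state`, the region bounds, the spawn height, and the (w, x, y, z) identity quaternion are all hypothetical placeholders, not values taken from the environment:

```python
from dataclasses import dataclass

import numpy as np

# Hypothetical planar sampling bounds (x_min, x_max, y_min, y_max) in metres.
# The real left/right regions belong to the environment; these are placeholders.
RING_REGION = (-0.20, -0.05, 0.40, 0.60)  # assumed "left side" = negative x
BAR_REGION = (0.05, 0.20, 0.40, 0.60)     # assumed "right side" = positive x
TABLE_Z = 0.05                             # hypothetical fixed spawn height
IDENTITY_QUAT = np.array([1.0, 0.0, 0.0, 0.0])  # assumed (w, x, y, z) order


@dataclass(frozen=True)
class RingBarTaskState:
    """Hypothetical named task state: sampled by the caller, passed to reset."""

    ring_pos: np.ndarray   # shape (3,)
    ring_quat: np.ndarray  # shape (4,), fixed in version 1
    bar_pos: np.ndarray    # shape (3,)
    bar_quat: np.ndarray   # shape (4,), fixed in version 1


def _sample_planar(rng: np.random.Generator, region, z: float) -> np.ndarray:
    """Sample x/y uniformly inside a rectangle; z stays fixed."""
    x_min, x_max, y_min, y_max = region
    return np.array([rng.uniform(x_min, x_max), rng.uniform(y_min, y_max), z])


def sample_ring_bar_task_state(rng: np.random.Generator) -> RingBarTaskState:
    """Position-only randomization: positions are sampled, orientations fixed."""
    return RingBarTaskState(
        ring_pos=_sample_planar(rng, RING_REGION, TABLE_Z),
        ring_quat=IDENTITY_QUAT.copy(),
        bar_pos=_sample_planar(rng, BAR_REGION, TABLE_Z),
        bar_quat=IDENTITY_QUAT.copy(),
    )
```

A rollout or eval caller would sample once per episode with a seeded `np.random.default_rng(seed)` and hand the result to `env.reset(...)`; the exact way reset accepts the state is not specified here, so that call shape is also an assumption. Freezing the dataclass keeps the sampled state from being reassigned between sampling and reset.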
@@ -113,14 +122,14 @@ The task should retain the current observation structure style used by the dual-
 - `qpos`
 - multi-camera images

-For task state access, the environment should expose at least the pose information needed to reason about both objects:
+For task state access, the environment should expose a stable `env_state` vector with this exact order:

-- ring position
-- ring orientation if needed for insertion checks / debugging
-- bar position
-- bar orientation if needed for insertion checks / debugging
+- `env_state[0:3]`: `ring_pos`
+- `env_state[3:7]`: `ring_quat`
+- `env_state[7:10]`: `bar_pos`
+- `env_state[10:14]`: `bar_quat`

-This state should be sufficient for scripted-policy debugging and future rollout analysis.
+This 14D state should be sufficient for scripted-policy debugging and future rollout analysis, while reset itself remains caller-driven and uses the named task-state structure above.
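One way to keep the 14D order from drifting is to define the slice layout once and reuse it for both packing and unpacking. This is a sketch assuming NumPy; `ENV_STATE_LAYOUT`, `pack_env_state`, and `unpack_env_state` are hypothetical names. If the ring and bar are each a single free-joint body, each object's MuJoCo free-joint `qpos` is already 3 position values followed by a 4-value quaternion, so `env_state` could plausibly be the two `qpos` blocks concatenated.

```python
import numpy as np

# Hypothetical single source of truth for the documented env_state order.
ENV_STATE_LAYOUT = {
    "ring_pos": slice(0, 3),
    "ring_quat": slice(3, 7),
    "bar_pos": slice(7, 10),
    "bar_quat": slice(10, 14),
}
ENV_STATE_DIM = 14


def pack_env_state(ring_pos, ring_quat, bar_pos, bar_quat) -> np.ndarray:
    """Concatenate object poses into the flat 14D env_state vector."""
    state = np.concatenate([ring_pos, ring_quat, bar_pos, bar_quat]).astype(np.float64)
    if state.shape != (ENV_STATE_DIM,):
        raise ValueError(f"expected {ENV_STATE_DIM} values, got shape {state.shape}")
    return state


def unpack_env_state(env_state) -> dict:
    """Split env_state into named views using the fixed slice layout."""
    env_state = np.asarray(env_state)
    if env_state.shape != (ENV_STATE_DIM,):
        raise ValueError(f"expected shape ({ENV_STATE_DIM},), got {env_state.shape}")
    return {name: env_state[s] for name, s in ENV_STATE_LAYOUT.items()}
```

Scripted-policy debugging code can then read `unpack_env_state(ts_env_state)["bar_pos"]` instead of hard-coding `[7:10]` in several places.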

 ## Reward Design

@@ -303,4 +312,5 @@ The feature is complete when all of the following are true:
 - staged rewards progress to `max_reward = 5`
 - final success is based on finite-geometry insertion semantics, not a centerline-only shortcut
 - a new scripted policy can execute the intended pick-lift-align-insert behavior in the new environment
+- a canonical nominal smoke test (unit-level or deterministic integration-level) exercises the new scripted-policy interface, so completion is verified by an executable check rather than by interpretation alone
 - existing `sim_transfer` behavior is preserved