docs: clarify ring bar task state contracts
@@ -67,7 +67,16 @@ The bar should also be a single free-joint body.
 
 ## Initial Placement / Reset
 
-The first version uses position-only randomization with fixed orientation.
+The first version uses position-only randomization with fixed orientation. Reset sampling stays **caller-driven**, matching the existing `sim_transfer` usage pattern in rollout/eval code: a helper samples task state, then callers pass that state into `env.reset(...)`.
+
+Use an explicit sampled task-state structure with named fields:
+
+- `ring_pos`: 3D position
+- `ring_quat`: fixed 4D quaternion for version 1
+- `bar_pos`: 3D position
+- `bar_quat`: fixed 4D quaternion for version 1
+
+Behavior:
 
 - ring block: randomized only in a left-side planar sampling region
 - bar block: randomized only in a right-side planar sampling region
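As a rough illustration of the caller-driven pattern described in this hunk, the sketch below samples a named task state with fixed orientations and planar, side-specific positions. Everything except the four field names is an assumption made for the example: `TaskState`, `sample_task_state`, the region bounds, the table height, the choice of negative x as "left", and the `(w, x, y, z)` quaternion ordering are all hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]
Quat = Tuple[float, float, float, float]
Region = Tuple[Tuple[float, float], Tuple[float, float]]

# Assumed values for illustration only: the spec gives no region bounds,
# table height, or quaternion convention.
IDENTITY_QUAT: Quat = (1.0, 0.0, 0.0, 0.0)  # assumes (w, x, y, z) ordering
TABLE_Z = 0.05
RING_REGION: Region = ((-0.20, -0.05), (0.40, 0.60))  # left side: (x range), (y range)
BAR_REGION: Region = ((0.05, 0.20), (0.40, 0.60))     # right side: (x range), (y range)


@dataclass(frozen=True)
class TaskState:
    """Named task-state fields sampled before a reset (hypothetical type)."""
    ring_pos: Vec3
    ring_quat: Quat
    bar_pos: Vec3
    bar_quat: Quat


def _sample_planar(region: Region, rng: random.Random) -> Vec3:
    """Sample x and y uniformly inside the region; z stays at the table height."""
    (x_min, x_max), (y_min, y_max) = region
    return (rng.uniform(x_min, x_max), rng.uniform(y_min, y_max), TABLE_Z)


def sample_task_state(rng: random.Random) -> TaskState:
    """Position-only randomization; orientations are fixed for version 1."""
    return TaskState(
        ring_pos=_sample_planar(RING_REGION, rng),
        ring_quat=IDENTITY_QUAT,
        bar_pos=_sample_planar(BAR_REGION, rng),
        bar_quat=IDENTITY_QUAT,
    )


# Caller-driven reset: the caller samples, then hands the state to the env.
state = sample_task_state(random.Random(0))
# env.reset(state)  # hypothetical call; the exact reset signature is not given here
```

Passing an explicit `random.Random` keeps sampling reproducible for rollout and eval code, since the same seed yields the same task state.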
|
||||
@@ -113,14 +122,14 @@ The task should retain the current observation structure style used by the dual-
 - `qpos`
 - multi-camera images
 
-For task state access, the environment should expose at least the pose information needed to reason about both objects:
+For task state access, the environment should expose a stable `env_state` vector with this exact order:
 
-- ring position
-- ring orientation if needed for insertion checks / debugging
-- bar position
-- bar orientation if needed for insertion checks / debugging
+- `ring_pos[0:3]`
+- `ring_quat[3:7]`
+- `bar_pos[7:10]`
+- `bar_quat[10:14]`
 
-This state should be sufficient for scripted-policy debugging and future rollout analysis.
+This 14D state should be sufficient for scripted-policy debugging and future rollout analysis, while reset itself remains caller-driven via the named task-state helper structure above.
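One way to read the slice notation in the list above is that each named field occupies the given index range of `env_state`. The sketch below packs and unpacks the 14D vector under that reading. The helper names `pack_env_state` and `unpack_env_state` and the `ENV_STATE_LAYOUT` table are hypothetical and not part of the environment's API.

```python
from typing import Dict, List, Sequence

# Slice layout taken from the ordered list in the spec: name -> (start, stop).
ENV_STATE_LAYOUT = {
    "ring_pos": (0, 3),
    "ring_quat": (3, 7),
    "bar_pos": (7, 10),
    "bar_quat": (10, 14),
}
ENV_STATE_DIM = 14


def pack_env_state(fields: Dict[str, Sequence[float]]) -> List[float]:
    """Flatten named fields into the fixed-order 14D vector."""
    state = [0.0] * ENV_STATE_DIM
    for name, (start, stop) in ENV_STATE_LAYOUT.items():
        values = [float(v) for v in fields[name]]
        if len(values) != stop - start:
            raise ValueError(f"{name} must have {stop - start} values, got {len(values)}")
        state[start:stop] = values
    return state


def unpack_env_state(env_state: Sequence[float]) -> Dict[str, List[float]]:
    """Split a 14D vector back into named fields for debugging or analysis."""
    if len(env_state) != ENV_STATE_DIM:
        raise ValueError(f"expected {ENV_STATE_DIM} values, got {len(env_state)}")
    return {
        name: list(env_state[start:stop])
        for name, (start, stop) in ENV_STATE_LAYOUT.items()
    }
```

Keeping the layout in a single table means a scripted policy and any rollout analysis code index the vector the same way, which is the point of fixing the order.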
 
 ## Reward Design
 
@@ -303,4 +312,5 @@ The feature is complete when all of the following are true:
 - staged rewards progress to `max_reward = 5`
 - final success is based on finite-geometry insertion semantics, not a centerline-only shortcut
 - a new scripted policy can execute the intended pick-lift-align-insert behavior in the new environment
+- a canonical nominal smoke path (unit-level or deterministic integration-level) exists for the new scripted-policy interface so success is not judged purely by interpretation
 - existing `sim_transfer` behavior is preserved