# Streaming HDF5 EE Action Dataset Implementation Plan
**For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
**Goal:** Switch Diana simulation data collection to streaming HDF5 writes, save images as 256x256 frames from four camera views, and change `/action` to the raw end-effector pose action captured before IK.

**Architecture:** Add a standalone streaming HDF5 episode writer that appends qpos, raw actions, and resized images frame by frame, commits atomically when an episode succeeds, and deletes the temp file when it fails. The collection script is responsible only for the rollout and for handing each step's observation/action to the writer, so a full episode is never accumulated in memory.

**Tech Stack:** Python, h5py, numpy, cv2, unittest, MuJoCo demo scripts
## Task 1: Establish a test boundary for the streaming writer
**Files:**

- Create: `tests/test_streaming_episode_writer.py`
- Create: `roboimi/utils/streaming_episode_writer.py`
- [ ] Step 1: Write the failing test
- [ ] Step 2: Run `python -m unittest tests.test_streaming_episode_writer -v` and confirm it fails because the writer module does not exist
- [ ] Step 3: Implement the minimal streaming writer with temp-file commit/discard, per-frame append, and 256x256 image resize
- [ ] Step 4: Re-run `python -m unittest tests.test_streaming_episode_writer -v` and confirm it passes
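The writer described in Step 3 could look roughly like the sketch below. The class name, dataset layout, and method names (`append`/`commit`/`discard`) are assumptions for illustration, not the actual `roboimi` API; nearest-neighbour index sampling stands in for `cv2.resize` so the sketch carries no OpenCV dependency.

```python
import os
import numpy as np
import h5py

class StreamingEpisodeWriter:
    """Hypothetical sketch: append one frame at a time to a temp .hdf5 file,
    then atomically commit on success or discard on failure."""

    def __init__(self, final_path, camera_names, image_size=256):
        self.final_path = final_path
        self.tmp_path = final_path + ".tmp"
        self.camera_names = list(camera_names)
        self.image_size = image_size
        self.f = h5py.File(self.tmp_path, "w")
        self._n = 0
        self._initialized = False

    def _init_datasets(self, qpos, action):
        # Resizable datasets: extent 0 along time, unlimited maxshape.
        s = self.image_size
        obs = self.f.create_group("observations")
        imgs = obs.create_group("images")
        for cam in self.camera_names:
            imgs.create_dataset(cam, shape=(0, s, s, 3), maxshape=(None, s, s, 3),
                                dtype=np.uint8, chunks=(1, s, s, 3))
        obs.create_dataset("qpos", shape=(0, len(qpos)), maxshape=(None, len(qpos)),
                           dtype=np.float32, chunks=(1, len(qpos)))
        self.f.create_dataset("action", shape=(0, len(action)), maxshape=(None, len(action)),
                              dtype=np.float32, chunks=(1, len(action)))
        self._initialized = True

    @staticmethod
    def _resize(img, size):
        # Nearest-neighbour resize via index sampling; production code would
        # likely call cv2.resize(img, (size, size)) here instead.
        h, w = img.shape[:2]
        rows = np.arange(size) * h // size
        cols = np.arange(size) * w // size
        return img[rows][:, cols]

    def append(self, qpos, action, images):
        if not self._initialized:
            self._init_datasets(qpos, action)
        n = self._n
        for path, value in (("observations/qpos", qpos), ("action", action)):
            ds = self.f[path]
            ds.resize(n + 1, axis=0)
            ds[n] = value
        for cam in self.camera_names:
            ds = self.f["observations/images"][cam]
            ds.resize(n + 1, axis=0)
            ds[n] = self._resize(images[cam], self.image_size)
        self._n = n + 1

    def commit(self):
        self.f.close()
        os.replace(self.tmp_path, self.final_path)  # atomic rename on POSIX

    def discard(self):
        self.f.close()
        if os.path.exists(self.tmp_path):
            os.remove(self.tmp_path)
```

A unittest for Step 1 would exercise exactly this surface: append a few frames of raw-resolution images, commit, reopen the file, and assert the dataset shapes.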
## Task 2: Wire the writer into the Diana collection script
**Files:**

- Modify: `roboimi/demos/diana_record_sim_episodes.py`
- Reuse: `roboimi/utils/streaming_episode_writer.py`
- [ ] Step 1: Replace the in-memory `data_dict`/`obs` accumulation with a per-episode streaming writer lifecycle
- [ ] Step 2: Keep the four cameras (`angle`, `r_vis`, `top`, `front`) and resize to 256x256 before persistence
- [ ] Step 3: Capture the raw policy output before IK and write it to `/action`
- [ ] Step 4: On success, commit to `episode_{idx}.hdf5`; on failure, remove the temp file
## Task 3: Verify the changes
**Files:**

- Verify only
- [ ] Step 1: Run the unit tests for the writer
- [ ] Step 2: Run one end-to-end collection episode and stop after `episode_0.hdf5` becomes readable
- [ ] Step 3: Verify HDF5 keys and shapes: `/action` is `(700, 16)`, image datasets are `(700, 256, 256, 3)`, and `/action` matches the raw EE action semantics
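The shape check in Step 3 might look like the following; `verify_episode` is a hypothetical helper, the dataset names follow this plan, and the qpos width is only checked for presence since the plan does not pin it.

```python
import h5py
import numpy as np

# Camera names as listed in Task 2, Step 2.
CAMERAS = ("angle", "r_vis", "top", "front")

def verify_episode(path, num_steps=700, action_dim=16, image_size=256):
    """Return True iff /action and every camera dataset have the expected shapes."""
    with h5py.File(path, "r") as f:
        if "observations/qpos" not in f:
            return False
        if f["action"].shape != (num_steps, action_dim):
            return False
        for cam in CAMERAS:
            if f[f"observations/images/{cam}"].shape != (num_steps, image_size, image_size, 3):
                return False
    return True
```

Checking raw EE action semantics is harder to automate; a practical spot check is that `/action` values stay in end-effector pose ranges rather than jumping to joint-space magnitudes.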