Agent Notes
Purpose
~/diffusion_policy is the Diffusion Policy training repo. The main workflow here is Hydra-driven training via train.py, with the canonical PushT image experiment configured by image_pusht_diffusion_policy_cnn.yaml.
Top Level
- diffusion_policy/: core code, configs, datasets, env runners, workspaces.
- data/: local datasets, outputs, checkpoints, run logs.
- train.py: main training entrypoint.
- eval.py: checkpoint evaluation entrypoint.
- image_pusht_diffusion_policy_cnn.yaml: canonical single-seed PushT image config from the README path.
- .venv/: local uv-managed virtualenv.
- .uv-cache/, .uv-python/: local uv cache and Python install state.
- README.md: upstream instructions and canonical commands.
Canonical PushT Image Path
- Entrypoint: python train.py --config-dir=. --config-name=image_pusht_diffusion_policy_cnn.yaml
- Dataset path in config: data/pusht/pusht_cchi_v7_replay.zarr
- README canonical device override: training.device=cuda:0
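As a minimal sketch, the entrypoint and device override above can be assembled programmatically before launching. The helper name `build_train_command` is hypothetical and not part of the repo; the script, flags, and config name are the ones listed above.

```python
import shlex
import sys

# Config name taken from the canonical entrypoint above.
CONFIG_NAME = "image_pusht_diffusion_policy_cnn.yaml"


def build_train_command(overrides=(), python=sys.executable):
    """Return the argv list for the canonical PushT image run.

    `build_train_command` is a hypothetical helper for illustration;
    it only assembles arguments and does not start training.
    """
    return [
        python,
        "train.py",
        "--config-dir=.",
        f"--config-name={CONFIG_NAME}",
        *overrides,  # Hydra overrides such as training.device=cuda:0
    ]


command = build_train_command(["training.device=cuda:0"])
# Print a copy-pasteable shell line; run it from the repo root.
print(shlex.join(command))
```

Building the command as a list keeps each override a separate argument, so it can be passed to `subprocess.run` from the repo root without shell quoting issues.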
Data
- PushT archive currently present at data/pusht.zip
- Unpacked dataset used by training: data/pusht/pusht_cchi_v7_replay.zarr
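Before launching a run, it can help to confirm that the unpacked dataset is actually in place. The sketch below only checks the two paths listed above, relative to the repo root; the function name `dataset_status` and its three status strings are hypothetical.

```python
from pathlib import Path

# Paths from the notes above, relative to the repo root.
ARCHIVE = Path("data/pusht.zip")
DATASET = Path("data/pusht/pusht_cchi_v7_replay.zarr")


def dataset_status(repo_root="."):
    """Classify the dataset state (hypothetical helper, read-only)."""
    root = Path(repo_root)
    if (root / DATASET).exists():
        return "ready"
    if (root / ARCHIVE).is_file():
        # The archive is present but has not been unpacked yet.
        return "needs_unpack"
    return "missing"


print(dataset_status())
```

The check is deliberately read-only: it does not extract the archive, because the archive's internal layout is not recorded in these notes.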
Local Compatibility Adjustments
- diffusion_policy/env_runner/pusht_image_runner.py now uses SyncVectorEnv instead of AsyncVectorEnv. Reason: avoid shared-memory and semaphore failures on this host/session.
- diffusion_policy/gym_util/sync_vector_env.py has local compatibility changes:
  - added reset_async
  - seeded reset_wait
  - updated concatenate(...) call order for the current gym API
Environment Expectations
- Use the local uv env at .venv
- Verified local Python: 3.9.25
- Verified local Torch stack: torch 2.8.0+cu128, torchvision 0.23.0+cu128
- Other key installed versions verified in .venv:
  - gym 0.23.1
  - hydra-core 1.2.0
  - diffusers 0.11.1
  - huggingface_hub 0.10.1
  - wandb 0.13.3
  - zarr 2.12.0
  - numcodecs 0.10.2
  - av 14.0.1
- Important note: this shell currently reports torch.cuda.is_available() == False, so always verify CUDA access in the current session before assuming GPU is usable.
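The CUDA check mentioned above can be scripted so that the device override is chosen per session rather than assumed. This is a sketch only: `cuda_usable` and `pick_device` are hypothetical names, and falling back to `cpu` is an assumption for illustration, not a setting these notes have validated for training.

```python
def cuda_usable():
    """Return True only if torch imports and reports a usable CUDA device."""
    try:
        import torch  # expected to come from the local .venv
    except ImportError:
        return False
    return bool(torch.cuda.is_available())


def pick_device(preferred="cuda:0"):
    """Choose a value for training.device (hypothetical helper).

    The "cpu" fallback is an assumption for illustration.
    """
    return preferred if cuda_usable() else "cpu"


# Prints the override to append to the training command.
print(f"training.device={pick_device()}")
```

Run this with the `.venv` interpreter in the same shell that will launch training, since CUDA visibility can differ between sessions.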
Logging And Outputs
- Hydra run outputs: data/outputs/...
- Per-run files to check first:
  - .hydra/overrides.yaml
  - logs.json.txt
  - train.log
  - checkpoints/latest.ckpt
- Extra launcher logs may live under data/run_logs/
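To find the per-run files quickly, one option is to locate run directories by their `.hydra` folder and report which of the files listed above exist. The sketch assumes every run directory under `data/outputs` contains a `.hydra` subdirectory and makes no assumption about the intermediate folder layout; the helper names are hypothetical.

```python
from pathlib import Path

# Per-run files from the list above, in the order to check them.
FILES_TO_CHECK = [
    ".hydra/overrides.yaml",
    "logs.json.txt",
    "train.log",
    "checkpoints/latest.ckpt",
]


def find_run_dirs(outputs_root="data/outputs"):
    """Return run directories (those holding .hydra), newest first."""
    root = Path(outputs_root)
    if not root.is_dir():
        return []
    runs = [p.parent for p in root.rglob(".hydra") if p.is_dir()]
    return sorted(runs, key=lambda p: p.stat().st_mtime, reverse=True)


def report(run_dir):
    """Map each expected file name to whether it exists in run_dir."""
    return {name: (Path(run_dir) / name).is_file() for name in FILES_TO_CHECK}


runs = find_run_dirs()
if runs:
    print(runs[0])
    for name, present in report(runs[0]).items():
        print(f"  {'ok' if present else 'missing':8}{name}")
else:
    print("no Hydra run directories found under data/outputs")
```

Sorting uses the run directory's modification time, which is a heuristic; a run whose directory was modified later for other reasons may sort first.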
Practical Guidance
- Inspect with rg, sed, and existing Hydra output folders before changing code.
- Prefer config overrides before code edits.
- On this host, start from these safety overrides unless revalidated:
  - logging.mode=offline
  - dataloader.num_workers=0
  - val_dataloader.num_workers=0
  - task.env_runner.n_envs=1
  - task.env_runner.n_test_vis=0
  - task.env_runner.n_train_vis=0
- If a run fails, inspect .hydra/overrides.yaml, then logs.json.txt, then train.log.
- Avoid driver or system changes unless the repo-local path is clearly blocked.
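The safety overrides above can be kept in one place and merged with per-run overrides, which fits the preference for config overrides over code edits. In this sketch `merge_overrides` is a hypothetical helper; it resolves duplicate keys itself, so the command never contains the same key twice.

```python
import shlex

# Safety overrides from the list above.
SAFETY_OVERRIDES = [
    "logging.mode=offline",
    "dataloader.num_workers=0",
    "val_dataloader.num_workers=0",
    "task.env_runner.n_envs=1",
    "task.env_runner.n_test_vis=0",
    "task.env_runner.n_train_vis=0",
]


def merge_overrides(base, extra=()):
    """Merge key=value overrides; a later value replaces an earlier one."""
    merged = {}
    for item in [*base, *extra]:
        key, _, value = item.partition("=")
        merged[key] = value  # an existing key keeps its original position
    return [f"{key}={value}" for key, value in merged.items()]


overrides = merge_overrides(SAFETY_OVERRIDES, ["training.device=cuda:0"])
command = [
    "python",
    "train.py",
    "--config-dir=.",
    "--config-name=image_pusht_diffusion_policy_cnn.yaml",
    *overrides,
]
# Print the full command for review before running it.
print(shlex.join(command))
```

Once a setting has been revalidated on this host, pass its new value in `extra` (for example `task.env_runner.n_envs=4`, an illustrative value) instead of editing the base list, so the defaults stay conservative.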