# Agent Notes

## Purpose

`~/diffusion_policy` is the Diffusion Policy training repo. The main workflow here is Hydra-driven training via `train.py`, with the canonical PushT image experiment configured by `image_pusht_diffusion_policy_cnn.yaml`.

## Top Level

- `diffusion_policy/`: core code, configs, datasets, env runners, workspaces.
- `data/`: local datasets, outputs, checkpoints, run logs.
- `train.py`: main training entrypoint.
- `eval.py`: checkpoint evaluation entrypoint.
- `image_pusht_diffusion_policy_cnn.yaml`: canonical single-seed PushT image config from the README path.
- `.venv/`: local `uv`-managed virtualenv.
- `.uv-cache/`, `.uv-python/`: local `uv` cache and Python install state.
- `README.md`: upstream instructions and canonical commands.

## Canonical PushT Image Path

- Entrypoint: `python train.py --config-dir=. --config-name=image_pusht_diffusion_policy_cnn.yaml`
- Dataset path in config: `data/pusht/pusht_cchi_v7_replay.zarr`
- README canonical device override: `training.device=cuda:0`

## Data

- PushT archive currently present at `data/pusht.zip`
- Unpacked dataset used by training: `data/pusht/pusht_cchi_v7_replay.zarr`

## Local Compatibility Adjustments

- `diffusion_policy/env_runner/pusht_image_runner.py` now uses `SyncVectorEnv` instead of `AsyncVectorEnv`. Reason: avoid shared-memory and semaphore failures on this host/session.
- `diffusion_policy/gym_util/sync_vector_env.py` has local compatibility changes:
  - added `reset_async`
  - seeded `reset_wait`
  - updated `concatenate(...)` call order for the current `gym` API

## Environment Expectations

- Use the local `uv` env at `.venv`
- Verified local Python: `3.9.25`
- Verified local Torch stack: `torch 2.8.0+cu128`, `torchvision 0.23.0+cu128`
- Other key installed versions verified in `.venv`:
  - `gym 0.23.1`
  - `hydra-core 1.2.0`
  - `diffusers 0.11.1`
  - `huggingface_hub 0.10.1`
  - `wandb 0.13.3`
  - `zarr 2.12.0`
  - `numcodecs 0.10.2`
  - `av 14.0.1`
- Important note: this shell currently reports `torch.cuda.is_available() == False`, so always verify CUDA access in the current session before assuming the GPU is usable.

## Logging And Outputs

- Hydra run outputs: `data/outputs/...`
- Per-run files to check first:
  - `.hydra/overrides.yaml`
  - `logs.json.txt`
  - `train.log`
  - `checkpoints/latest.ckpt`
- Extra launcher logs may live under `data/run_logs/`

## Practical Guidance

- Inspect with `rg`, `sed`, and existing Hydra output folders before changing code.
- Prefer config overrides before code edits.
- On this host, start from these safety overrides unless revalidated:
  - `logging.mode=offline`
  - `dataloader.num_workers=0`
  - `val_dataloader.num_workers=0`
  - `task.env_runner.n_envs=1`
  - `task.env_runner.n_test_vis=0`
  - `task.env_runner.n_train_vis=0`
- If a run fails, inspect `.hydra/overrides.yaml`, then `logs.json.txt`, then `train.log`.
- Avoid driver or system changes unless the repo-local path is clearly blocked.
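The Data section above implies a one-time unpack of `data/pusht.zip` into `data/pusht/`. A minimal idempotent sketch of that step is below; `ensure_pusht_dataset` is a name introduced here (not a repo function), and it assumes the zip's internal layout yields `data/pusht/pusht_cchi_v7_replay.zarr` when extracted into `data/` — check `zipfile.ZipFile(...).namelist()` before trusting that.

```python
import zipfile
from pathlib import Path


def ensure_pusht_dataset(repo_root="."):
    """Unpack data/pusht.zip only if the zarr dataset is not already present.

    Assumption (verify against the actual archive): extracting the zip into
    data/ produces data/pusht/pusht_cchi_v7_replay.zarr.
    """
    root = Path(repo_root)
    target = root / "data/pusht/pusht_cchi_v7_replay.zarr"
    if target.exists():
        return target  # already unpacked; do nothing
    with zipfile.ZipFile(root / "data/pusht.zip") as zf:
        zf.extractall(root / "data")
    return target
```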
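The `reset_async`/`reset_wait` additions in the Local Compatibility Adjustments follow a simple pattern: a synchronous vector env exposing the async-style reset split that callers written against `AsyncVectorEnv` expect. The sketch below illustrates that pattern only — `SyncResetShim` and its internals are hypothetical names, not the repo's actual `sync_vector_env.py` code.

```python
class SyncResetShim:
    """Illustrative async-style reset API over plain synchronous envs."""

    def __init__(self, envs):
        self.envs = envs            # list of gym-style envs with seed()/reset()
        self._pending_seeds = None  # stashed between reset_async and reset_wait

    def reset_async(self, seed=None):
        # Synchronous backend: nothing to launch, just record the seeds.
        if seed is None or isinstance(seed, int):
            seed = [seed] * len(self.envs)
        self._pending_seeds = seed

    def reset_wait(self):
        # Perform the actual (seeded) resets and return per-env observations.
        obs = []
        for env, s in zip(self.envs, self._pending_seeds):
            if s is not None and hasattr(env, "seed"):
                env.seed(s)
            obs.append(env.reset())
        return obs
```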
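The CUDA caveat under Environment Expectations can be re-checked each session with a short probe. This sketch works whether or not `torch` is importable in the active env; `cuda_status` is a name introduced here for illustration.

```python
def cuda_status():
    """Report GPU usability for the current session.

    Returns one of: "torch not installed", "cuda available", "cpu only".
    """
    try:
        import torch  # the .venv torch stack, if present
    except ImportError:
        return "torch not installed"
    return "cuda available" if torch.cuda.is_available() else "cpu only"


print(cuda_status())
```

Run this inside `.venv` before assuming `training.device=cuda:0` will work.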
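For the failed-run triage order above, a small helper for the third file in that chain can save time. This assumes `logs.json.txt` is JSON lines (one JSON object per line); `tail_json_log` is a name introduced here, not a repo utility.

```python
import json
from pathlib import Path


def tail_json_log(path, n=5):
    """Return the last n parsed records from a JSON-lines log file,
    skipping lines that fail to parse (e.g. a truncated final line)."""
    records = []
    for line in Path(path).read_text().splitlines():
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            continue
    return records[-n:]
```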
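Putting the canonical entrypoint and the host-local safety overrides together: the sketch below composes the launch command as an argv list. The base command and override strings are taken from these notes; `launch_cmd` itself is a helper name introduced here.

```python
# Safety overrides for this host, as listed under Practical Guidance.
SAFETY_OVERRIDES = [
    "logging.mode=offline",
    "dataloader.num_workers=0",
    "val_dataloader.num_workers=0",
    "task.env_runner.n_envs=1",
    "task.env_runner.n_test_vis=0",
    "task.env_runner.n_train_vis=0",
]


def launch_cmd(extra=()):
    """Compose the canonical PushT image training command with the
    safety overrides appended, plus any extra Hydra overrides."""
    base = [
        "python", "train.py",
        "--config-dir=.",
        "--config-name=image_pusht_diffusion_policy_cnn.yaml",
    ]
    return base + SAFETY_OVERRIDES + list(extra)


print(" ".join(launch_cmd(["training.device=cuda:0"])))
```

Pass the result to `subprocess.run(launch_cmd(...))` from the repo root, or copy the printed line into the shell; only add `training.device=cuda:0` after the CUDA check passes.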