diffusion_policy/AGENTS.md
Agent Notes

Purpose

~/diffusion_policy is the Diffusion Policy training repo. The main workflow here is Hydra-driven training via train.py, with the canonical PushT image experiment configured by image_pusht_diffusion_policy_cnn.yaml.

Top Level

  • diffusion_policy/: core code, configs, datasets, env runners, workspaces.
  • data/: local datasets, outputs, checkpoints, run logs.
  • train.py: main training entrypoint.
  • eval.py: checkpoint evaluation entrypoint.
  • image_pusht_diffusion_policy_cnn.yaml: canonical single-seed PushT image config from the README path.
  • .venv/: local uv-managed virtualenv.
  • .uv-cache/, .uv-python/: local uv cache and Python install state.
  • README.md: upstream instructions and canonical commands.

Canonical PushT Image Path

  • Entrypoint: python train.py --config-dir=. --config-name=image_pusht_diffusion_policy_cnn.yaml
  • Dataset path in config: data/pusht/pusht_cchi_v7_replay.zarr
  • README canonical device override: training.device=cuda:0
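The canonical invocation with the README device override can be assembled as a sketch. `train.py`, the config name, and the `training.device` override come from this repo's README; the helper function itself is illustrative:

```python
import shlex

def build_train_cmd(device: str = "cuda:0") -> list:
    """Assemble the canonical PushT image training command (illustrative helper)."""
    return [
        "python", "train.py",
        "--config-dir=.",
        "--config-name=image_pusht_diffusion_policy_cnn.yaml",
        f"training.device={device}",  # README canonical device override
    ]

if __name__ == "__main__":
    # Print the command; launch it with subprocess.run(cmd, check=True) from the repo root.
    print(shlex.join(build_train_cmd()))
```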

Data

  • PushT archive currently present at data/pusht.zip
  • Unpacked dataset used by training: data/pusht/pusht_cchi_v7_replay.zarr
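A quick presence check before launching can be sketched with pathlib (the paths are the ones listed above; the helper name is illustrative, and it relies only on a `.zarr` store being a directory on disk):

```python
from pathlib import Path

def pusht_data_status(repo_root: str = ".") -> dict:
    """Report whether the PushT archive and unpacked zarr store exist (sketch)."""
    root = Path(repo_root)
    return {
        "archive": (root / "data" / "pusht.zip").is_file(),
        "zarr": (root / "data" / "pusht" / "pusht_cchi_v7_replay.zarr").is_dir(),
    }

if __name__ == "__main__":
    print(pusht_data_status())
```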

Local Compatibility Adjustments

  • diffusion_policy/env_runner/pusht_image_runner.py now uses SyncVectorEnv instead of AsyncVectorEnv. Reason: avoid shared-memory and semaphore failures on this host/session.
  • diffusion_policy/gym_util/sync_vector_env.py has local compatibility changes:
    • added reset_async
    • reset_wait now accepts and applies seeds
    • updated the argument order of the concatenate(...) call for the current gym API
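The shape of the reset_async/reset_wait compatibility change can be sketched on a stand-in class. This is illustrative only — the real changes live in diffusion_policy/gym_util/sync_vector_env.py — but it shows the split-phase reset API that AsyncVectorEnv callers expect:

```python
class SyncResetShim:
    """Stand-in showing a split-phase reset API on a synchronous vector env.

    Callers written against AsyncVectorEnv use reset_async()/reset_wait();
    a synchronous backend just records the request and does all the work
    in reset_wait(), seeding each sub-env with an offset of the base seed.
    """

    def __init__(self, envs):
        self.envs = envs  # objects exposing .reset(seed=...)
        self._reset_pending = False
        self._seed = None

    def reset_async(self, seed=None):
        # Nothing runs in the background; just note the pending request.
        self._reset_pending = True
        self._seed = seed

    def reset_wait(self, seed=None):
        assert self._reset_pending, "call reset_async() first"
        self._reset_pending = False
        seed = seed if seed is not None else self._seed
        obs = []
        for i, env in enumerate(self.envs):
            env_seed = None if seed is None else seed + i  # distinct per-env seeds
            obs.append(env.reset(seed=env_seed))
        return obs
```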

Environment Expectations

  • Use the local uv env at .venv
  • Verified local Python: 3.9.25
  • Verified local Torch stack: torch 2.8.0+cu128, torchvision 0.23.0+cu128
  • Other key installed versions verified in .venv:
    • gym 0.23.1
    • hydra-core 1.2.0
    • diffusers 0.11.1
    • huggingface_hub 0.10.1
    • wandb 0.13.3
    • zarr 2.12.0
    • numcodecs 0.10.2
    • av 14.0.1
  • Important note: this shell currently reports torch.cuda.is_available() == False, so always verify CUDA access in the current session before assuming GPU is usable.
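The CUDA check above can be wrapped in a small probe (a sketch; it returns False rather than raising when torch is absent, so it is safe to run in any session):

```python
def cuda_usable() -> bool:
    """Return True only if torch imports and reports a usable CUDA device."""
    try:
        import torch
    except ImportError:
        return False
    return bool(torch.cuda.is_available())

if __name__ == "__main__":
    print(f"CUDA usable in this session: {cuda_usable()}")
```

The equivalent one-liner is `python -c "import torch; print(torch.cuda.is_available())"`.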

Logging And Outputs

  • Hydra run outputs: data/outputs/...
  • Per-run files to check first:
    • .hydra/overrides.yaml
    • logs.json.txt
    • train.log
    • checkpoints/latest.ckpt
  • Extra launcher logs may live under data/run_logs/
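Locating the newest run and checking the per-run files listed above can be sketched with pathlib (hypothetical helpers; KEY_FILES mirrors the checklist, and a run directory is identified by its .hydra/ subfolder):

```python
from pathlib import Path
from typing import Optional

KEY_FILES = [
    ".hydra/overrides.yaml",
    "logs.json.txt",
    "train.log",
    "checkpoints/latest.ckpt",
]

def latest_run_dir(outputs: str = "data/outputs") -> Optional[Path]:
    """Return the most recently modified Hydra run directory, if any (sketch)."""
    root = Path(outputs)
    if not root.is_dir():
        return None
    runs = [p.parent for p in root.rglob(".hydra") if p.is_dir()]
    return max(runs, key=lambda p: p.stat().st_mtime, default=None)

def missing_key_files(run_dir: Path) -> list:
    """List which of the expected per-run files are absent (sketch)."""
    return [f for f in KEY_FILES if not (run_dir / f).exists()]
```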

Practical Guidance

  • Inspect with rg and sed, and check existing Hydra output folders, before changing code.
  • Prefer config overrides before code edits.
  • On this host, start from these safety overrides unless revalidated:
    • logging.mode=offline
    • dataloader.num_workers=0
    • val_dataloader.num_workers=0
    • task.env_runner.n_envs=1
    • task.env_runner.n_test_vis=0
    • task.env_runner.n_train_vis=0
  • If a run fails, inspect .hydra/overrides.yaml, then logs.json.txt, then train.log.
  • Avoid driver or system changes unless the repo-local path is clearly blocked.
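The guidance above can be combined into one command: start from the canonical entrypoint and append the safety overrides. The override strings are exactly the ones listed; the helper name is illustrative:

```python
SAFETY_OVERRIDES = [
    "logging.mode=offline",
    "dataloader.num_workers=0",
    "val_dataloader.num_workers=0",
    "task.env_runner.n_envs=1",
    "task.env_runner.n_test_vis=0",
    "task.env_runner.n_train_vis=0",
]

def safe_train_cmd(extra=()) -> list:
    """Canonical train command plus this host's safety overrides (sketch)."""
    return [
        "python", "train.py",
        "--config-dir=.",
        "--config-name=image_pusht_diffusion_policy_cnn.yaml",
        *SAFETY_OVERRIDES,
        *extra,  # e.g. ["training.device=cuda:0"] once CUDA is revalidated
    ]
```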