From 609cf377cbf5c75b375ee11a1502ba9fc5ef6acf Mon Sep 17 00:00:00 2001
From: wangshuai6
Date: Fri, 11 Apr 2025 13:24:51 +0800
Subject: [PATCH] fix bugs

---
 README.md        | 19 ++++++++++++++-----
 requirements.txt |  2 ++
 2 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/README.md b/README.md
index 7903342..82bbeaa 100644
--- a/README.md
+++ b/README.md
@@ -19,8 +19,12 @@ We decouple diffusion transformer into encoder-decoder design, and surpresingly
 ## Visualizations
 ![](./figs/teaser.png)
 ## Checkpoints
-Waiting for release.
+We take the off-shelf [VAE](https://huggingface.co/stabilityai/sd-vae-ft-ema) to encode image into latent space, and train the decoder with DDT.
 
+| Dataset     | Model             | Params    | FID  | HuggingFace                                              |
+|-------------|-------------------|-----------|------|----------------------------------------------------------|
+| ImageNet256 | DDT-XL/2(22en6de) | 675M      | 1.26 | [🤗](https://huggingface.co/MCG-NJU/DDT-XL-22en6de-R256) |
+| ImageNet512 | DDT-XL/2(22en6de) | 675M      | 1.28 | [🤗](https://huggingface.co/MCG-NJU/DDT-XL-22en6de-R512) |
 ## Online Demos
 Coming soon.
 
@@ -30,16 +34,21 @@ We use ADM evaluation suite to report FID.
 # for installation
 pip install -r requirements.txt
 ```
+```bash
+# for inference
+python main.py predict -c configs/repa_improved_ddt_xlen22de6_256.yaml --ckpt_path=XXX.ckpt
+```
+```bash
+# extract image latent (optional)
+python3 tools/cache_imlatent4.py
+```
 ```bash
 # for training
 python main.py fit -c configs/repa_improved_ddt_xlen22de6_256.yaml
 ```
 
-```bash
-# for inference
-python main.py predict -c configs/repa_improved_ddt_xlen22de6_256.yaml --ckpt_path=XXX.ckpt
-```
+
 
 ## Reference
 ```bibtex
 @ARTICLE{ddt,
diff --git a/requirements.txt b/requirements.txt
index 9061a84..7539e55 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -1,3 +1,5 @@
 lightning==2.5.0.post0
 omegaconf==2.3.0
+torch==2.5.0
+diffusers==0.30.0
 jsonargparse[signatures]>=4.27.7
\ No newline at end of file