AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising

Zigeng Chen, Xinyin Ma, Gongfan Fang, Zhenxiong Tan, Xinchao Wang
Learning and Vision Lab, National University of Singapore
🥯[Arxiv]🎄[Project Page]
Code Contributors: Zigeng Chen, Zhenxiong Tan

2.8x faster on SDXL with 4 devices. Top: original, 50 steps (13.81s). Bottom: AsyncDiff, 50 steps (4.98s).

1.8x faster on AnimateDiff with 2 devices. Top: original, 50 steps (43.5s). Bottom: AsyncDiff, 50 steps (24.5s).
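
As a quick sanity check, the speedup figures follow directly from the latencies quoted in the two captions above. The snippet below is only a worked calculation; the `speedup` helper and the `timings` dictionary are illustrative names, not part of AsyncDiff.

```python
# Latencies in seconds, copied from the two captions above:
# (original pipeline, AsyncDiff pipeline).
timings = {
    "SDXL, 4 devices": (13.81, 4.98),
    "AnimateDiff, 2 devices": (43.5, 24.5),
}


def speedup(baseline_s: float, accelerated_s: float) -> float:
    """Return how many times faster the accelerated run is."""
    return baseline_s / accelerated_s


for name, (baseline_s, accelerated_s) in timings.items():
    print(f"{name}: {speedup(baseline_s, accelerated_s):.1f}x")
```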

Updates

🚀 June 18, 2024: Now supporting ControlNet! An inference example that accelerates ControlNet+SDXL can be found at run_sdxl_controlnet.py.
🚀 June 17, 2024: Now supporting Stable Diffusion x4 Upscaler! An inference example can be found at run_sd_upscaler.py.
🚀 June 12, 2024: Code of AsyncDiff is released.

Supported Diffusion Models:

SD 2.1, SD 1.5, SDXL, ControlNet, Stable Diffusion x4 Upscaler, AnimateDiff, and SVD

Introduction

We introduce AsyncDiff, a universal, plug-and-play acceleration scheme for diffusion models that enables model parallelism across multiple devices. Our approach divides the cumbersome noise prediction model into multiple components and assigns each to a different device. To break the dependency chain between these components, it transforms the conventional sequential denoising into an asynchronous process by exploiting the high similarity between hidden states in consecutive diffusion steps. Consequently, each component can compute in parallel on a separate device. The proposed strategy significantly reduces inference latency with minimal impact on generative quality.

Figure: overview of the asynchronous denoising process. The denoising model εθ is divided into four components for clarity. After the warm-up stage, each component's input is prepared in advance, which breaks the dependency chain and allows the components to run in parallel.
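
The idea can be illustrated with a toy, single-process sketch. This is not the AsyncDiff implementation: the four "components" are made-up scalar functions, the update rule is a simplified stand-in for a real scheduler, and `sequential_step`, `async_step`, and `denoise` are hypothetical names. It only shows that, after a warm-up step, every component can read an input that was already available from the previous step, so no call within a step waits on another.

```python
from typing import Callable, List

Component = Callable[[float], float]


def sequential_step(components: List[Component], x: float) -> List[float]:
    """Run the components one after another and return every output."""
    outputs, h = [], x
    for f in components:
        h = f(h)
        outputs.append(h)
    return outputs


def async_step(components: List[Component], x: float, cached: List[float]) -> List[float]:
    """Component 0 reads the current sample; component n > 0 reads the output
    that component n - 1 produced at the previous step. The calls are
    independent of each other, so they could run on separate devices."""
    inputs = [x] + cached[:-1]
    return [f(h) for f, h in zip(components, inputs)]


def denoise(components, x, steps, warm_up, step_size=0.02, asynchronous=True):
    if warm_up < 1:
        raise ValueError("the toy needs at least one warm-up step to fill the cache")
    cached: List[float] = []
    for t in range(steps):
        if asynchronous and t >= warm_up:
            cached = async_step(components, x, cached)
        else:
            cached = sequential_step(components, x)  # warm-up or baseline
        # Toy update: treat the last component's output as the predicted noise.
        x = x - step_size * cached[-1]
    return x


# Made-up components: four scalings standing in for four model slices.
components = [lambda h, a=a: a * h for a in (1.1, 0.9, 1.2, 0.8)]
reference = denoise(components, 1.0, steps=50, warm_up=1, asynchronous=False)
approx = denoise(components, 1.0, steps=50, warm_up=1)
print(f"sequential: {reference:.4f}  asynchronous: {approx:.4f}")
```

In this toy, the asynchronous result stays close to the sequential one because the sample changes only slightly between consecutive steps. Setting `warm_up` equal to `steps` reproduces the sequential result exactly.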

🔧 Quick Start

Installation

Prerequisites

NVIDIA GPU + CUDA >= 12.0 and corresponding CuDNN

Create environment:

conda create -n asyncdiff python=3.10
conda activate asyncdiff
pip install -r requirements.txt

Usage Example

Simply add two lines of code to enable asynchronous parallel inference for the diffusion model.

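import torch
import torch.distributed as dist
from diffusers import StableDiffusionPipeline
from asyncdiff.async_sd import AsyncDiff

pipeline = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1",
    torch_dtype=torch.float16, use_safetensors=True, low_cpu_mem_usage=True)

# Added line 1: wrap the pipeline to enable asynchronous parallel inference
async_diff = AsyncDiff(pipeline, model_n=2, stride=1, time_shift=False)

# Added line 2: reset the state and set the number of warm-up steps
async_diff.reset_state(warm_up=1)
# Replace <prompts> with your own prompt string(s)
image = pipeline(<prompts>).images[0]
# Save from rank 0 only, so the processes do not all write the same file
if dist.get_rank() == 0:
    image.save("output.jpg")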

Here, we use the Stable Diffusion pipeline as an example. You can replace pipeline with any variant of the Stable Diffusion pipeline, such as SD 2.1, SD 1.5, SDXL, or SVD. We also provide the implementation of AsyncDiff for AnimateDiff in asyncdiff.async_animate.

model_n: Number of components into which the denoising model is divided. Options: 2, 3, or 4.
stride: Denoising stride of each parallel computing batch. Options: 1 or 2.
warm_up: Number of steps for the warm-up stage. More warm-up steps can achieve pixel-level consistency with the original output while slightly reducing processing speed.
time_shift: Enables time shifting. Setting time_shift to True can enhance the denoising capability of the diffusion model. However, it should generally remain False. Only enable time_shift when the accelerated model produces images or videos with significant noise.
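
To catch invalid settings before a multi-GPU job is launched, the documented options can be checked up front. The sketch below is a hypothetical helper, not part of `asyncdiff`: the `AsyncDiffSettings` class and its method names are assumptions, and the rule that `warm_up` must be non-negative is also an assumption. The allowed values for `model_n` and `stride` are the ones listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AsyncDiffSettings:
    """Hypothetical container for the four documented options."""
    model_n: int = 2
    stride: int = 1
    warm_up: int = 1
    time_shift: bool = False  # documented advice: keep False unless outputs are noisy

    def __post_init__(self):
        if self.model_n not in (2, 3, 4):
            raise ValueError(f"model_n must be 2, 3, or 4, got {self.model_n}")
        if self.stride not in (1, 2):
            raise ValueError(f"stride must be 1 or 2, got {self.stride}")
        if self.warm_up < 0:  # assumption: a negative step count is meaningless
            raise ValueError(f"warm_up must be non-negative, got {self.warm_up}")

    def init_kwargs(self) -> dict:
        """Keyword arguments for AsyncDiff(pipeline, ...), as in the usage example."""
        return {"model_n": self.model_n, "stride": self.stride,
                "time_shift": self.time_shift}

    def reset_kwargs(self) -> dict:
        """Keyword arguments for async_diff.reset_state(...)."""
        return {"warm_up": self.warm_up}


settings = AsyncDiffSettings(model_n=4, stride=2, warm_up=3)
print(settings.init_kwargs(), settings.reset_kwargs())
```

With such a helper, the two added lines from the usage example would become `AsyncDiff(pipeline, **settings.init_kwargs())` and `async_diff.reset_state(**settings.reset_kwargs())`.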

Inference

We offer detailed scripts in examples/ for accelerating inference of SD 2.1, SD 1.5, SDXL, ControlNet, SD_Upscaler, AnimateDiff, and SVD using our AsyncDiff framework.

Accelerate Stable Diffusion XL:

CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.run --nproc_per_node=4 --run-path examples/run_sdxl.py

Accelerate Stable Diffusion 2.1 or 1.5:

CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.run --nproc_per_node=4 --run-path examples/run_sd.py

Accelerate Stable Diffusion x4 Upscaler:

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.run --nproc_per_node=2 --run-path examples/run_sd_upscaler.py

Accelerate ControlNet+SDXL:

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.run --nproc_per_node=2 --run-path examples/run_sdxl_controlnet.py

Accelerate AnimateDiff:

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.run --nproc_per_node=2 --run-path examples/run_animatediff.py

Accelerate Stable Video Diffusion:

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.run --nproc_per_node=2 --run-path examples/run_svd.py
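
All of the commands above share one pattern: the device list goes into `CUDA_VISIBLE_DEVICES`, and `--nproc_per_node` equals the number of listed devices. When launching from Python rather than a shell, that pattern could be assembled roughly as follows. The `build_launch` and `as_shell` helpers are hypothetical conveniences, not part of the repository.

```python
import shlex
from typing import Dict, List, Sequence, Tuple


def build_launch(script: str, devices: Sequence[int]) -> Tuple[Dict[str, str], List[str]]:
    """Return (extra environment, argv) for one process per listed GPU."""
    if not devices:
        raise ValueError("at least one device is required")
    env = {"CUDA_VISIBLE_DEVICES": ",".join(str(d) for d in devices)}
    argv = [
        "python", "-m", "torch.distributed.run",
        f"--nproc_per_node={len(devices)}",
        "--run-path", script,
    ]
    return env, argv


def as_shell(env: Dict[str, str], argv: List[str]) -> str:
    """Render the launch as a single shell command line."""
    prefix = " ".join(f"{key}={shlex.quote(value)}" for key, value in env.items())
    return f"{prefix} {shlex.join(argv)}"


env, argv = build_launch("examples/run_sdxl.py", devices=[0, 1, 2, 3])
print(as_shell(env, argv))
```

To actually start the job, the pair can be passed to `subprocess.run(argv, env={**os.environ, **env}, check=True)`.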

Qualitative Results

Qualitative Results on SDXL and SD 2.1. More qualitative results can be found in our paper.

Quantitative Results

Quantitative evaluations of AsyncDiff on three text-to-image diffusion models, showcasing various configurations. More quantitative results can be found in our paper.

Bibtex

@misc{chen2024asyncdiff,
      title={AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising}, 
      author={Zigeng Chen and Xinyin Ma and Gongfan Fang and Zhenxiong Tan and Xinchao Wang},
      year={2024},
      eprint={2406.06911},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
