- 2025.03: We released a survey paper "Efficient Diffusion Models: A Comprehensive Survey from Principles to Practices". Feel free to cite or open pull requests.
Welcome to the repository for our survey paper, "Efficient Diffusion Models: A Comprehensive Survey from Principles to Practices". This repository provides resources and updates related to our research. For a detailed introduction, please refer to our survey paper.
The recent timeline of efficient DMs, covering core methods and the release of open-source and closed-source reproduction projects.
This figure outlines the conceptual framework employed in our presentation of efficient diffusion models.
This figure compares the core features of mainstream diffusion-based generative models.
This figure outlines various adapters and their applications.
- Improving image generation with better captions [Paper]
- Plug-and-play diffusion features for text-driven image-to-image translation [Paper]
- Training-Free Structured Diffusion Guidance for Compositional Text-to-Image Synthesis [Paper]
- SINE: Single image editing with text-to-image diffusion models [Paper]
- InstructPix2Pix: Learning to follow image editing instructions [Paper]
- Latent video diffusion models for high-fidelity video generation with arbitrary lengths [Paper]
- MagicVideo: Efficient Video Generation With Latent Diffusion Models [Paper]
- ModelScope Text-to-Video Technical Report [Paper]
- Stable Video Diffusion: Scaling latent video diffusion models to large datasets [Paper]
- VideoCrafter1: Open Diffusion Models for High-Quality Video Generation [Paper]
- VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models [Paper]
- StableVideo: Text-driven Consistency-aware Diffusion Video Editing [Paper]
- MagicVideo-V2: Multi-Stage High-Aesthetic Video Generation [Paper]
- Lumiere: A Space-Time Diffusion Model for Video Generation [Paper]
- Text2Video-Zero: Text-to-image diffusion models are zero-shot video generators [Paper]
- FLATTEN: optical FLow-guided ATTENtion for consistent text-to-video editing [Paper]
- Dreamix: Video Diffusion Models are General Video Editors [Paper]
- ControlVideo: Training-free Controllable Text-to-Video Generation [Paper]
- Rerender a video: Zero-shot text-guided video-to-video translation [Paper]
- Dreamfusion: Text-to-3d using 2d diffusion [Paper]
- Mvdream: Multi-view diffusion for 3d generation [Paper]
- Magic3D: High-Resolution Text-to-3D Content Creation [Paper]
- HiFA: High-fidelity text-to-3D with advanced diffusion guidance [Paper]
- SV3D: Novel Multi-view Synthesis and 3D Generation from a Single Image using Latent Video Diffusion [Paper]
- DDM2: Self-Supervised Diffusion MRI Denoising with Generative Diffusion Models [Paper]
- Solving Inverse Problems in Medical Imaging with Score-Based Generative Models [Paper]
- DiffWave: A versatile diffusion model for audio synthesis [[Paper]](https://arxiv.org/abs/2009.09761)
- Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models [Paper]
- Diffsound: Discrete Diffusion Model for Text-to-Sound Generation [Paper]
- Highly accurate protein structure prediction with AlphaFold [Paper]
- De novo design of protein structure and function with RFdiffusion [Paper]
- Antigen-specific antibody design and optimization with diffusion-based generative models for protein structures [Paper]
- A dual diffusion model enables 3D molecule generation and lead optimization based on target pockets [Paper]
- Diffdock: Diffusion steps, twists, and turns for molecular docking [Paper]
- Fast sampling of diffusion models via operator learning [Paper]
- Progressive distillation for fast sampling of diffusion models [Paper]
- Consistency Models [Paper]
- Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference [Paper]
- GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models [Paper]
- High-Resolution Image Synthesis With Latent Diffusion Models [Paper]
- Structure and content-guided video synthesis with diffusion models [Paper]
- Maximum likelihood training of implicit nonlinear diffusion model [Paper]
- Score Approximation, Estimation and Distribution Recovery of Diffusion Models on Low-Dimensional Data [Paper]
- Maximum likelihood training for score-based diffusion ODEs by high-order denoising score matching [Paper]
- Generalized deep 3d shape prior via part-discretized diffusion process [Paper]
- Vector quantized diffusion model for text-to-image synthesis [Paper]
- Understanding Diffusion Models: A Unified Perspective [Paper]
- Diffusion models in vision: A survey [Paper]
- Diffusion models: A comprehensive survey of methods and applications [Paper]
- A Survey on Generative Diffusion Model [Paper]
- Emergent abilities of large language models [Paper]
- GPT-4 Technical Report [Paper]
- Video generation models as world simulators [Online]
- Improved Denoising Diffusion Probabilistic Models [Paper]
- Score-Based Generative Modeling through Stochastic Differential Equations [Paper]
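The two entries above fix the notation used throughout the survey: DDPM-style models define a forward noising process whose marginal q(x_t | x_0) = N(sqrt(ᾱ_t) x_0, (1 − ᾱ_t) I) can be sampled in closed form, which is what makes training tractable. A minimal NumPy sketch (the linear beta schedule and its endpoints are common defaults, not values taken from these papers):

```python
import numpy as np

# Illustrative linear beta schedule in the DDPM style (endpoints are common defaults).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bars = np.cumprod(1.0 - betas)  # \bar{alpha}_t = prod_{s<=t} (1 - beta_s)

def q_sample(x0, t, rng=np.random.default_rng(0)):
    """Draw x_t ~ q(x_t | x_0) in one shot via the closed-form Gaussian marginal."""
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise
```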
- Taming Transformers for High-Resolution Image Synthesis [Paper]
- Adding Conditional Control to Text-to-Image Diffusion Models [Paper]
- Prompt-to-Prompt Image Editing with Cross Attention Control [Paper]
- Null-text inversion for editing real images using guided diffusion models [Paper]
- DreamBooth: Fine tuning text-to-image diffusion models for subject-driven generation [Paper]
- Imagic: Text-Based Real Image Editing with Diffusion Models [Paper]
- AdapEdit: Spatio-Temporal Guided Adaptive Editing Algorithm for Text-Based Continuity-Sensitive Image Editing [Paper]
- Cascaded diffusion models for high fidelity image generation [Paper]
- Playground v2.5: Three Insights towards Enhancing Aesthetic Quality in Text-to-Image Generation [Paper]
- All are worth words: A ViT backbone for diffusion models [Paper]
- Denoising diffusion implicit models [Paper]
- Diffusion models beat GANs on image synthesis [Paper]
- Photorealistic text-to-image diffusion models with deep language understanding [Paper]
- Hierarchical text-conditional image generation with CLIP latents [Paper]
- CogView2: Faster and Better Text-to-Image Generation via Hierarchical Transformers [Paper]
- SDXL: Improving latent diffusion models for high-resolution image synthesis [Paper]
- Scaling rectified flow transformers for high-resolution image synthesis [Paper]
- Scalable diffusion models with transformers [Paper]
- Neural Residual Diffusion Models for Deep Scalable Vision Generation [Paper]
- PixArt-alpha: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis [Paper]
- FiT: Flexible Vision Transformer for Diffusion Model [Paper]
- SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers [Paper]
- Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding [Paper]
- Kolors: Effective training of diffusion model for photorealistic text-to-image synthesis [Online]
- Flux [Online]
- Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models [Paper]
- Open-Sora: Democratizing Efficient Video Production for All [Paper]
- Open-Sora: Democratizing Efficient Video Production for All [Online]
- EasyAnimate: A High-Performance Long Video Generation Method based on Transformer Architecture [Paper]
- CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer [Paper]
- Movie gen: A cast of media foundation models [Online]
- Auto-Encoding Variational Bayes [Paper]
- Taming transformers for high-resolution image synthesis [Paper]
- Neural Discrete Representation Learning [Paper]
- Autoregressive Image Generation Using Residual Quantization [Paper]
- Imagen Video: High Definition Video Generation with Diffusion Models [Paper]
- One transformer fits all distributions in multi-modal diffusion at scale [Paper]
- LMD: Faster image reconstruction with latent masking diffusion [Paper]
- Make-A-Video: Text-to-Video Generation without Text-Video Data [Paper]
- Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation [Paper]
- MAGVIT: Masked generative video transformer [Paper]
- CV-VAE: A Compatible Video VAE for Latent Generative Video Models [Paper]
- Phenaki: Variable length video generation from open domain textual descriptions [Paper]
- Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation [Paper]
- U-Net: Convolutional networks for biomedical image segmentation [Paper]
- Score-Based Generative Modeling through Stochastic Differential Equations [Paper]
- Video diffusion models [Paper]
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding [Paper]
- Improving language understanding by generative pre-training [Paper]
- GLAF: Global-to-Local Aggregation and Fission Network for Semantic Level Fact Verification [Paper]
- Intention reasoning network for multi-domain end-to-end task-oriented dialogue [Paper]
- An image is worth 16x16 words: Transformers for image recognition at scale [Paper]
- CMAL: A novel cross-modal associative learning framework for vision-language pre-training [Paper]
- UniTranSeR: A unified transformer semantic representation framework for multimodal task-oriented dialog system [Paper]
- HybridPrompt: Bridging Language Models and Human Priors in Prompt Tuning for Visual Question Answering [Paper]
- Generative pretraining from pixels [Paper]
- Zero-shot text-to-image generation [Paper]
- Exploring the limits of transfer learning with a unified text-to-text transformer [Paper]
- Modeling Sequences with Structured State Spaces [Paper]
- Exploring Adversarial Robustness of Deep State Space Models [Paper]
- HiPPO: Recurrent memory with optimal polynomial projections [Paper]
- Efficiently modeling long sequences with structured state spaces [Paper]
- Diagonal State Spaces are as Effective as Structured State Spaces [Paper]
- Mamba: Linear-time sequence modeling with selective state spaces [Paper]
- DiM: Diffusion Mamba for efficient high-resolution image synthesis [Paper]
- ZigMa: A DiT-style Zigzag Mamba Diffusion Model [Paper]
- Diffusion-RWKV: Scaling RWKV-Like Architectures for Diffusion Models [Paper]
- RWKV: Reinventing RNNs for the Transformer Era [Paper]
- DiG: Scalable and efficient diffusion models with gated linear attention [Paper]
- Gated Linear Attention Transformers with Hardware-Efficient Training [Paper]
- An empirical study and analysis of text-to-image generation using large language model-powered textual representation [[Paper]](https://arxiv.org/abs/2405.12914)
- Learning transferable visual models from natural language supervision [Paper]
- BERT: Pre-training of deep bidirectional transformers for language understanding [Paper]
- AltDiffusion: A Multilingual Text-to-Image Diffusion Model [Paper]
- PAI-Diffusion: Constructing and Serving a Family of Open Chinese Diffusion Models for Text-to-image Synthesis on the Cloud [Paper]
- Learning transferable visual models from natural language supervision [Paper]
- Exploring the limits of transfer learning with a unified text-to-text transformer [Paper]
- Baichuan 2: Open large-scale language models [Paper]
- LLaMA: Open and efficient foundation language models [Paper]
- Llama 2: Open Foundation and Fine-Tuned Chat Models [Paper]
- GLM: General Language Model Pretraining with Autoregressive Blank Infilling [Paper]
- Delta Tuning: A Comprehensive Study of Parameter Efficient Methods for Pre-trained Language Models [Paper]
- LoRA: Low-rank adaptation of large language models [Paper] (see the sketch after this list)
- SimDA: Simple Diffusion Adapter for Efficient Video Generation [Paper]
- I2V-Adapter: A General Image-to-Video Adapter for Diffusion Models [Paper]
- T2I-Adapter: Learning adapters to dig out more controllable ability for text-to-image diffusion models [Paper]
- AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning [Paper]
- eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers [Paper]
- Forest-of-Thought: Scaling Test-Time Compute for Enhancing LLM Reasoning [Paper]
- Image-To-Image Translation With Conditional Adversarial Networks [Paper]
- ControlNet-XS: Rethinking the Control of Text-to-Image Diffusion Models as Feedback-Control Systems [Paper]
- ControlNeXt: Powerful and efficient control for image and video generation [Paper]
- ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback [Paper]
- IP-Adapter: Text compatible image prompt adapter for text-to-image diffusion models [Paper]
- Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model [Paper]
- SUR-adapter: Enhancing Text-to-Image Pre-trained Diffusion Models with Large Language Models [Paper]
- Uni-ControlNet: All-in-One Control to Text-to-Image Diffusion Models [Paper]
- It's All About Your Sketch: Democratising Sketch Control in Diffusion Models [Paper]
- Efficient parametrization of multi-domain deep neural networks [Paper]
- FaceChain-ImagineID: Freely crafting high-fidelity diverse talking faces from disentangled audio [Paper]
- X-Adapter: Adding universal compatibility of plugins for upgraded diffusion model [Paper]
- Intrinsic Dimensionality Explains the Effectiveness of Language Model Fine-Tuning [Paper]
- Measuring the intrinsic dimension of objective landscapes [Paper]
- LCM-LoRA: A Universal Stable-Diffusion Acceleration Module [Paper]
- LoRA-Composer: Leveraging Low-Rank Adaptation for Multi-Concept Customization in Training-Free Diffusion Models [Paper]
- DiffMorpher: Unleashing the Capability of Diffusion Models for Image Morphing [Paper]
- Concept Sliders: LoRA Adaptors for Precise Control in Diffusion Models [Paper]
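Several of the entries above (LoRA, LCM-LoRA, LoRA-Composer, Concept Sliders) share one core mechanism: freeze the pretrained weight W and learn a low-rank residual ΔW = BA with rank r far below the layer's dimensions. A minimal PyTorch sketch, assuming a standard `nn.Linear` base layer; the class name and hyperparameters are illustrative, not taken from any of the papers:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank residual: y = Wx + (alpha/r) * B(Ax)."""
    def __init__(self, base: nn.Linear, r: int = 4, alpha: float = 4.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # the pretrained weights stay frozen
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at step 0
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T) @ self.B.T
```

Because only A and B are trained, a fine-tune touches a small fraction of the parameters, and the learned update can be merged into W (or swapped out) after training.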
- Human Preference Score: Better Aligning Text-to-Image Models with Human Preference [Paper]
- Aligning text-to-image models using human feedback [Paper]
- ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation [Paper]
- Pick-a-pic: An open dataset of user preferences for text-to-image generation [Paper]
- RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment [Paper]
- Training diffusion models with reinforcement learning [Paper]
- Reinforcement learning for fine-tuning text-to-image diffusion models [Paper]
- Using human feedback to fine-tune diffusion models without any reward model [Paper]
- Diffusion model alignment using direct preference optimization [Paper]
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model [Paper]
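The two entries directly above adapt Direct Preference Optimization (DPO) to diffusion models. For reference, the core DPO objective from the language-model paper trains the policy π_θ directly on preference pairs (x^w preferred over x^l given context c), against a frozen reference model π_ref and with no explicit reward model:

```latex
\mathcal{L}_{\mathrm{DPO}}(\theta) =
  -\,\mathbb{E}_{(c,\,x^{w},\,x^{l})}\left[
    \log \sigma\!\left(
      \beta \log \frac{\pi_{\theta}(x^{w}\mid c)}{\pi_{\mathrm{ref}}(x^{w}\mid c)}
      - \beta \log \frac{\pi_{\theta}(x^{l}\mid c)}{\pi_{\mathrm{ref}}(x^{l}\mid c)}
    \right)
  \right]
```

Here σ is the logistic function and β controls how far the policy may drift from the reference; the diffusion variants replace the exact likelihoods with ELBO-based surrogates.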
- An image is worth one word: Personalizing text-to-image generation using textual inversion [Paper]
- ELITE: Encoding Visual Concepts into Textual Embeddings for Customized Text-to-Image Generation [Paper]
- PhotoMaker: Customizing realistic human photos via stacked ID embedding [Paper]
- HyperDreamBooth: HyperNetworks for fast personalization of text-to-image models [Paper]
- BLIP-Diffusion: Pre-trained Subject Representation for Controllable Text-to-Image Generation and Editing [Paper]
- InstantID: Zero-shot Identity-Preserving Generation in Seconds [Paper]
- OMG: Occlusion-friendly Personalized Multi-concept Generation in Diffusion Models [Paper]
- Multi-concept customization of text-to-image diffusion [Paper]
- Mix-of-Show: Decentralized low-rank adaptation for multi-concept customization of diffusion models [Paper]
- MoA: Mixture-of-Attention for Subject-Context Disentanglement in Personalized Image Generation [Paper]
- Designing an encoder for fast personalization of text-to-image models [Paper]
- DreamTuner: Single image is enough for subject-driven generation [Paper]
- InstantBooth: Personalized text-to-image generation without test-time finetuning [Paper]
- BLIP: Bootstrapping language-image pre-training for unified vision-language understanding and generation [Paper]
- Diffusion models: A comprehensive survey of methods and applications [Paper]
- Progressive distillation for fast sampling of diffusion models [Paper]
- On distillation of guided diffusion models [Paper]
- Adversarial Diffusion Distillation [Paper]
- Flow straight and fast: Learning to generate and transfer data with rectified flow [Paper]
- InstaFlow: One step is enough for high-quality diffusion-based text-to-image generation [Paper]
- UFOGen: You forward once large scale text-to-image generation via diffusion GANs [Paper]
- DPM-Solver: A fast ODE solver for diffusion probabilistic model sampling in around 10 steps [Paper]
- Elucidating the design space of diffusion-based generative models [Paper]
- Denoising diffusion implicit models [Paper] (see the sampling sketch below)
- Generative Modeling by Estimating Gradients of the Data Distribution [Paper]
- Adversarial score matching and improved sampling for image generation [Paper]
- Score-Based Generative Modeling with Critically-Damped Langevin Diffusion [Paper]
- Gotta Go Fast When Generating Data with Score-Based Models [Paper]
- Come-Closer-Diffuse-Faster: Accelerating Conditional Diffusion Models for Inverse Problems through Stochastic Contraction [Paper]
- Pseudo Numerical Methods for Diffusion Models on Manifolds [Paper]
- DPM-Solver++: Fast Solver for Guided Sampling of Diffusion Probabilistic Models [Paper]
- gDDIM: Generalized denoising diffusion implicit models [Paper]
- Fast sampling of diffusion models with exponential integrator [Paper]
- ReDi: Efficient Learning-Free Diffusion Inference via Trajectory Retrieval [Paper]
- GENIE: Higher-order denoising diffusion solvers [Paper]
- Knowledge Distillation in Iterative Generative Models for Improved Sampling Speed [Paper]
- UniPC: A unified predictor-corrector framework for fast sampling of diffusion models [Paper]
- Accelerating diffusion sampling with optimized time steps [Paper]
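A common thread in the solver entries above (DDIM, DPM-Solver, PNDM, UniPC, and others) is replacing stochastic ancestral sampling with a deterministic ODE step, so far fewer denoising steps suffice. The simplest instance is the deterministic DDIM update (η = 0); a minimal sketch, with the cumulative ᾱ values and the model's noise prediction `eps` assumed given:

```python
import numpy as np

def ddim_step(x_t, eps, a_bar_t, a_bar_prev):
    """One deterministic DDIM update (eta = 0).
    eps is the model's noise prediction at x_t; a_bar_* are cumulative alphas."""
    x0_pred = (x_t - np.sqrt(1.0 - a_bar_t) * eps) / np.sqrt(a_bar_t)  # predicted clean sample
    return np.sqrt(a_bar_prev) * x0_pred + np.sqrt(1.0 - a_bar_prev) * eps
```

Higher-order solvers such as DPM-Solver and UniPC refine the same idea by integrating the probability-flow ODE with multistep or predictor-corrector schemes.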
- SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations [Paper]
- Uncovering the disentanglement capability in text-to-image diffusion models [Paper]
- Distilling the Knowledge in a Neural Network [Paper]
- One-step diffusion with distribution matching distillation [Paper]
- NVAE: A deep hierarchical variational autoencoder [Paper]
- Large Scale GAN Training for High Fidelity Natural Image Synthesis [Paper]
- Classifier-free diffusion guidance [Paper] (see the guidance sketch below)
- Variational Diffusion Models [Paper]
- Generative Adversarial Nets [Paper]
- Optimizing DDPM Sampling with Shortcut Fine-Tuning [Paper]
- PeRFlow: Piecewise rectified flow as universal plug-and-play accelerator [Paper]
- Flow matching for generative modeling [Paper]
- Stochastic Interpolants: A Unifying Framework for Flows and Diffusions [Paper]
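Classifier-free guidance, listed above, underpins conditional sampling in most of the text-to-image systems in this survey: the network is evaluated twice per step and the conditional noise prediction is extrapolated away from the unconditional one. A minimal sketch in the guidance-scale parametrization common in latent diffusion implementations (the default weight is illustrative):

```python
def guided_eps(eps_uncond, eps_cond, w=7.5):
    """Classifier-free guidance: extrapolate the conditional prediction away from
    the unconditional one. w = 0 is unconditional, w = 1 is purely conditional."""
    return eps_uncond + w * (eps_cond - eps_uncond)
```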
- Fourier Neural Operator for Parametric Partial Differential Equations [Paper]
- Fast High-Resolution Image Synthesis with Latent Adversarial Diffusion Distillation [Paper]
- AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning [Paper]
- Learning Universal Policies via Text-Guided Video Generation [Paper]
- DPM-Solver++: Fast Solver for Guided Sampling of Diffusion Probabilistic Models [Paper]
- GANs trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium [Paper]
- StyleGAN-NADA: CLIP-guided domain adaptation of image generators [Paper]
- DINOv2: Learning Robust Visual Features without Supervision [Paper]
- Which Training Methods for GANs do actually Converge? [Paper]
- Tackling the Generative Learning Trilemma with Denoising Diffusion GANs [Paper]
- Semi-Implicit Denoising Diffusion Models (SIDDMs) [Paper]
- Learning to Efficiently Sample from Diffusion Probabilistic Models [Paper]
- Learning fast samplers for diffusion models by differentiating through sample quality [Paper]
- Post-training quantization on diffusion models [Paper]
- PTQD: Accurate post-training quantization for diffusion models [Paper]
- Accelerating Diffusion Models via Early Stop of the Diffusion Process [Paper]
- Truncated Diffusion Probabilistic Models and Diffusion-based Adversarial Auto-Encoders [Paper]
- A style-based generator architecture for generative adversarial networks [Paper]
- Speed Is All You Need: On-Device Acceleration of Large Diffusion Models via GPU-Aware Optimizations [Paper]
- SnapFusion: Text-to-Image Diffusion Model on Mobile Devices within Two Seconds [Paper]
- MobileDiffusion: Instant text-to-image generation on mobile devices [Paper]
- DistriFusion: Distributed parallel inference for high-resolution diffusion models [Paper]
- PipeFusion: Displaced patch pipeline parallelism for inference of diffusion transformer models [Paper]
- DragonDiffusion: Enabling drag-style manipulation on diffusion models [Paper]
- AsyncDiff: Parallelizing diffusion models by asynchronous denoising [Paper]
- A Survey on Mixture of Experts [Paper]
- The evolution of mixture of experts: A survey from basics to breakthroughs
- Dynamic Diffusion Transformer [Paper]
- Mixture of Efficient Diffusion Experts Through Automatic Interval and Sub-Network Selection [Paper]
- Fast inference from transformers via speculative decoding [Paper]
- Accelerating Large Language Model Decoding with Speculative Sampling [Paper] (see the sketch at the end of this list)
- Speculative Diffusion Decoding: Accelerating Language Generation through Diffusion [Paper]
- T-stitch: Accelerating sampling in pretrained diffusion models with trajectory stitching [Paper]
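The speculative-decoding entries above all rest on one accept/reject rule: a cheap draft model proposes tokens, the target model keeps each proposal with probability min(1, p/q), and on rejection it resamples from the normalized residual max(0, p − q), which provably preserves the target distribution. A minimal sketch over explicit probability vectors (names and shapes are illustrative):

```python
import numpy as np

def speculative_step(p, q, draft_token, rng=np.random.default_rng(0)):
    """One accept/reject step of speculative sampling.
    p: target-model distribution over the vocabulary, q: draft-model distribution,
    draft_token: the token index proposed by the draft model."""
    if rng.random() < min(1.0, p[draft_token] / q[draft_token]):
        return draft_token                    # accept the cheap draft token
    residual = np.maximum(p - q, 0.0)         # on rejection, resample from the
    residual /= residual.sum()                # normalized residual distribution
    return int(rng.choice(len(p), p=residual))
```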
If you find this work useful, please consider citing us:
@article{ma2024efficient,
  title={Efficient diffusion models: A comprehensive survey from principles to practices},
  author={Ma, Zhiyuan and Zhang, Yuzhu and Jia, Guoli and Zhao, Liangliang and Ma, Yichao and Ma, Mingjie and Liu, Gaofeng and Zhang, Kaiyan and Li, Jianjun and Zhou, Bowen},
  journal={arXiv preprint arXiv:2410.11795},
  year={2024}
}
We would like to thank Qi'ang Hu for his contribution to this website, as well as all team members.