Efficient-Diffusion-Models


📢 Updates

👀 Introduction

Welcome to the repository for our survey paper, "Efficient Diffusion Models: A Comprehensive Survey from Principles to Practices". This repository provides resources and updates related to our research. For a detailed introduction, please refer to our survey paper.

Figure: The recent timeline of efficient DMs, covering core methods and the release of open-source and closed-source reproduction projects.

Figure: The conceptual framework employed in our presentation of efficient diffusion models.

Figure: A comparison of the core features of mainstream diffusion-based generative models.

Figure: An outline of various adapters and their applications.

📒 Table of Contents

Part 1: Introduction

  • Improving image generation with better captions [Paper]
  • Plug-and-play diffusion features for text-driven image-to-image translation [Paper]
  • Training-Free Structured Diffusion Guidance for Compositional Text-to-Image Synthesis [Paper]
  • Sine: Single image editing with text-to-image diffusion models [Paper]
  • Instructpix2pix: Learning to follow image editing instructions [Paper]
  • Latent video diffusion models for high-fidelity video generation with arbitrary lengths [Paper]
  • MagicVideo: Efficient Video Generation With Latent Diffusion Models [Paper]
  • ModelScope Text-to-Video Technical Report [Paper]
  • Stable video diffusion: Scaling latent video diffusion models to large datasets [Paper]
  • VideoCrafter1: Open Diffusion Models for High-Quality Video Generation [Paper]
  • VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models [Paper]
  • StableVideo: Text-driven Consistency-aware Diffusion Video Editing [Paper]
  • MagicVideo-V2: Multi-Stage High-Aesthetic Video Generation [Paper]
  • Lumiere: A Space-Time Diffusion Model for Video Generation [Paper]
  • Text2video-zero: Text-to-image diffusion models are zero-shot video generators [Paper]
  • FLATTEN: optical FLow-guided ATTENtion for consistent text-to-video editing [Paper]
  • Dreamix: Video Diffusion Models are General Video Editors [Paper]
  • ControlVideo: Training-free Controllable Text-to-Video Generation [Paper]
  • Rerender a video: Zero-shot text-guided video-to-video translation [Paper]
  • Dreamfusion: Text-to-3d using 2d diffusion [Paper]
  • Mvdream: Multi-view diffusion for 3d generation [Paper]
  • Magic3D: High-Resolution Text-to-3D Content Creation [Paper]
  • Hifa: High-fidelity text-to-3d with advanced diffusion guidance [Paper]
  • SV3D: Novel Multi-view Synthesis and 3D Generation from a Single Image using Latent Video Diffusion [Paper]
  • DDM2: Self-Supervised Diffusion MRI Denoising with Generative Diffusion Models [Paper]
  • Solving Inverse Problems in Medical Imaging with Score-Based Generative Models [Paper]
  • Diffwave: A versatile diffusion model for audio synthesis [[Paper]](https://arxiv.org/abs/2009.09761)
  • Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models [Paper]
  • Diffsound: Discrete Diffusion Model for Text-to-Sound Generation [Paper]
  • Highly accurate protein structure prediction with AlphaFold [Paper]
  • De novo design of protein structure and function with RFdiffusion [Paper]
  • Antigen-specific antibody design and optimization with diffusion-based generative models for protein structures [Paper]
  • A dual diffusion model enables 3d molecule generation and lead optimization based on target pockets [Paper]
  • Diffdock: Diffusion steps, twists, and turns for molecular docking [Paper]
  • Fast sampling of diffusion models via operator learning [Paper]
  • Progressive distillation for fast sampling of diffusion models [Paper]
  • Consistency Models [Paper]
  • Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference [Paper]
  • GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models [Paper]
  • High-Resolution Image Synthesis With Latent Diffusion Models [Paper]
  • Structure and content-guided video synthesis with diffusion models [Paper]
  • Maximum likelihood training of implicit nonlinear diffusion model [Paper]
  • Score Approximation, Estimation and Distribution Recovery of Diffusion Models on Low-Dimensional Data [Paper]
  • Maximum likelihood training for score-based diffusion odes by high order denoising score matching [Paper]
  • Generalized deep 3d shape prior via part-discretized diffusion process [Paper]
  • Vector quantized diffusion model for text-to-image synthesis [Paper]
  • Understanding Diffusion Models: A Unified Perspective [Paper]
  • Diffusion models in vision: A survey [Paper]
  • Diffusion models: A comprehensive survey of methods and applications [Paper]
  • A Survey on Generative Diffusion Model [Paper]
  • Emergent abilities of large language models [Paper]
  • GPT-4 Technical Report [Paper]
  • Video generation models as world simulators [Online]

Part 2: Efficient Diffusion Models: Foundational Principles

  • Improved Denoising Diffusion Probabilistic Models [Paper]
  • Score-Based Generative Modeling through Stochastic Differential Equations [Paper]
  • Taming Transformers for High-Resolution Image Synthesis [Paper]
  • Adding Conditional Control to Text-to-Image Diffusion Models [Paper]
  • Prompt-to-Prompt Image Editing with Cross Attention Control [Paper]
  • Null-text inversion for editing real images using guided diffusion models [Paper]
  • Dreambooth: Fine tuning text-to-image diffusion models for subject-driven generation [Paper]
  • Imagic: Text-Based Real Image Editing with Diffusion Models [Paper]
  • AdapEdit: Spatio-Temporal Guided Adaptive Editing Algorithm for Text-Based Continuity-Sensitive Image Editing [Paper]
  • Cascaded diffusion models for high fidelity image generation [Paper]
  • Playground v2.5: Three Insights towards Enhancing Aesthetic Quality in Text-to-Image Generation [Paper]
  • All are worth words: A vit backbone for diffusion models [Paper]
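
Most entries in this part share a single training recipe: corrupt clean data with a known Gaussian forward process, then train a network to predict the injected noise. Below is a minimal, hedged PyTorch sketch of that objective; `model`, the linear schedule, and all shapes are illustrative assumptions, not code from the cited papers.

```python
import torch
import torch.nn.functional as F

T = 1000
betas = torch.linspace(1e-4, 0.02, T)               # linear beta schedule (illustrative)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)  # \bar{alpha}_t

def ddpm_loss(model, x0):
    """Noise a clean batch x0 at a random timestep and regress the injected noise."""
    t = torch.randint(0, T, (x0.shape[0],))
    noise = torch.randn_like(x0)
    a_bar = alphas_cumprod[t].view(-1, *([1] * (x0.dim() - 1)))     # broadcast to x0's shape
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise          # forward process q(x_t | x_0)
    return F.mse_loss(model(x_t, t), noise)                         # predict the noise
```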

Part 3: Mainstream Network Architectures

  • Denoising diffusion implicit models [Paper]
  • Diffusion models beat gans on image synthesis [Paper]
  • Photorealistic text-to-image diffusion models with deep language understanding [Paper]
  • Hierarchical text-conditional image generation with clip latents [Paper]
  • CogView2: Faster and Better Text-to-Image Generation via Hierarchical Transformers [Paper]
  • Sdxl: Improving latent diffusion models for high-resolution image synthesis [Paper]
  • Scaling rectified flow transformers for high-resolution image synthesis [Paper]
  • Scalable diffusion models with transformers [Paper]
  • Neural Residual Diffusion Models for Deep Scalable Vision Generation [Paper]
  • PixArt-alpha: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis [Paper]
  • FiT: Flexible Vision Transformer for Diffusion Model [Paper]
  • SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers [Paper]
  • Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding [Paper]
  • Kolors: Effective training of diffusion model for photorealistic text-to-image synthesis [Online]
  • Flux [Online]
  • Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models [Paper]
  • Open-Sora: Democratizing Efficient Video Production for All [Paper] [Online]
  • EasyAnimate: A High-Performance Long Video Generation Method based on Transformer Architecture [Paper]
  • CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer [Paper]
  • Movie gen: A cast of media foundation models [Online]
  • Auto-Encoding Variational Bayes [Paper]
  • Taming transformers for high-resolution image synthesis [Paper]
  • Neural Discrete Representation Learning [Paper]
  • Autoregressive Image Generation Using Residual Quantization [Paper]
  • Imagen Video: High Definition Video Generation with Diffusion Models [Paper]
  • One transformer fits all distributions in multi-modal diffusion at scale [Paper]
  • Lmd: faster image reconstruction with latent masking diffusion [Paper]
  • Make-A-Video: Text-to-Video Generation without Text-Video Data [Paper]
  • Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation [Paper]
  • Magvit: Masked generative video transformer [Paper]
  • CV-VAE: A Compatible Video VAE for Latent Generative Video Models [Paper]
  • Phenaki: Variable length video generation from open domain textual descriptions [Paper]
  • Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation [Paper]
  • U-net: Convolutional networks for biomedical image segmentation [Paper]
  • Score-Based Generative Modeling through Stochastic Differential Equations [Paper]
  • Video diffusion models [Paper]
  • BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding [Paper]
  • Improving language understanding by generative pre-training [Paper]
  • GLAF: Global-to-Local Aggregation and Fission Network for Semantic Level Fact Verification [Paper]
  • Intention reasoning network for multi-domain end-to-end task-oriented dialogue [Paper]
  • An image is worth 16x16 words: Transformers for image recognition at scale [Paper]
  • Cmal: A novel crossmodal associative learning framework for vision-language pretraining [Paper]
  • Unitranser: A unified transformer semantic representation framework for multimodal task-oriented dialog system [Paper]
  • HybridPrompt: Bridging Language Models and Human Priors in Prompt Tuning for Visual Question Answering [Paper]
  • Generative pretraining from pixels [Paper]
  • Zero-shot text-to-image generation [Paper]
  • Exploring the limits of transfer learning with a unified text-to-text transformer [Paper]
  • Modeling Sequences with Structured State Spaces [Paper]
  • Exploring Adversarial Robustness of Deep State Space Models [Paper]
  • Hippo: Recurrent memory with optimal polynomial projections [Paper]
  • Efficiently modeling long sequences with structured state spaces [Paper]
  • Diagonal State Spaces are as Effective as Structured State Spaces [Paper]
  • Mamba: Linear-time sequence modeling with selective state spaces [Paper]
  • Dim: Diffusion mamba for efficient high-resolution image synthesis [Paper]
  • ZigMa: A DiT-style Zigzag Mamba Diffusion Model [Paper]
  • Diffusion-RWKV: Scaling RWKV-Like Architectures for Diffusion Models [Paper]
  • RWKV: Reinventing RNNs for the Transformer Era [Paper]
  • Dig: Scalable and efficient diffusion models with gated linear attention [Paper]
  • Gated Linear Attention Transformers with Hardware-Efficient Training [Paper]
  • An empirical study and analysis of text-to-image generation using large language model-powered textual representation [[Paper]](https://arxiv.org/abs/2405.12914)
  • Learning transferable visual models from natural language supervision [Paper]
  • AltDiffusion: A Multilingual Text-to-Image Diffusion Model [Paper]
  • PAI-Diffusion: Constructing and Serving a Family of Open Chinese Diffusion Models for Text-to-image Synthesis on the Cloud [Paper]
  • Baichuan 2: Open large-scale language models [Paper]
  • Llama: Open and efficient foundation language models [Paper]
  • Llama 2: Open Foundation and Fine-Tuned Chat Models [Paper]
  • GLM: General Language Model Pretraining with Autoregressive Blank Infilling [Paper]
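
Several architecture entries above replace the U-Net backbone with a Transformer over latent patches (DiT, FiT, SiT, PixArt-alpha). As a hedged illustration of the shared patchify step, here is a toy sketch; channel counts, patch size, and names are assumptions, not code from the cited papers.

```python
import torch.nn as nn

class Patchify(nn.Module):
    """Turn a latent image (B, C, H, W) into a token sequence (B, N, D) for a Transformer."""
    def __init__(self, in_channels: int = 4, patch: int = 2, dim: int = 768):
        super().__init__()
        # A strided convolution embeds each non-overlapping patch in one shot.
        self.proj = nn.Conv2d(in_channels, dim, kernel_size=patch, stride=patch)

    def forward(self, x):
        x = self.proj(x)                     # (B, D, H/patch, W/patch)
        return x.flatten(2).transpose(1, 2)  # (B, N, D) tokens
```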

Part 4: Efficient Training and Fine-Tuning

  • Delta Tuning: A Comprehensive Study of Parameter Efficient Methods for Pre-trained Language Models [Paper]
  • Lora: Low-rank adaptation of large language models [Paper]
  • SimDA: Simple Diffusion Adapter for Efficient Video Generation [Paper]
  • I2V-Adapter: A General Image-to-Video Adapter for Diffusion Models [Paper]
  • T2i-adapter: Learning adapters to dig out more controllable ability for text-to-image diffusion models [Paper]
  • AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning [Paper]
  • eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers [Paper]
  • Forest-of-Thought: Scaling Test-Time Compute for Enhancing LLM Reasoning [Paper]
  • Image-To-Image Translation With Conditional Adversarial Networks [Paper]
  • ControlNet-XS: Rethinking the Control of Text-to-Image Diffusion Models as Feedback-Control Systems [Paper]
  • Controlnext: Powerful and efficient control for image and video generation [Paper]
  • ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback [Paper]
  • Ip-adapter: Text compatible image prompt adapter for text-to-image diffusion models [Paper]
  • Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model [Paper]
  • SUR-adapter: Enhancing Text-to-Image Pre-trained Diffusion Models with Large Language Models [Paper]
  • Uni-ControlNet: All-in-One Control to Text-to-Image Diffusion Models [Paper]
  • It's All About Your Sketch: Democratising Sketch Control in Diffusion Models [Paper]
  • Efficient parametrization of multi-domain deep neural networks [Paper]
  • Facechain-imagineid: Freely crafting high-fidelity diverse talking faces from disentangled audio [Paper]
  • X-adapter: Adding universal compatibility of plugins for upgraded diffusion model [Paper]
  • Intrinsic Dimensionality Explains the Effectiveness of Language Model Fine-Tuning [Paper]
  • Measuring the intrinsic dimension of objective landscapes [Paper]
  • LCM-LoRA: A Universal Stable-Diffusion Acceleration Module [Paper]
  • LoRA-Composer: Leveraging Low-Rank Adaptation for Multi-Concept Customization in Training-Free Diffusion Models [Paper]
  • DiffMorpher: Unleashing the Capability of Diffusion Models for Image Morphing [Paper]
  • Concept Sliders: LoRA Adaptors for Precise Control in Diffusion Models [Paper]
  • Human Preference Score: Better Aligning Text-to-Image Models with Human Preference [Paper]
  • Aligning text-to-image models using human feedback [Paper]
  • ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation [Paper]
  • Pick-a-pic: An open dataset of user preferences for text-to-image generation [Paper]
  • RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment [Paper]
  • Training diffusion models with reinforcement learning [Paper]
  • Reinforcement learning for fine-tuning text-to-image diffusion models [Paper]
  • Using human feedback to fine-tune diffusion models without any reward model [Paper]
  • Diffusion model alignment using direct preference optimization [Paper]
  • Direct Preference Optimization: Your Language Model is Secretly a Reward Model [Paper]
  • An image is worth one word: Personalizing text-to-image generation using textual inversion [Paper]
  • ELITE: Encoding Visual Concepts into Textual Embeddings for Customized Text-to-Image Generation [Paper]
  • Photomaker: Customizing realistic human photos via stacked id embedding [Paper]
  • Hyperdreambooth: Hypernetworks for fast personalization of text-to-image models [Paper]
  • BLIP-Diffusion: Pre-trained Subject Representation for Controllable Text-to-Image Generation and Editing [Paper]
  • InstantID: Zero-shot Identity-Preserving Generation in Seconds [Paper]
  • OMG: Occlusion-friendly Personalized Multi-concept Generation in Diffusion Models [Paper]
  • Multi-concept customization of text-to-image diffusion [Paper]
  • Mix-of-show: Decentralized low-rank adaptation for multi-concept customization of diffusion models [Paper]
  • MoA: Mixture-of-Attention for Subject-Context Disentanglement in Personalized Image Generation [Paper]
  • Designing an encoder for fast personalization of text-to-image models [Paper]
  • Dreamtuner: Single image is enough for subject-driven generation [Paper]
  • Instantbooth: Personalized text-to-image generation without test-time finetuning [Paper]
  • Blip: Bootstrapping language-image pre-training for unified vision-language understanding and generation [Paper]
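
Many entries above (LoRA, LCM-LoRA, LoRA-Composer, Concept Sliders) build on the same low-rank adaptation idea: freeze the pretrained weight and learn only a small rank-r residual. The class below is a hedged toy illustration of that mechanism, not code from any cited paper.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen pretrained linear layer plus a trainable rank-r update B @ A."""
    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 4.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # pretrained weights stay frozen; only the adapter trains
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero init: no change at step 0
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)
```

Because B is zero-initialized, the wrapped layer reproduces the pretrained model exactly before fine-tuning begins, and the learned update can later be merged back into the base weight.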

Part 5: Efficient Sampling and Inference

  • Diffusion models: A comprehensive survey of methods and applications [Paper]
  • Progressive distillation for fast sampling of diffusion models [Paper]
  • On distillation of guided diffusion models [Paper]
  • Adversarial Diffusion Distillation [Paper]
  • Flow straight and fast: Learning to generate and transfer data with rectified flow [Paper]
  • Instaflow: One step is enough for high-quality diffusion-based text-to-image generation [Paper]
  • Ufogen: You forward once large scale text-to-image generation via diffusion gans [Paper]
  • Dpm-solver: a fast ode solver for diffusion probabilistic model sampling in around 10 steps [Paper]
  • Elucidating the design space of diffusion-based generative models [Paper]
  • Denoising diffusion implicit models [Paper]
  • Generative Modeling by Estimating Gradients of the Data Distribution [Paper]
  • Adversarial score matching and improved sampling for image generation [Paper]
  • Score-Based Generative Modeling with Critically-Damped Langevin Diffusion [Paper]
  • Gotta Go Fast When Generating Data with Score-Based Models [Paper]
  • Come-Closer-Diffuse-Faster: Accelerating Conditional Diffusion Models for Inverse Problems through Stochastic Contraction [Paper]
  • Pseudo Numerical Methods for Diffusion Models on Manifolds [Paper]
  • DPM-Solver++: Fast Solver for Guided Sampling of Diffusion Probabilistic Models [Paper]
  • gDDIM: Generalized denoising diffusion implicit models [Paper]
  • Fast sampling of diffusion models with exponential integrator [Paper]
  • ReDi: Efficient Learning-Free Diffusion Inference via Trajectory Retrieval [Paper]
  • Genie: higher-order denoising diffusion solvers [Paper]
  • Knowledge Distillation in Iterative Generative Models for Improved Sampling Speed [Paper]
  • Unipc: a unified predictor-corrector framework for fast sampling of diffusion models [Paper]
  • Accelerating diffusion sampling with optimized time steps [Paper]
  • SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations [Paper]
  • Uncovering the disentanglement capability in text-to-image diffusion models [Paper]
  • Distilling the Knowledge in a Neural Network [Paper]
  • One-step diffusion with distribution matching distillation [Paper]
  • Nvae: a deep hierarchical variational autoencoder [Paper]
  • Large Scale GAN Training for High Fidelity Natural Image Synthesis [Paper]
  • Classifier-free diffusion guidance [Paper]
  • Variational Diffusion Models [Paper]
  • Generative Adversarial Nets [Paper]
  • Optimizing DDPM Sampling with Shortcut Fine-Tuning [Paper]
  • Perflow: Piecewise rectified flow as universal plug-and-play accelerator [Paper]
  • Flow matching for generative modeling [Paper]
  • Stochastic Interpolants: A Unifying Framework for Flows and Diffusions [Paper]
  • Fourier Neural Operator for Parametric Partial Differential Equations [Paper]
  • Fast High-Resolution Image Synthesis with Latent Adversarial Diffusion Distillation [Paper]
  • AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning [Paper]
  • Learning Universal Policies via Text-Guided Video Generation [Paper]
  • GANs trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium [Paper]
  • StyleGAN-NADA: CLIP-guided domain adaptation of image generators [Paper]
  • DINOv2: Learning Robust Visual Features without Supervision [Paper]
  • Which Training Methods for GANs do actually Converge? [Paper]
  • Tackling the Generative Learning Trilemma with Denoising Diffusion GANs [Paper]
  • Semi-Implicit Denoising Diffusion Models (SIDDMs) [Paper]
  • Learning to Efficiently Sample from Diffusion Probabilistic Models [Paper]
  • Learning fast samplers for diffusion models by differentiating through sample quality [Paper]
  • Post-training quantization on diffusion models [Paper]
  • Ptqd: accurate post-training quantization for diffusion models [Paper]
  • Accelerating Diffusion Models via Early Stop of the Diffusion Process [Paper]
  • Truncated diffusion probabilistic models [Paper]
  • Truncated Diffusion Probabilistic Models and Diffusion-based Adversarial Auto-Encoders [Paper]
  • A style-based generator architecture for generative adversarial networks [Paper]
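
Many samplers above (DDIM, DPM-Solver, UniPC) accelerate inference by taking large deterministic steps along the reverse trajectory. Below is a hedged sketch of a single DDIM update (eta = 0), assuming the same `alphas_cumprod` convention as the training sketch in Part 2; names and signatures are illustrative.

```python
import torch

@torch.no_grad()
def ddim_step(model, x_t, t, t_prev, alphas_cumprod):
    """One deterministic DDIM update from timestep t to t_prev."""
    a_t, a_prev = alphas_cumprod[t], alphas_cumprod[t_prev]
    eps = model(x_t, t)                                      # predicted noise
    x0_pred = (x_t - (1 - a_t).sqrt() * eps) / a_t.sqrt()    # implied clean sample
    return a_prev.sqrt() * x0_pred + (1 - a_prev).sqrt() * eps  # re-noise to t_prev
```

The efficiency gain comes from calling this step over a short sub-sequence of timesteps (e.g., 50) rather than all 1000 used at training time.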

Part 6: Efficient Deployment And Usage

  • Speed Is All You Need: On-Device Acceleration of Large Diffusion Models via GPU-Aware Optimizations [Paper]
  • SnapFusion: Text-to-Image Diffusion Model on Mobile Devices within Two Seconds [Paper]
  • Mobilediffusion: Instant text-to-image generation on mobile devices [Paper]
  • Distrifusion: Distributed parallel inference for high resolution diffusion models [Paper]
  • Pipefusion: Displaced patch pipeline parallelism for inference of diffusion transformer models [Paper]
  • Dragondiffusion: Enabling drag-style manipulation on diffusion models [Paper]
  • Asyncdiff: Parallelizing diffusion models by asynchronous denoising [Paper]

Part 7: Discussion and Conclusion

  • A Survey on Mixture of Experts [Paper]
  • The evolution of mixture of experts: A survey from basics to breakthroughs [Paper]
  • Dynamic Diffusion Transformer [Paper]
  • Mixture of Efficient Diffusion Experts Through Automatic Interval and Sub-Network Selection [Paper]
  • Fast inference from transformers via speculative decoding [Paper]
  • Accelerating Large Language Model Decoding with Speculative Sampling [Paper]
  • Speculative Diffusion Decoding: Accelerating Language Generation through Diffusion [Paper]
  • T-stitch: Accelerating sampling in pretrained diffusion models with trajectory stitching [Paper]

Citation

If you find this work useful, please consider citing us.

@article{ma2024efficient,
  title={Efficient diffusion models: A comprehensive survey from principles to practices},
  author={Ma, Zhiyuan and Zhang, Yuzhu and Jia, Guoli and Zhao, Liangliang and Ma, Yichao and Ma, Mingjie and Liu, Gaofeng and Zhang, Kaiyan and Li, Jianjun and Zhou, Bowen},
  journal={arXiv preprint arXiv:2410.11795},
  year={2024}
}

Acknowledgments

We would like to thank Qi'ang Hu for his contributions to this website, as well as all of our team members.

⭐ Star History

Star History Chart