Our CompLift approach offers a straightforward way to improve compositional generation in diffusion models without any extra training. With standard diffusion models (shown above), combining multiple concepts in a single generation often results in missing or incorrectly rendered elements. CompLift addresses this by introducing a rejection mechanism based on the Lift Score, which measures how well each generated sample matches the individual components of the prompt. For more details, please refer to our paper.
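Conceptually, the rejection step keeps a candidate sample only when every component of the prompt is supported by a sufficiently high lift score. The sketch below is purely illustrative and not the repo's actual API; the function name `accept_sample` and the zero threshold are assumptions for illustration:

```python
def accept_sample(lift_scores, threshold=0.0):
    """Accept a candidate only if every prompt component's lift score
    exceeds the threshold, i.e. each intended concept is evidenced
    in the generated sample. (Illustrative sketch, not the repo API.)"""
    return all(score > threshold for score in lift_scores)

# A sample whose components all have positive lift is kept;
# a sample with a negative-lift component (a missing concept) is rejected.
keep = accept_sample([0.42, 0.17])
drop = accept_sample([0.42, -0.05])
```

In practice the per-component lift scores are estimated from the diffusion model itself (see the paper), so no additional training is needed.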
This repository contains the official implementation of Improving Compositional Generation with Diffusion Models Using Lift Scores. The project is split into two main components:
- 🎯 2D and CLEVR Tasks (2d-and-clevr/): Implementation for synthetic 2D tasks and CLEVR position tasks.
- 🖼️ Text-to-Image Generation (text-to-image/): Implementation for improving compositional generation in text-to-image diffusion models like Stable Diffusion.
Note: the SAM2 folder is only used for the CLEVR task. See the CLEVR README for more details.
- 📚 2D and CLEVR Documentation
- 📚 Text-to-Image Documentation
- 📓 2D Tasks Colab Notebook
- 📓 CLEVR Tasks Colab Notebook
- 📓 Text-to-Image Colab Notebook
- 🚀 No Extra Training Required: CompLift improves diffusion models without requiring additional training.
- 🎯 Rejection via Lift Scores: Measures how well each generated sample matches the individual components of the prompt.
- 🔄 Multiple Model Support: Works with various diffusion models including Stable Diffusion 1.4, 2.1, and XL.
- 📊 Comprehensive Evaluation: Includes both synthetic 2D tasks and real-world CLEVR position tasks.
- 🎮 Easy to Use: Provides Jupyter notebooks for quick experimentation and familiarization.
Each component has its own installation requirements. Please refer to the respective README files:
# Train models
bash 2d-and-clevr/scripts/train_2d.sh
# Evaluate baselines (run from inside 2d-and-clevr/, since `python -m` expects a module path)
cd 2d-and-clevr
python -m scripts.run_baselines_2d
python -m scripts.run_baselines_clevr +method=METHOD experiment_name=YOUR_EXPERIMENT_NAME num_constraints=NUM_CONSTRAINTS num_samples_to_generate=NUM_SAMPLES_TO_GENERATE
python text-to-image/run.py --prompt "a cat and a dog" --seeds [0] --sd_xl=True --run_standard_sd=True --save_intermediate_latent=True --output_path "outputs/example"
For detailed usage instructions, please refer to the respective component READMEs.
If you use this code for your research, please cite our work:
@inproceedings{yu2025improving,
  title     = {Improving Compositional Generation with Diffusion Models Using Lift Scores},
  author    = {Yu, Chenning and Gao, Sicun},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning (ICML)},
  year      = {2025}
}