This project implements a framework for training Latent Diffusion Models (LDMs) with disentangled representations. Specifically, we separate the input data into shared (`c`) and modality-specific (`uv`) components, and model the generation process in the latent space using VAEs and diffusion models.
The repository is organized as follows:

```
.
├── vae_uv/                  # VAE for the modality-specific (uv) representation
│   └── vae_train.py         # Training script for the VAE
│
├── diffusion_uv/            # Diffusion model operating on the uv and c parts
│   └── diffusion_train.py   # Training script for latent diffusion
│
├── utils/                   # Utility functions (optional)
├── data/                    # Dataset files (if any)
└── README.md                # This file
```
Two types of representation are used:

- `uv`: the modality-specific representation extracted from each modality.
- `c`: the shared (common) representation across modalities.

These representations are first learned using a VAE and then used for conditional or unconditional diffusion model training, as sketched below.
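As a rough illustration of this split, the sketch below encodes an input into its `c` and `uv` parts; `DisentangledEncoder` and all dimensions here are hypothetical placeholders, not classes from this repository.

```python
import torch
import torch.nn as nn

# Hypothetical encoder that splits an input into a shared part c and a
# modality-specific part uv (dimensions chosen arbitrarily for illustration).
class DisentangledEncoder(nn.Module):
    def __init__(self, in_dim=256, c_dim=64, uv_dim=64):
        super().__init__()
        self.to_c = nn.Linear(in_dim, c_dim)    # shared (common) component
        self.to_uv = nn.Linear(in_dim, uv_dim)  # modality-specific component

    def forward(self, x):
        return self.to_c(x), self.to_uv(x)

x = torch.randn(8, 256)            # a batch of flattened inputs
c, uv = DisentangledEncoder()(x)   # c: shared, uv: modality-specific
print(c.shape, uv.shape)           # torch.Size([8, 64]) torch.Size([8, 64])
```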
The VAE is trained to encode and reconstruct the `uv` (modality-specific) latent variables.

```bash
cd vae_uv
python vae_train.py
```

Make sure to modify `vae_train.py` to point to your dataset and to adjust hyperparameters if needed.
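For orientation, a minimal, self-contained sketch of what a VAE training step on the `uv` data might look like is shown below; `UVVAE`, the dimensions, and the loss weighting are illustrative assumptions, and the actual architecture and objective live in `vae_train.py`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Minimal VAE sketch for the uv part: encode, reparameterize, reconstruct.
class UVVAE(nn.Module):
    def __init__(self, in_dim=256, z_dim=32):
        super().__init__()
        self.enc = nn.Linear(in_dim, 2 * z_dim)   # outputs mean and log-variance
        self.dec = nn.Linear(z_dim, in_dim)

    def forward(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        return self.dec(z), mu, logvar

model = UVVAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

uv_batch = torch.randn(8, 256)                     # placeholder for a real uv batch
recon, mu, logvar = model(uv_batch)
kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
loss = F.mse_loss(recon, uv_batch) + 1e-4 * kl     # reconstruction + KL terms
loss.backward()
opt.step()
```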
The diffusion model operates on the latent space defined by `uv`, optionally conditioned on `c`.

```bash
cd diffusion_uv
python diffusion_train.py
```

Ensure that:

- The pretrained VAE is properly loaded so that the input can be encoded into the latent space.
- The `c` (shared) part is extracted and passed as a condition (if applicable), as shown in the sketch below.
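As a rough illustration of conditional latent diffusion training, the sketch below predicts the noise added to a `uv` latent given `c` and a timestep; `DenoiserNet`, the toy noise schedule, and all dimensions are assumptions, not the actual components of `diffusion_train.py`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative denoiser: predicts the noise added to the uv latent,
# conditioned on the shared representation c and the timestep t.
class DenoiserNet(nn.Module):
    def __init__(self, z_dim=32, c_dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(z_dim + c_dim + 1, 128),
                                 nn.SiLU(),
                                 nn.Linear(128, z_dim))

    def forward(self, z_t, c, t):
        return self.net(torch.cat([z_t, c, t[:, None]], dim=-1))

denoiser = DenoiserNet()
opt = torch.optim.Adam(denoiser.parameters(), lr=1e-3)

z0 = torch.randn(8, 32)       # uv latents from the (frozen) pretrained VAE encoder
c = torch.randn(8, 64)        # shared representation used as the condition
t = torch.rand(8)             # continuous timestep in [0, 1)
alpha = (1.0 - t)[:, None]    # toy linear noise schedule, for illustration only

noise = torch.randn_like(z0)
z_t = alpha.sqrt() * z0 + (1.0 - alpha).sqrt() * noise   # noised latent
loss = F.mse_loss(denoiser(z_t, c, t), noise)            # noise-prediction objective
loss.backward()
opt.step()
```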
Requirements:

- Python 3.10.16
- Other common libraries: `numpy`, `tqdm`, `torchvision`, etc.

You can install them via:

```bash
pip install -r requirements.txt
```
Notes:

- Make sure the output of `vae_train.py` (typically the latent representations) is saved and then used in `diffusion_train.py`; a minimal saving/loading sketch follows below.
- You can customize the encoder-decoder architecture in `vae_uv/`, or the diffusion configuration in `diffusion_uv/`.
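For example, one simple way to hand the latents from the VAE stage to the diffusion stage is to serialize them with `torch.save`; the file name, the placeholder encoder, and the tensor shapes below are assumptions for illustration only.

```python
import torch
import torch.nn as nn

# Placeholder encoder standing in for the trained VAE from vae_train.py.
vae_encoder = nn.Linear(256, 32)
uv_data = torch.randn(100, 256)             # placeholder for the real uv dataset

# After VAE training: encode the dataset and save the latents to disk.
with torch.no_grad():
    latents = vae_encoder(uv_data)
torch.save(latents, "uv_latents.pt")

# In diffusion_train.py: load the saved latents and train the diffusion model on them.
latents = torch.load("uv_latents.pt")
print(latents.shape)                        # torch.Size([100, 32])
```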
The code is based on https://github.com/mueller-franzes/medfusion