The official repository of the paper: "TryOffDiff: Virtual-Try-Off via High-Fidelity Garment Reconstruction using Diffusion Models".
- Training script
- Inference script
- Eval script
- Ablation models' scripts
- Baseline models' scripts
Create a new Conda environment:
conda create -n vtoff python=3.11
conda activate vtoff
Then clone the repository and install the required packages:
git clone https://github.com/rizavelioglu/tryoffdiff.git
cd tryoffdiff
pip install -e .
Download the original VITON-HD dataset and extract it to "./data/vitonhd":
python tryoffdiff/dataset.py download-vitonhd # For a different location: output-dir="<other-folder>"
As mentioned in the paper, the original dataset contains duplicates, and some training samples are leaked into the test set. Clean these with the following command:
python tryoffdiff/dataset.py clean-vitonhd # Default: `data-dir="./data/vitonhd"`
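The idea behind this cleaning step can be illustrated with a minimal sketch. It is not the repository's implementation: it assumes duplicates are byte-identical files, whereas the actual command may use a different criterion (for example, one that also catches re-encoded near-duplicates). All helper names are hypothetical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's raw bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def index_by_digest(folder: Path, pattern: str = "*.jpg") -> dict[str, list[Path]]:
    """Group the files matching `pattern` in `folder` by content digest."""
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(folder.glob(pattern)):
        groups[file_digest(path)].append(path)
    return dict(groups)


def find_duplicates(index: dict[str, list[Path]]) -> dict[str, list[Path]]:
    """Keep only digests shared by more than one file in the same split."""
    return {digest: paths for digest, paths in index.items() if len(paths) > 1}


def find_leaks(
    train_index: dict[str, list[Path]],
    test_index: dict[str, list[Path]],
) -> dict[str, tuple[list[Path], list[Path]]]:
    """Map each digest present in both splits to its (train, test) files."""
    shared = sorted(set(train_index) & set(test_index))
    return {digest: (train_index[digest], test_index[digest]) for digest in shared}
```

To try it, build one index per split, e.g. `index_by_digest(Path("./data/vitonhd/test/cloth"))` and the corresponding training folder (its exact path is an assumption here), then pass both to `find_leaks`.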
For faster training, pre-extract the image features and save them instead of extracting them during training.
python tryoffdiff/dataset.py vae-encode-vitonhd \
--data-dir "./data/vitonhd/" \
--model-name "sd14" \
--batch-size 16
python tryoffdiff/dataset.py siglip-encode-vitonhd \
--data-dir "./data/vitonhd/" \
--batch-size 64
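Pre-extraction pays off when the encoders are not updated during training, because their output for a given image is then the same in every epoch. The following sketch shows the general encode-once, load-many pattern. It is only an illustration: `encode` is a stand-in callable rather than the VAE or SigLIP encoder, the pickle file format is an assumption, and the function names are hypothetical.

```python
import pickle
from pathlib import Path
from typing import Callable, Sequence


def cache_features(
    items: Sequence[str],
    encode: Callable[[str], list[float]],
    out_dir: Path,
) -> list[Path]:
    """Encode each item once and store the result on disk."""
    out_dir.mkdir(parents=True, exist_ok=True)
    targets = []
    for name in items:
        target = out_dir / f"{Path(name).stem}.pkl"
        if not target.exists():  # skip items already encoded in an earlier run
            with target.open("wb") as fh:
                pickle.dump(encode(name), fh)
        targets.append(target)
    return targets


def load_features(path: Path) -> list[float]:
    """Load cached features; a training loop would call this instead of the encoder."""
    with path.open("rb") as fh:
        return pickle.load(fh)
```

With this pattern the encoder runs once per image in total instead of once per image per epoch.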
- Option 1 (GPU-poor) - Train with a single GPU:
Run the following command:
python tryoffdiff/modeling/train.py tryoffdiff \
--save-dir "./models/" \
--data-dir "./data/vitonhd-enc-sd14/" \
--model-class-name "TryOffDiff" \
--mixed-precision "no" \
--learning-rate 0.0001 \
--train-batch-size 16 \
--num-epochs 1201 \
--save-model-epochs 100 \
--checkpoint-every-n-epochs 100
- Option 2 - Train with 4 GPUs on a single node (as done in the paper):
First, configure accelerate accordingly:
accelerate config
We did not use tools such as dynamo, DeepSpeed, or FullyShardedDataParallel.
Then, start training:
accelerate launch --multi_gpu --num_processes=4 tryoffdiff/modeling/train.py tryoffdiff \
--save-dir "./models/" \
--data-dir "./data/vitonhd-enc-sd14/" \
--model-class-name "TryOffDiff" \
--mixed-precision "no" \
--learning-rate 0.0001 \
--train-batch-size 16 \
--num-epochs 1201 \
--save-model-epochs 100 \
--checkpoint-every-n-epochs 100
Note: See config.py (TrainingConfig) for all possible arguments, e.g. set resume_from_checkpoint to resume training from a specific checkpoint.
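When comparing the two training options, keep in mind that with Accelerate's default data-loader sharding each process draws its own batch, so the number of samples per optimizer step grows with the number of processes. The sketch below works through that arithmetic. It assumes `--train-batch-size` is the per-process batch size, which this README does not confirm, and it uses a placeholder dataset size rather than the real number of training samples.

```python
import math


def effective_batch_size(per_process_batch_size: int, num_processes: int) -> int:
    """Samples consumed per optimizer step across all processes."""
    return per_process_batch_size * num_processes


def steps_per_epoch(num_samples: int, per_process_batch_size: int, num_processes: int) -> int:
    """Optimizer steps needed to see every sample once (last batch may be partial)."""
    total = effective_batch_size(per_process_batch_size, num_processes)
    return math.ceil(num_samples / total)


NUM_SAMPLES = 10_000  # placeholder, NOT the actual size of the cleaned training set

for gpus in (1, 4):
    print(
        f"{gpus} GPU(s): effective batch size = {effective_batch_size(16, gpus)}, "
        f"steps per epoch = {steps_per_epoch(NUM_SAMPLES, 16, gpus)}"
    )
```

Under these assumptions the single-GPU run takes four times as many optimizer steps per epoch with a four times smaller batch, so the two options are not guaranteed to produce identical results with the same learning rate.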
Other models presented in the ablation study can be trained similarly. View all available models:
python tryoffdiff/modeling/train.py --help
[...Work in progress...]
Each model has its own command. View all available options:
python tryoffdiff/modeling/predict.py --help
Example: Run inference with TryOffDiff:
python tryoffdiff/modeling/predict.py tryoffdiff \
--model-dir "/model_20241007_154516/" \
--model-filename "model_epoch_1200.pth" \
--batch-size 8 \
--num-inference-steps 50 \
--seed 42 \
--guidance-scale 2.0
which saves predictions to "<model-dir>/preds/" as .png files.
Note: See config.py (InferenceConfig) for all possible arguments, e.g. use the --all flag to run inference on the entire test set.
Note: The paper uses the PNDM noise scheduler. For HuggingFace Spaces we use the EulerDiscrete scheduler.
Evaluate the predictions using:
python tryoffdiff/modeling/eval.py \
--gt-dir "./data/vitonhd/test/cloth/" \
--pred-dir "<prediction-dir>" \
--batch-size 32 \
--num-workers 4
which prints the results to the console. Specifically, we use the following libraries for the implementations of the metrics presented in the paper:
- pyiqa: SSIM, MS-SSIM, CW-SSIM, and LPIPS
- clean-fid: FID, CLIP-FID, and KID
- DISTS-pytorch: DISTS
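Before running the evaluation it can help to check that the prediction folder actually covers the ground-truth folder, since inference only processes the entire test set when the --all flag is used. The following sketch compares the two folders by file stem. It assumes predictions are saved under the same stem as the corresponding ground-truth image (with a possibly different extension), which this README does not state; eval.py may pair files differently. The function names are hypothetical.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def image_stems(folder: Path) -> set[str]:
    """Collect the stems of all image files directly inside `folder`."""
    return {
        p.stem
        for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    }


def compare_dirs(gt_dir: Path, pred_dir: Path) -> tuple[list[str], list[str]]:
    """Return (stems missing from predictions, stems without ground truth)."""
    gt, pred = image_stems(gt_dir), image_stems(pred_dir)
    return sorted(gt - pred), sorted(pred - gt)
```

For example, `compare_dirs(Path("./data/vitonhd/test/cloth/"), Path("<prediction-dir>"))` returns two lists; both are empty when every ground-truth image has exactly one matching prediction.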
In addition, we offer a simple GUI for visualizing predictions alongside their evaluation metrics. This tool displays the ground truth and predicted images side-by-side while providing metrics for the entire test set:
python tryoffdiff/modeling/eval_vis.py \
--gt-dir "./data/vitonhd/test/cloth/" \
--pred-dir "<prediction-dir>"
The project follows the Cookiecutter Data Science-v2 directory structure by DrivenData:
├── notebooks/          <- Jupyter notebooks
├── references/         <- Manuals and all other explanatory materials.
├── LICENSE
├── README.md
├── pyproject.toml      <- Project configuration file with package metadata
│
└── tryoffdiff/         <- Source code for use in this project.
    ├── modeling/
    │   ├── __init__.py
    │   ├── eval.py     <- Code to evaluate models
    │   ├── model.py    <- Model implementations
    │   ├── predict.py  <- Code to run model inference with trained models
    │   └── train.py    <- Code to train models
    │
    ├── __init__.py     <- Makes `tryoffdiff` a Python module
    ├── config.py       <- Store configuration variables
    ├── dataset.py      <- Download and clean VITON-HD dataset
    ├── features.py     <- Code to create features for modeling
    └── plots.py        <- Code to create visualizations
Our code relies on PyTorch, with 🤗 Diffusers for diffusion model components
and 🤗 Accelerate for multi-GPU training.
We adopt Stable Diffusion-v1.4 as the base model and use
SigLIP as the image encoder.
For evaluation, we use IQA_PyTorch,
clean-fid,
and DISTS-pytorch.
TL;DR: Not available for commercial use, unless the FULL source code is open-sourced!
This project is intended solely for academic research. No commercial benefits are derived from it.
The code, datasets, and models are published under the Server Side Public License (SSPL).
If you find this repository useful in your research, please consider giving a star ⭐ and a citation:
@article{velioglu2024tryoffdiff,
title = {TryOffDiff: Virtual-Try-Off via High-Fidelity Garment Reconstruction using Diffusion Models},
author = {Velioglu, Riza and Bevandic, Petra and Chan, Robin and Hammer, Barbara},
journal = {arXiv preprint arXiv:2411.18350},
year = {2024},
note = {\url{https://doi.org/nt3n}}
}