Lafite

Code for the paper LAFITE: Towards Language-Free Training for Text-to-Image Generation (CVPR 2022).

Looking for a better language-free method? Try this.

Requirements

The implementation is based on stylegan2-ada-pytorch and CLIP; the required packages are listed in those repositories.
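
A quick sanity check that the two key dependencies are installed and that CUDA is visible (a minimal sketch; CLIP installs via pip install git+https://github.com/openai/CLIP.git):

import torch
import clip

print("torch:", torch.__version__, "| cuda available:", torch.cuda.is_available())
# Load the encoder used for the processed datasets below; downloads weights on first use.
model, preprocess = clip.load("ViT-B/32")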

Preparing Datasets

Example:

python dataset_tool.py --source=./path_to_some_dataset/ --dest=./datasets/some_dataset.zip --width=256 --height=256 --transform=center-crop

The files under ./path_to_some_dataset/ should be organized as follows, with each image i.png paired with a caption file i.txt:

./path_to_some_dataset/
  ├  1.png
  ├  1.txt
  ├  2.png
  ├  2.txt
  ├  ...
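
A minimal sketch of producing this layout from in-memory (image, caption) pairs; the pairs list here is a placeholder for your own data:

import os
from PIL import Image

out_dir = "./path_to_some_dataset"
os.makedirs(out_dir, exist_ok=True)

# Placeholder data; substitute your own images and captions.
pairs = [(Image.new("RGB", (256, 256), "red"), "a red square")]

for i, (img, caption) in enumerate(pairs, start=1):
    img.save(os.path.join(out_dir, f"{i}.png"))              # i.png holds the image
    with open(os.path.join(out_dir, f"{i}.txt"), "w") as f:  # i.txt holds its caption
        f.write(caption)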

We provide links to several commonly used datasets that we have already processed with CLIP-ViT/B-32 (a feature-extraction sketch follows the list):

MS-COCO Training Set

MS-COCO Validation Set

LN-COCO Training Set

LN-COCO Testing Set

Multi-modal CelebA-HQ Training Set

Multi-modal CelebA-HQ Testing Set

CUB Training Set

CUB Testing Set
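
For intuition about what "processed with CLIP-ViT/B-32" means, here is a minimal sketch of extracting unit-normalized CLIP image and text features with the official CLIP package; the file name and caption are placeholders:

import clip
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

image = preprocess(Image.open("1.png")).unsqueeze(0).to(device)
tokens = clip.tokenize(["a red square"]).to(device)

with torch.no_grad():
    img_fts = model.encode_image(image)  # shape (1, 512) for ViT-B/32
    txt_fts = model.encode_text(tokens)  # shape (1, 512)

# LAFITE works with unit-normalized features.
img_fts = img_fts / img_fts.norm(dim=-1, keepdim=True)
txt_fts = txt_fts / txt_fts.norm(dim=-1, keepdim=True)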

Training

The following hyper-parameters are tuned for MS-COCO. Please re-tune itd, itc, and gamma on other datasets, as results can be sensitive to these values.

Examples:

Training with ground-truth pairs

python train.py --gpus=4 --outdir=./outputs/ --temp=0.5 --itd=5 --itc=10 --gamma=10 --mirror=1 --data=./datasets/COCO2014_train_CLIP_ViTB32.zip --test_data=./datasets/COCO2014_val_CLIP_ViTB32.zip --mixing_prob=0.0

Training with the language-free method (pseudo image-text feature pairs); a sketch of the pseudo-feature construction follows the command:

python train.py --gpus=4 --outdir=./outputs/ --temp=0.5 --itd=10 --itc=10 --gamma=10 --mirror=1 --data=./datasets/COCO2014_train_CLIP_ViTB32.zip --test_data=./datasets/COCO2014_val_CLIP_ViTB32.zip --mixing_prob=1.0
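
With --mixing_prob=1.0, the generator is conditioned on pseudo text features derived from image features instead of real captions. A minimal sketch of the Gaussian-perturbation scheme (Lafite-G) from the paper, with an illustrative noise level xi:

import torch

def pseudo_text_feature(img_fts, xi=0.1):
    # Perturb a normalized CLIP image feature with adaptive Gaussian noise (Lafite-G):
    # h' = h + xi * eps * ||h|| / ||eps||, then re-normalize onto the unit sphere.
    eps = torch.randn_like(img_fts)
    h = img_fts + xi * eps * img_fts.norm(dim=-1, keepdim=True) / eps.norm(dim=-1, keepdim=True)
    return h / h.norm(dim=-1, keepdim=True)

# Example with a fake batch of normalized image features:
img_fts = torch.nn.functional.normalize(torch.randn(4, 512), dim=-1)
txt_fts = pseudo_text_feature(img_fts)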

Pre-trained Models

Here we provide several pre-trained models (hosted on Google Drive).

Model trained on MS-COCO, Language-free (Lafite-G), CLIP-ViT/B-32

Model trained on MS-COCO, Language-free (Lafite-NN), CLIP-ViT/B-32

Model trained on MS-COCO with Ground-truth Image-text Pairs, CLIP-ViT/B-32

Model trained on MS-COCO with Ground-truth Image-text Pairs, CLIP-ViT/B-16

Model Pre-trained On Google CC3M

Testing

Calculating metrics (fid50k_full reports FID and is50k reports Inception Score, following stylegan2-ada-pytorch):

python calc_metrics.py --network=./some_pre-trained_models.pkl --metrics=fid50k_full,is50k --data=./training_data.zip --test_data=./testing_data.zip

To generate images with the pre-trained models, you can use ./generate.ipynb. Also, you can try the Colab notebook by @voodoohop, which uses the model pre-trained on CC3M.
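
For orientation, a rough sketch of loading a network pickle and conditioning it on a text prompt. The pickle layout follows stylegan2-ada-pytorch, but the generator call signature below is a guess, so treat ./generate.ipynb as the authoritative reference (the repo must be on PYTHONPATH so the pickled classes resolve):

import pickle
import clip
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

with open("./some_pre-trained_model.pkl", "rb") as f:
    G = pickle.load(f)["G_ema"].to(device)  # stylegan2-ada-pytorch stores G_ema in the pickle

clip_model, _ = clip.load("ViT-B/32", device=device)
with torch.no_grad():
    txt_fts = clip_model.encode_text(clip.tokenize(["a herd of sheep grazing"]).to(device))
    txt_fts = txt_fts / txt_fts.norm(dim=-1, keepdim=True)

z = torch.randn(1, G.z_dim, device=device)
c = torch.zeros(1, G.c_dim, device=device)  # class label input; unused here
with torch.no_grad():
    img = G(z, c, fts=txt_fts)  # hypothetical signature; see ./generate.ipynb for the real call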

To calculate SOA scores for MS-COCO, you can use ./generate_for_soa.py together with the code from Semantic Object Accuracy for Generative Text-to-Image Synthesis.

Citation

@article{zhou2021lafite,
  title={LAFITE: Towards Language-Free Training for Text-to-Image Generation},
  author={Zhou, Yufan and Zhang, Ruiyi and Chen, Changyou and Li, Chunyuan and Tensmeyer, Chris and Yu, Tong and Gu, Jiuxiang and Xu, Jinhui and Sun, Tong},
  journal={arXiv preprint arXiv:2111.13792},
  year={2021}
}

Please contact [email protected] if you have any questions.
