Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction
Generating images with the depth-30 model
- The model generates images coarse-to-fine
- Low-resolution tokens determine the overall color
- High-resolution tokens gradually add detail in a residual manner
- The model can't generate people (I think this is because it has no prior on people)
- The model can't generate images that contain multiple objects (I guess...)
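
The coarse-to-fine, residual behaviour noted above can be illustrated with a small NumPy sketch. This is not the VAR code: the scale schedule `[1, 2, 4, 8]`, the helper names `upsample_nearest` and `accumulate_scales`, and the use of nearest-neighbour upsampling on random single-channel maps are all assumptions made for illustration. It only shows the idea that each scale contributes a residual that is upsampled and added to a running total, so the first scale fixes a single global value (the "overall color") and later scales add finer detail.

```python
import numpy as np

def upsample_nearest(x, size):
    """Upsample a square (h, h) map to (size, size) by repeating entries.
    Nearest-neighbour is a simplification chosen for this sketch."""
    h = x.shape[0]
    assert size % h == 0, "final size must be a multiple of the scale"
    factor = size // h
    return np.repeat(np.repeat(x, factor, axis=0), factor, axis=1)

def accumulate_scales(residuals, size):
    """Add each scale's upsampled residual to a running total.
    Returns the running total after every scale, coarsest first."""
    total = np.zeros((size, size))
    partials = []
    for r in residuals:
        total = total + upsample_nearest(r, size)
        partials.append(total.copy())
    return partials

# Hypothetical scale schedule and random residuals (stand-ins for decoded tokens).
rng = np.random.default_rng(0)
scales = [1, 2, 4, 8]
residuals = [rng.normal(size=(s, s)) for s in scales]
partials = accumulate_scales(residuals, size=8)

for s, p in zip(scales, partials):
    # The number of distinct values grows as finer scales are added.
    print(f"after scale {s}x{s}: {np.unique(p).size} distinct values")
```

After the first scale the whole 8x8 map holds one value; each later scale refines it without discarding what the coarser scales already set.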
This project is licensed under the MIT License - see the LICENSE file for details.
If our work assists your research, feel free to give us a star ⭐ or cite us using:
@Article{VAR,
title={Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction},
author={Keyu Tian and Yi Jiang and Zehuan Yuan and Bingyue Peng and Liwei Wang},
year={2024},
eprint={2404.02905},
archivePrefix={arXiv},
primaryClass={cs.CV}
}