Infinity ∞: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis

FoundationVision/Infinity

🔥 Updates!!

  • Dec 12, 2024: 💻 Add Project Page
  • Dec 5, 2024: 🤗 Paper release

📑 Open-Source Plan

  • Infinity-2B (Text-to-Image Model)
    • Web Demo
    • Inference
    • Checkpoints

📖 Introduction

We present Infinity, a bitwise visual autoregressive model capable of generating high-resolution, photorealistic images that follow language instructions. Infinity reformulates the visual autoregressive model under a bitwise token prediction framework with an infinite-vocabulary classifier and a bitwise self-correction mechanism. By theoretically expanding the tokenizer vocabulary size to infinity within the Transformer, our method scales considerably better than vanilla VAR. Extensive experiments indicate that Infinity outperforms autoregressive text-to-image models by large margins and matches or exceeds leading diffusion models. Without extra optimization, Infinity generates a 1024 $\times$ 1024 image in 0.8s, 2.6 $\times$ faster than SD3-Medium, making it the fastest text-to-image model. Models and code are released to promote further exploration of Infinity for visual generation.
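
To give a rough intuition for why predicting bits instead of token indices lets the vocabulary grow so large, here is a minimal sketch in plain Python. It is not the code from this repository: the function names, the hidden width of 2048, and the 32-bit token size are hypothetical values chosen for illustration, and the bitwise head is assumed to be one two-logit binary classifier per bit.

```python
# Illustrative sketch only; names and sizes are assumptions, not this repo's API.

def index_to_bits(index: int, d: int) -> list[int]:
    """Write a token index in [0, 2**d) as d bits, most significant bit first."""
    if not 0 <= index < 2 ** d:
        raise ValueError("index out of range for d bits")
    return [(index >> shift) & 1 for shift in range(d - 1, -1, -1)]


def bits_to_index(bits: list[int]) -> int:
    """Inverse of index_to_bits: fold the bits back into a single integer."""
    index = 0
    for bit in bits:
        index = (index << 1) | bit
    return index


def index_classifier_params(hidden: int, d: int) -> int:
    """Weights of one linear layer with a logit for each of the 2**d tokens."""
    return hidden * 2 ** d


def bitwise_classifier_params(hidden: int, d: int) -> int:
    """Weights of d binary classifiers, each with two logits (assumed layout)."""
    return hidden * 2 * d


if __name__ == "__main__":
    hidden, d = 2048, 32  # hypothetical sizes for illustration
    print("bits for token 5 (d=4):", index_to_bits(5, 4))
    print("index-classifier weights:", index_classifier_params(hidden, d))
    print("bitwise-classifier weights:", bitwise_classifier_params(hidden, d))
```

Under these assumptions the index classifier grows exponentially with the number of bits per token, while the bitwise head grows only linearly, which is what makes a very large (in the limit, unbounded) vocabulary tractable.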

License

This project is licensed under the MIT License - see the LICENSE file for details.