You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
After going through the code, I got some questions.
In the first stage of training discrete VAE, we already trained a code book. Why we don't use it for second stage training but initialize a new code book for images.
During training, we use the original image as input. During inference, how to set the image input? Is it a random noise with size 3x256x256? How do we do the casual attention in transformer for inference?
The text was updated successfully, but these errors were encountered:
Thanks for this great work!
After going through the code, I got some questions.
In the first stage of training discrete VAE, we already trained a code book. Why we don't use it for second stage training but initialize a new code book for images.
During training, we use the original image as input. During inference, how to set the image input? Is it a random noise with size 3x256x256? How do we do the casual attention in transformer for inference?
The text was updated successfully, but these errors were encountered: