ICLR 2021 Poster: https://iclr.cc/virtual/2021/poster/3013
Original Paper Title: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Authors: Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby
Code:
- Keras Implementation: https://keras.io/examples/vision/image_classification_with_vision_transformer/
- PyTorch Implementation: https://www.learnpytorch.io/08_pytorch_paper_replicating/
- TensorFlow Implementation: https://github.com/taki0112/vit-tensorflow/blob/main/vit_tensorflow/vit.py
Blogs:
- https://khvmaths.medium.com/vision-transformer-understanding-the-underlying-concept-83d699d71180
- https://medium.com/analytics-vidhya/vision-transformers-bye-bye-convolutions-e929d022e4ab
- https://deepganteam.medium.com/vision-transformers-for-computer-vision-9f70418fe41a
- https://jalammar.github.io/illustrated-transformer/
- https://machinelearningmastery.com/the-attention-mechanism-from-scratch/