PyTorch implementation of the Vision Transformer (ViT). Based on the paper:
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby
arXiv:2010.11929
Visualizations of attention maps for correctly classified samples can be found in the `_visualizations` folder.
The entire code is self-contained in the Jupyter notebook; just run the cells sequentially. It is structured this way for ease of training on Google Colab.
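The paper's core idea is to treat an image as a sequence of 16x16 patch tokens. A minimal sketch of such a patch-embedding module is shown below; the class name `PatchEmbed` is illustrative and not taken from the notebook:

```python
import torch
import torch.nn as nn

class PatchEmbed(nn.Module):
    """Split an image into fixed-size patches and linearly embed each one.

    A Conv2d with kernel_size = stride = patch_size is equivalent to
    flattening each non-overlapping patch and applying a shared linear
    projection to it.
    """
    def __init__(self, img_size=224, patch_size=16, in_chans=3, embed_dim=768):
        super().__init__()
        self.num_patches = (img_size // patch_size) ** 2
        self.proj = nn.Conv2d(in_chans, embed_dim,
                              kernel_size=patch_size, stride=patch_size)

    def forward(self, x):
        x = self.proj(x)                     # (B, D, H/P, W/P)
        return x.flatten(2).transpose(1, 2)  # (B, N, D): one token per patch

tokens = PatchEmbed()(torch.randn(1, 3, 224, 224))
print(tokens.shape)  # torch.Size([1, 196, 768]) — 14x14 patches of a 224x224 image
```

In the full model, a learnable class token and position embeddings are prepended/added to this sequence before it enters the Transformer encoder.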
- Finetune on CIFAR-10 and CIFAR-100, and plot visualizations.
- Implement the hybrid approach based on ResNet feature maps.