PyTorch implementation of the Vision Transformer paper by Google.

SrinjaySarkar/ViT

PyTorch implementation of the Vision Transformer (ViT), based on the paper:

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby
arXiv:2010.11929
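
The paper's core idea is treating an image as a sequence of 16x16 patches, each linearly projected into a token embedding before being fed to a standard Transformer encoder. Below is a minimal sketch of that patch-embedding step, assuming ViT-Base hyperparameters (224x224 input, 16x16 patches, 768-dim embeddings); the class name and defaults are illustrative, not necessarily identical to this repo's notebook.

```python
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Split an image into fixed-size patches and project each to a token embedding."""
    def __init__(self, img_size=224, patch_size=16, in_chans=3, embed_dim=768):
        super().__init__()
        self.num_patches = (img_size // patch_size) ** 2
        # A strided convolution with kernel == stride == patch_size is
        # equivalent to flattening each patch and applying one shared
        # linear projection, which is how most implementations do it.
        self.proj = nn.Conv2d(in_chans, embed_dim,
                              kernel_size=patch_size, stride=patch_size)

    def forward(self, x):
        x = self.proj(x)                     # (B, embed_dim, H/16, W/16)
        return x.flatten(2).transpose(1, 2)  # (B, num_patches, embed_dim)

tokens = PatchEmbedding()(torch.randn(1, 3, 224, 224))
print(tokens.shape)  # torch.Size([1, 196, 768])
```

A learnable `[CLS]` token and position embeddings are then prepended/added to this sequence before the encoder blocks.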

Visualization of attention maps of correctly classified samples can be found in the _visualizations folder.

ImageNet-1k sample

Predicted label: vulture
True label: vulture

[Attention maps of the sample for encoder layers 1 through 12 — see the _visualizations folder]
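
A common way to produce such per-layer maps (a sketch of the standard approach, not necessarily this repo's exact method) is to take one layer's post-softmax attention weights, average over heads, and keep the row for the `[CLS]` token, which shows how strongly the class token attends to each image patch. The function name and the 14x14 grid assumption (224x224 input, 16x16 patches) are illustrative.

```python
import torch

def cls_attention_map(attn, grid=14):
    """Turn one layer's attention weights of shape (B, heads, N+1, N+1)
    into a (B, grid, grid) map of CLS-to-patch attention."""
    attn = attn.mean(dim=1)    # average over heads -> (B, N+1, N+1)
    cls_row = attn[:, 0, 1:]   # attention from the CLS token to the N patches
    return cls_row.reshape(-1, grid, grid)

# Dummy attention tensor: batch 1, 12 heads, 196 patches + 1 CLS token.
weights = torch.softmax(torch.randn(1, 12, 197, 197), dim=-1)
maps = cls_attention_map(weights)
print(maps.shape)  # torch.Size([1, 14, 14])
```

The resulting grid is typically upsampled to the input resolution and overlaid on the image, one map per encoder layer.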

Usage

The entire code is self-contained in the Jupyter notebook; just run the cells sequentially. It is structured this way for ease of training on Google Colab.

To-Do/Coming Soon:

  • Fine-tune on CIFAR-10 and CIFAR-100 and plot visualizations.
  • Implement the hybrid approach based on ResNet feature maps.
