Distillable_ViT

A repository for the Distillable ViT, with and without Label Smoothing

Weights can be downloaded from https://drive.google.com/drive/folders/1iGew6DMDdIorm-f_73fvL8sYpMIiwIw6?usp=sharing

Distillation

A recent paper (DeiT, Touvron et al., 2020, cited below) showed that using a distillation token to distill knowledge from convolutional networks into a vision transformer can yield small and efficient vision transformers. This repository offers the means to do such distillation easily.

e.g. distilling from a ResNet-50 (or any teacher) to a vision transformer, as sketched below.
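A minimal sketch of that student/teacher setup, assuming the DistillableViT module follows the API of the vit-pytorch code base cited below; the import path and the constructor hyperparameters are illustrative assumptions, not values prescribed by this repository.

import torch
import torchvision.models as models
from vit_pytorch.distill import DistillableViT  # assumed import path, as in lucidrains/vit-pytorch

# Teacher: any pre-trained convolutional network, e.g. a ResNet-50
teacher = models.resnet50(pretrained=True)
teacher.eval()

# Student: a distillable vision transformer (hyperparameters are illustrative)
student = DistillableViT(
    image_size = 224,
    patch_size = 16,
    num_classes = 2,
    dim = 512,
    depth = 6,
    heads = 8,
    mlp_dim = 1024,
    dropout = 0.1,
    emb_dropout = 0.1
)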

Usage of LSR (Label Smoothing Regularization)

!pip install timm==0.3.2

import torch.nn as nn
from timm.loss import LabelSmoothingCrossEntropy

# LWR, DistillationLoss, teacher, train_data and batch_size are assumed to be
# defined elsewhere (see this repository and the LWR link in the next section).
smoothing = True      # use Label Smoothing Regularization as the base criterion
retrospect = False    # do not use Learning with Retrospection here

value = 0.1           # label-smoothing factor
if smoothing:
    base_criterion = LabelSmoothingCrossEntropy(smoothing=value)
elif retrospect:
    base_criterion = LWR(
        k=1,
        update_rate=0.9,
        num_batches_per_epoch=len(train_data) // batch_size,
        dataset_length=len(train_data),
        output_shape=(2,),
        tau=5,
        max_epochs=20,
        softmax_dim=1,
    )
else:
    base_criterion = nn.CrossEntropyLoss()

criterion = DistillationLoss(
    base_criterion, teacher, 'none', 0.5, 1.0, smoothing, retrospect)
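A hedged sketch of how the resulting criterion could be used in a training step. The call signature criterion(inputs, outputs, labels) follows the DeiT convention; whether this repository's DistillationLoss keeps that exact signature is an assumption, as are the student, optimizer and train_loader names.

# Hypothetical training step (names and call signature are assumptions)
for images, labels in train_loader:
    outputs = student(images)                  # student forward pass
    loss = criterion(images, outputs, labels)  # base criterion + distillation against the frozen teacher
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()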

Usage of Learning with Retrospection (LWR)

For the LWR implementation, refer to https://github.com/The-Learning-Machines/LearningWithRetrospection/blob/main/LearningWithRetrospection.py

# Same setup as above, but with Learning with Retrospection as the base criterion.
smoothing = False
retrospect = True

value = 0.1
if smoothing:
    base_criterion = LabelSmoothingCrossEntropy(smoothing=value)
elif retrospect:
    base_criterion = LWR(
        k=1,
        update_rate=0.9,
        num_batches_per_epoch=len(train_data) // batch_size,
        dataset_length=len(train_data),
        output_shape=(2,),      # per-sample output shape (two classes here)
        tau=5,                  # temperature for softening the stored predictions
        max_epochs=20,
        softmax_dim=1,
    )
else:
    base_criterion = nn.CrossEntropyLoss()

criterion = DistillationLoss(
    base_criterion, teacher, 'none', 0.5, 1.0, smoothing, retrospect)
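Note: in both snippets, the positional arguments passed to DistillationLoss after teacher appear to follow the DeiT convention of distillation type ('none', 'soft' or 'hard'), alpha (weight of the distillation term) and tau (distillation temperature), followed by the smoothing and retrospect flags. Treat this reading as an assumption and check the DistillationLoss definition in this repository.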

Citations

Code base from https://github.com/lucidrains/vit-pytorch

@misc{dosovitskiy2020image,
    title   = {An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale},
    author  = {Alexey Dosovitskiy and Lucas Beyer and Alexander Kolesnikov and Dirk Weissenborn and Xiaohua Zhai and Thomas Unterthiner and Mostafa Dehghani and Matthias Minderer and Georg Heigold and Sylvain Gelly and Jakob Uszkoreit and Neil Houlsby},
    year    = {2020},
    eprint  = {2010.11929},
    archivePrefix = {arXiv},
    primaryClass = {cs.CV}
}
@article{touvron2020deit,
  title={Training data-efficient image transformers & distillation through attention},
  author={Hugo Touvron and Matthieu Cord and Matthijs Douze and Francisco Massa and Alexandre Sablayrolles and Herv\'e J\'egou},
  journal={arXiv preprint arXiv:2012.12877},
  year={2020}
}
