Vision Transformers (ViT)

A minimal PyTorch implementation of Vision Transformers (ViT) and its variants: Data-efficient Image Transformers (DeiT) and Swin Transformers. Experiments cover CIFAR-100 (ViT-T/8 vs DeiT-T/8) and Tiny-ImageNet (ViT-T/8 vs Swin-T), but other variants are supported as well.

Architectural correctness is verified via parameter counts and output parity against the torchvision implementations (with exceptions for Swin, whose internals differ).

Configuration is managed using Hydra, with optional experiment tracking via Weights & Biases (wandb).

ViT-T/8 vs DeiT-T/8 on CIFAR-100

CIFAR-100 Plots

ViT-T/8 vs Swin-T on TinyImageNet

Tiny Image Net Plots

Setup

  • Install uv and run
uv sync

Training Runs

Train ViT-T/8 on CIFAR-100

uv run train.py +run=vit-cifar100

Train DeiT-T/8 on CIFAR-100

uv run train.py +run=deit-cifar100

Uses a frozen resnet18_cifar100 (via timm) as the teacher for hard distillation (shown to work well in the DeiT paper).
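Hard distillation, as described in the DeiT paper, supervises the distillation token with the teacher's argmax prediction rather than its soft probabilities. A minimal sketch of the combined loss (function and argument names are illustrative, not this repo's actual API):

```python
import torch
import torch.nn.functional as F

def hard_distillation_loss(student_cls_logits, student_dist_logits,
                           teacher_logits, labels):
    """DeiT-style hard distillation: the class token is supervised by the
    ground-truth label, the distillation token by the teacher's argmax."""
    hard_teacher_labels = teacher_logits.argmax(dim=-1)
    loss_cls = F.cross_entropy(student_cls_logits, labels)
    loss_dist = F.cross_entropy(student_dist_logits, hard_teacher_labels)
    return 0.5 * loss_cls + 0.5 * loss_dist

# Toy example: batch of 4 over 100 classes, random logits.
torch.manual_seed(0)
s_cls = torch.randn(4, 100)
s_dist = torch.randn(4, 100)
t_logits = torch.randn(4, 100)
y = torch.randint(0, 100, (4,))
loss = hard_distillation_loss(s_cls, s_dist, t_logits, y)
```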

Train ViT-T/8 on Tiny-Imagenet

uv run train.py +run=vit-tiny-imagenet

Train Swin-T on Tiny-Imagenet

uv run train.py +run=swin-tiny-imagenet
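Swin's defining operation is partitioning the feature map into fixed-size local windows and running attention within each. A minimal window-partition sketch (shapes follow the Swin paper; names are illustrative):

```python
import torch

def window_partition(x, window_size):
    """Split a (B, H, W, C) feature map into non-overlapping
    window_size x window_size windows -> (num_windows * B, ws*ws, C)."""
    B, H, W, C = x.shape
    ws = window_size
    x = x.view(B, H // ws, ws, W // ws, ws, C)
    # Group the two window-grid axes together, then flatten each window.
    windows = x.permute(0, 1, 3, 2, 4, 5).reshape(-1, ws * ws, C)
    return windows

# A 56x56 feature map with 7x7 windows yields 8*8 = 64 windows per image.
x = torch.randn(2, 56, 56, 96)
w = window_partition(x, 7)
print(w.shape)  # torch.Size([128, 49, 96])
```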

Structure

.
├── config/
│   ├── dataset/        # Dataset configs
│   ├── model/          # Model configs (ViT / DeiT / Swin)
│   ├── run/            # Experiment presets
│   └── default.yaml    # Global defaults
├── model/              # Model implementations
├── data.py             # Dataset & dataloaders
├── train.py            # Training entry point
├── utils.py            # Training utilities
└── tests/              # Architecture & parity tests

Configuration

  • Explicit configs over implicit defaults
  • Modular overrides:
    • dataset
    • model
    • optimizer
    • lr_scheduler
  • Experiment outputs are auto-versioned and logged.

Example override:

uv run train.py model=ViT-B-16 dataset=cifar100
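For orientation, a run preset under `config/run/` might look like the following; the field names and values here are hypothetical, not the repo's exact schema:

```yaml
# config/run/vit-cifar100.yaml (hypothetical sketch)
# @package _global_
defaults:
  - override /dataset: cifar100
  - override /model: vit-t-8

optimizer:
  lr: 1e-3
  weight_decay: 0.05
```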
