This repository focuses on enabling efficient training for self-supervised learning (SSL). Often referred to as the "dark matter" of intelligence, SSL empowers AI systems to learn without supervision, drawing insights from their environments in ways reminiscent of human learning. While numerous advanced SSL algorithms have been proposed, many achieving state-of-the-art (SOTA) results, their adoption is often hindered by prohibitively high training costs. This limitation stifles innovation in academia and among individual researchers. Designed to be beginner-friendly, this repository allows users to reproduce SSL algorithms and quickly validate new ideas. Key features:
- Efficient data loading with ffcv (see the loader sketch after this list).
- Flexible configuration with gin-config.
- A collection of SSL algorithms.
- Evaluation with vitookit.
- All pretrained models are available on WANDB.
- A guide to training SSL models on CIFAR10 in a few minutes!
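As a rough illustration of ffcv-based data loading, here is a minimal sketch of an FFCV loader. It is not this repo's pipeline (the repo builds its pipelines via gin, e.g. `SimplePipeline`); the `.beton` path, image size, and normalization statistics below are placeholders.

```python
import numpy as np
import torch
from ffcv.loader import Loader, OrderOption
from ffcv.fields.decoders import RandomResizedCropRGBImageDecoder, IntDecoder
from ffcv.transforms import ToTensor, ToTorchImage, ToDevice, NormalizeImage, Squeeze

# Placeholder ImageNet statistics in 0-255 range (an assumption, not a repo default).
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406]) * 255
IMAGENET_STD = np.array([0.229, 0.224, 0.225]) * 255

loader = Loader(
    "train.beton",                      # pre-written ffcv dataset file (placeholder path)
    batch_size=512,
    num_workers=8,
    order=OrderOption.RANDOM,           # shuffled, quasi-random reads
    pipelines={
        "image": [
            RandomResizedCropRGBImageDecoder((224, 224)),
            ToTensor(),
            ToDevice(torch.device("cuda"), non_blocking=True),
            ToTorchImage(),
            NormalizeImage(IMAGENET_MEAN, IMAGENET_STD, np.float32),
        ],
        "label": [IntDecoder(), ToTensor(), Squeeze(), ToDevice(torch.device("cuda"))],
    },
)

for images, labels in loader:
    ...  # feed images to the SSL model
```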
Create a new environment with conda or micromamba:
```bash
conda create -y -n FastSSL python=3.10 cupy pkg-config 'libjpeg-turbo=3.0.0' opencv numba pytorch torchvision pytorch-cuda=12.1 -c pytorch -c nvidia -c conda-forge
conda activate FastSSL
pip install -r requirements.txt
```
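After installation, a quick sanity check (a minimal one-liner, assuming ffcv is installed via requirements.txt) confirms that PyTorch sees a GPU and ffcv imports cleanly:

```bash
python -c "import torch, ffcv; print(torch.__version__, torch.cuda.is_available())"
```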
Alternatively, you can pull my Docker image from GitHub Packages to ensure your environment matches mine:

```bash
docker pull ghcr.io/erow/aisurrey-docker:sha256-d835a01e444257345d78c95cec157eb604a73935f70f9e7928cdd08d97411fa7.sig
```
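A typical invocation looks like the following (a hypothetical example; the mount paths and image tag are placeholders to adapt to your setup):

```bash
# Hypothetical usage: start the container with GPU access and mount your code and data.
docker run --gpus all -it --rm \
    -v "$PWD":/workspace \
    -v /path/to/imagenet_ffcv:/data \
    ghcr.io/erow/aisurrey-docker:<tag> bash
```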
To pretrain an MAE, run the following command:

```bash
torchrun --nproc_per_node 8 main_pretrain.py --data_path=${train_path} --data_set=ffcv \
    --epochs 800 --warmup_epochs 40 --blr 1.5e-4 --weight_decay 0.05 --batch_size 512 \
    --cfgs configs/mae_ffcv.gin --gin build_model.model_fn=@base/MaskedAutoencoderViT build_dataset.transform_fn=@SimplePipeline \
    --ckpt_freq=100 --output_dir outputs/IN1K_base
```
Optional arguments: `--compile` to compile the model, `--ckpt_freq` to save a checkpoint every `ckpt_freq` epochs, and `--online_prob` to evaluate a linear classifier online during training.
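The `--cfgs` and `--gin` flags pass standard gin-config files and bindings. The sketch below is a minimal, self-contained illustration of that mechanism, not the repo's actual `build_model`; the stand-in `MaskedAutoencoderViT` class and its parameters are placeholders.

```python
import gin

@gin.configurable
class MaskedAutoencoderViT:
    """Stand-in for the repo's real model; only illustrates the gin wiring."""
    def __init__(self, patch_size=16, embed_dim=768):
        self.patch_size, self.embed_dim = patch_size, embed_dim

@gin.configurable
def build_model(model_fn=gin.REQUIRED):
    # model_fn is resolved from a .gin file or a --gin command-line binding.
    return model_fn()

# Roughly what --cfgs/--gin do (scopes such as base/ are omitted in this sketch):
gin.parse_config_files_and_bindings([], ["build_model.model_fn = @MaskedAutoencoderViT"])
model = build_model()
print(type(model).__name__, model.embed_dim)  # MaskedAutoencoderViT 768
```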
The original MAE settings for ViT-Large are bs=4096 and epochs=800, which take roughly 42 hours on 64 V100 GPUs. To launch such a multi-node run with SLURM via submitit:
```bash
WANDB_NAME=mae_1k python submitit_pretrain.py \
    --job_dir ${JOB_DIR} \
    -p gpu --ngpus 8 --nodes 8 \
    --batch_size 64 \
    --epochs 800 \
    --warmup_epochs 40 \
    --blr 1.5e-4 --weight_decay 0.05 \
    --cfgs configs/mae_ffcv.gin --gin build_model.model_fn=@base/MaskedAutoencoderViT build_dataset.transform_fn=@SimplePipeline \
    --data_path=${train_path} --data_set=ffcv
```
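If `--batch_size` is per GPU (as in the original MAE codebase), the effective batch size here is 64 × 8 GPUs × 8 nodes = 4096, matching the original recipe. Assuming this repo also follows the original MAE linear lr scaling rule (an assumption; check `main_pretrain.py`), the absolute learning rate is derived from `--blr` as:

```python
# Linear lr scaling rule from the original MAE codebase (assumed to apply here):
#   lr = blr * effective_batch_size / 256
blr = 1.5e-4
eff_batch_size = 64 * 8 * 8      # per-GPU batch size * GPUs per node * nodes = 4096
lr = blr * eff_batch_size / 256
print(lr)                        # 0.0024
```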
If you find this repository useful, please cite:

```bibtex
@misc{wu2024dailymaepretrainingmaskedautoencoders,
      title={DailyMAE: Towards Pretraining Masked Autoencoders in One Day},
      author={Jiantao Wu and Shentong Mo and Sara Atito and Zhenhua Feng and Josef Kittler and Muhammad Awais},
      year={2024},
      eprint={2404.00509},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2404.00509},
}
```