erow/FastSSL

Toward Training Self-supervised Models with Limited Budget

This repository focuses on efficient training for self-supervised learning (SSL). Often referred to as the "dark matter" of intelligence, SSL lets AI systems learn without supervision, drawing insights from their environments in ways reminiscent of human learning. Many advanced SSL algorithms have been proposed, and many achieve state-of-the-art (SOTA) results, but their adoption is often hindered by prohibitively high training costs. This limitation stifles innovation from academia and individual researchers. Designed to be beginner-friendly, this repository allows users to reproduce SSL algorithms and quickly validate new ideas. Its key features are:

  • Efficient data loading with ffcv.
  • Flexible configuration with gin-config.
  • A collection of SSL algorithms.
  • Evaluation with vitookit.
  • All trained models are available on WANDB.
  • A guideline for training SSL models on CIFAR10 in a few minutes.

Environment Setup

Create a new environment with conda or micromamba:

conda create -y -n FastSSL python=3.10 cupy pkg-config 'libjpeg-turbo=3.0.0' opencv numba  pytorch torchvision pytorch-cuda=12.1 -c pytorch -c nvidia -c conda-forge 
conda activate FastSSL
pip install -r requirements.txt
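
After installation, a quick import check can catch a broken environment before a long training run. The sketch below is not part of this repository; the module list is an assumption based on the packages named in the conda command and the feature list (`ffcv`, `gin` for gin-config, `cv2` for opencv), and `missing_modules` is a hypothetical helper.

```python
from importlib.util import find_spec

# Assumed top-level import names for the packages installed above.
# Adjust this list to match requirements.txt.
REQUIRED = ["torch", "torchvision", "ffcv", "gin", "cupy", "numba", "cv2"]


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    print("missing:", ", ".join(missing) if missing else "none")
```

Run it inside the activated FastSSL environment; any name it reports is a package that still needs to be installed.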

Alternatively, pull the Docker image from GitHub Packages to get the same environment as mine:

docker pull ghcr.io/erow/aisurrey-docker:sha256-d835a01e444257345d78c95cec157eb604a73935f70f9e7928cdd08d97411fa7.sig

Usage

torchrun

To train an MAE, run the following command:

torchrun --nproc_per_node 8 main_pretrain.py --data_path=${train_path} --data_set=ffcv \
    --epochs 800 --warmup_epochs 40 --blr 1.5e-4 --weight_decay 0.05 --batch_size 512 \
    --cfgs configs/mae_ffcv.gin --gin build_model.model_fn=@base/MaskedAutoencoderViT build_dataset.transform_fn=@SimplePipeline --ckpt_freq=100 --output_dir outputs/IN1K_base

Optional arguments: --compile compiles the model, --ckpt_freq saves a checkpoint every ckpt_freq epochs, and --online_prob evaluates a linear classifier during training.

HPC

The original settings for ViT-Large are a batch size of 4096 and 800 epochs, which takes about 42 hours on 64 V100 GPUs.

WANDB_NAME=mae_1k python submitit_pretrain.py \
    --job_dir ${JOB_DIR} \
    -p gpu --ngpus 8 --nodes 8 \
    --batch_size 64 \
    --epochs 800 \
    --warmup_epochs 40 \
    --blr 1.5e-4 --weight_decay 0.05 \
    --cfgs configs/mae_ffcv.gin --gin build_model.model_fn=@base/MaskedAutoencoderViT build_dataset.transform_fn=@SimplePipeline \
    --data_path=${train_path} --data_set=ffcv 
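
Both commands above reach the same total batch size of 4096 if --batch_size is read as the per-GPU batch size. The sketch below works through that arithmetic. It assumes the per-GPU reading, and it assumes that --blr is scaled as in the original MAE code (lr = blr × total batch size / 256); neither is confirmed by this README. The function names are hypothetical.

```python
def effective_batch_size(per_gpu_batch, gpus_per_node, nodes=1):
    """Total samples per optimizer step, assuming a per-GPU batch size."""
    return per_gpu_batch * gpus_per_node * nodes


def absolute_lr(blr, total_batch, base=256):
    """Learning rate under the assumed MAE-style linear scaling rule."""
    return blr * total_batch / base


# torchrun example: 8 GPUs on one node, --batch_size 512
torchrun_total = effective_batch_size(512, gpus_per_node=8)

# submitit example: 8 nodes x 8 GPUs, --batch_size 64
hpc_total = effective_batch_size(64, gpus_per_node=8, nodes=8)

print(torchrun_total, hpc_total)  # 4096 4096
print(absolute_lr(1.5e-4, hpc_total))
```

When training on fewer GPUs, the same helpers show how far the total batch size (and, under the assumed rule, the learning rate) drifts from the 4096 setting.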

Cite Me!

@misc{wu2024dailymaepretrainingmaskedautoencoders,
      title={DailyMAE: Towards Pretraining Masked Autoencoders in One Day}, 
      author={Jiantao Wu and Shentong Mo and Sara Atito and Zhenhua Feng and Josef Kittler and Muhammad Awais},
      year={2024},
      eprint={2404.00509},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2404.00509}, 
}
