MoH: Multi-Head Attention as Mixture-of-Head Attention

If you like our project, please give us a star ⭐ on GitHub for the latest updates.

ImageNet-1K classification with MoH-ViT

💡 Download URL

Code	HuggingFace Model
MoH-ViT	🤗 MoH-ViT-B-75, MoH-ViT-B-50, MoH-ViT-S-80, MoH-ViT-S-75
MoH-DiT	😊 MoH-DiT-90
MoH-LLaMA3-8B	😊 MoH-LLaMA3-8B

🛠️ Requirements and Installation

Requirements

pip install -r requirements.txt

Data preparation

Download and extract the ImageNet train and val images from http://image-net.org/. The directory structure follows the standard layout expected by torchvision's datasets.ImageFolder, with the training and validation images in the train and val folders, respectively:

/path/to/imagenet/
  train/
    class1/
      img1.jpeg
    class2/
      img2.jpeg
  val/
    class1/
      img3.jpeg
    class2/
      img4.jpeg
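Before launching a run, it can help to confirm that the data matches this layout. The following is a minimal plain-shell sketch (the function name check_imagenet_layout is hypothetical, not part of this repository). It checks that train/ and val/ exist and contain the same set of class folders. For the full ImageNet-1K, both splits should report 1000 classes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sanity-check an ImageFolder-style ImageNet root.
# Expects <root>/train/<class>/... and <root>/val/<class>/... with the
# same class folder names in both splits. Does not inspect the images.
check_imagenet_layout() {
  local root="$1"
  local split
  for split in train val; do
    if [ ! -d "$root/$split" ]; then
      echo "missing directory: $root/$split" >&2
      return 1
    fi
  done
  # Sorted list of class folder names (one per line) for each split.
  local train_classes val_classes
  train_classes=$(cd "$root/train" && find . -mindepth 1 -maxdepth 1 -type d | sed 's|^\./||' | sort)
  val_classes=$(cd "$root/val" && find . -mindepth 1 -maxdepth 1 -type d | sed 's|^\./||' | sort)
  if [ "$train_classes" != "$val_classes" ]; then
    echo "train/ and val/ contain different class folders" >&2
    return 1
  fi
  # Count non-empty lines, i.e. the number of class folders.
  local n
  n=$(printf '%s\n' "$train_classes" | grep -c .)
  echo "OK: $n classes"
}
```

For example, check_imagenet_layout /path/to/imagenet prints the class count, or reports the first problem it finds and returns a non-zero status.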

Evaluation

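To evaluate a pre-trained MoH-ViT on the ImageNet-1K val set using 8 GPUs, set MODEL_TYPE to the model name. The command loads ./configs/${MODEL_TYPE}.py and the checkpoint ./checkpoints/${MODEL_TYPE}.pth: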

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
python -m torch.distributed.launch \
--nproc_per_node=8 \
--master_port=2024 \
--use_env main.py \
--config ./configs/${MODEL_TYPE}.py \
--data-path /path/to/imagenet \
--resume ./checkpoints/${MODEL_TYPE}.pth \
--eval
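To evaluate several checkpoints, the same command can be wrapped in a function. This is a sketch under two assumptions: the helper name eval_moh_vit and the DRY_RUN switch are hypothetical, and the example model names (taken from the training script names below) are assumed to match the config and checkpoint file names. Setting DRY_RUN=1 prints the command instead of launching it, which is useful for checking paths first.

```shell
#!/usr/bin/env bash
# Hypothetical helper wrapping the evaluation command above.
# Assumes ./configs/<model_type>.py and ./checkpoints/<model_type>.pth exist.
# With DRY_RUN=1 the command is printed with echo instead of executed.
eval_moh_vit() {
  local model_type="$1"   # e.g. moh_transnext_base_75 (assumed file name)
  local data_path="$2"    # ImageNet root containing train/ and val/
  local run=""
  if [ "${DRY_RUN:-0}" = 1 ]; then
    run="echo"
  fi
  # When run is empty, the unquoted $run expands to nothing and python runs directly.
  CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
  $run python -m torch.distributed.launch \
    --nproc_per_node=8 \
    --master_port=2024 \
    --use_env main.py \
    --config "./configs/${model_type}.py" \
    --data-path "$data_path" \
    --resume "./checkpoints/${model_type}.pth" \
    --eval
}
```

For example, a loop such as for m in moh_transnext_base_75 moh_transnext_small_80; do eval_moh_vit "$m" /path/to/imagenet; done evaluates the models one after another, provided the files are named this way.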

ImageNet-1K Training

To train MoH-ViT on ImageNet-1K using 8 GPUs:

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
python -m torch.distributed.launch \
--nproc_per_node=8 \
--master_port=2024 \
--use_env main.py \
--config ./configs/${MODEL_TYPE}.py \
--data-path /path/to/imagenet \
--batch-size 128 \
--output_dir results/${MODEL_TYPE} \
--num_workers 32
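The command above runs 8 processes with --batch-size 128. Assuming --batch-size is a per-GPU value (as in DeiT-style main.py training scripts; check main.py to confirm), the global batch is 128 × 8 = 1024. The sketch below (the function name per_gpu_batch is hypothetical) computes the per-GPU batch size that keeps the global batch at 1024 on fewer GPUs. --nproc_per_node and CUDA_VISIBLE_DEVICES must be changed to match, and the larger per-GPU batch still has to fit in GPU memory.

```shell
#!/usr/bin/env bash
# Assumption: --batch-size is per process (per GPU), so
# global batch = batch_size * nproc_per_node = 128 * 8 = 1024 above.
GLOBAL_BATCH=1024

# Hypothetical helper: per-GPU batch size that preserves GLOBAL_BATCH.
per_gpu_batch() {
  local ngpus="$1"
  if [ "$ngpus" -le 0 ]; then
    echo "number of GPUs must be positive" >&2
    return 1
  fi
  if [ $((GLOBAL_BATCH % ngpus)) -ne 0 ]; then
    echo "global batch $GLOBAL_BATCH is not divisible by $ngpus GPUs" >&2
    return 1
  fi
  echo $((GLOBAL_BATCH / ngpus))
}
```

For example, on 4 GPUs, per_gpu_batch 4 gives 256, so the command would use --nproc_per_node=4 and --batch-size 256.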

Alternatively, run one of the provided training scripts with the ImageNet path as its argument:

bash moh_transnext_base_75.sh /path/to/imagenet
bash moh_transnext_base_50.sh /path/to/imagenet
bash moh_transnext_small_80.sh /path/to/imagenet
bash moh_transnext_small_75.sh /path/to/imagenet
