LongViT is a vision Transformer that can process gigapixel images (e.g., 32,768x32,768 images) in an end-to-end manner. We split the image into millions of patches and employ LongNet to directly model the extremely long sequence. We apply LongViT in the field of computational pathology and achieve remarkable performance on cancer subtyping and survival prediction tasks.
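To make the sequence lengths concrete, here is a back-of-the-envelope sketch (illustrative only, assuming the non-overlapping 32x32 patching reported for the released checkpoint below) of how a 32,768x32,768 image turns into roughly one million tokens:

```python
# Back-of-the-envelope sequence lengths for LongViT-style patchification.
# Assumes non-overlapping 32x32 patches (the patch size of the released model).
PATCH_SIZE = 32
for side in (1_024, 8_192, 32_768):
    patches_per_side = side // PATCH_SIZE
    num_patches = patches_per_side ** 2
    print(f"{side}x{side} image -> {num_patches:,} patches")
# A 32,768x32,768 slide yields 1,048,576 patches (~1M tokens), far beyond what
# vanilla quadratic self-attention can handle; LongNet's dilated attention makes
# modeling this sequence end-to-end feasible.
```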
pip install -r requirements.txt
pip install git+https://github.com/shumingma/fairseq.git@moe
pip install -v -U git+https://github.com/facebookresearch/[email protected]#egg=xformers
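A quick way to confirm the environment is usable is to import the key packages; this is just a sanity check, not part of the official setup:

```python
# Sanity check that the core dependencies installed correctly.
import torch
import fairseq    # installed from the shumingma/fairseq "moe" branch above
import xformers   # memory-efficient attention kernels

print("torch:", torch.__version__)
print("fairseq:", fairseq.__version__)
print("xformers:", xformers.__version__)
```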
We perform self-supervised pretraining on TCGA diagnostic slides using the DINO objective. The detailed instructions can be found at get_started_for_tcga_pretraining.md.
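For readers unfamiliar with DINO, the snippet below sketches the core self-distillation loss it optimizes (the student matches a centered, sharpened teacher over augmented views); the temperatures and function signature are illustrative assumptions, not the repository's actual pretraining code.

```python
import torch
import torch.nn.functional as F

def dino_loss(student_logits, teacher_logits, center,
              student_temp=0.1, teacher_temp=0.04):
    # Teacher targets are centered and sharpened; no gradient flows through them.
    teacher_probs = F.softmax((teacher_logits - center) / teacher_temp, dim=-1).detach()
    student_log_probs = F.log_softmax(student_logits / student_temp, dim=-1)
    # Cross-entropy between teacher and student output distributions.
    return -(teacher_probs * student_log_probs).sum(dim=-1).mean()
```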
The link to the pretrained LongViT model on TCGA diagnostic slides:
LongViT: #layer=12; hidden=384; FFN factor=4x; #head=16; patch=32x32
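For reference, those hyperparameters correspond to roughly the following ViT-small-style configuration; the dictionary below merely illustrates how the fields relate and is not the repository's actual config object:

```python
# Hypothetical summary of the released checkpoint's architecture.
longvit_small = dict(
    patch_size=32,   # 32x32-pixel patches
    embed_dim=384,   # hidden size
    depth=12,        # Transformer layers
    num_heads=16,    # attention heads per layer
    mlp_ratio=4,     # FFN hidden size = 4 x 384 = 1536
)
```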
We perform finetuning for cancer subtyping on images with sizes up to 32,768x32,768 (1M patches). The detailed instructions can be found at get_started_for_tcga_subtyping.md.
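As a rough picture of what slide-level subtyping finetuning involves, the sketch below attaches a linear classifier to a pooled slide representation and trains it with cross-entropy; the class name and mean pooling are placeholders, and the actual recipe (pooling scheme, optimizer, schedule) is described in get_started_for_tcga_subtyping.md.

```python
import torch
import torch.nn as nn

class SubtypingHead(nn.Module):
    """Linear classifier over a pooled slide embedding (illustrative only)."""

    def __init__(self, embed_dim=384, num_subtypes=2):
        super().__init__()
        self.classifier = nn.Linear(embed_dim, num_subtypes)

    def forward(self, patch_tokens):
        # patch_tokens: (batch, num_patches, embed_dim) produced by the LongViT
        # encoder; for a 32,768x32,768 slide this is roughly 1M patch tokens.
        slide_embedding = patch_tokens.mean(dim=1)   # simple mean pooling
        return self.classifier(slide_embedding)

# Finetuning then reduces to standard cross-entropy on slide-level labels:
# logits = SubtypingHead()(patch_tokens)
# loss = nn.CrossEntropyLoss()(logits, subtype_labels)
```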
We perform finetuning for survival prediction on images with sizes up to 32,768x32,768 (1M patches). The detailed instructions can be found at get_started_for_tcga_survival_prediction.md.
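Survival prediction in this setting is typically cast as discrete-time hazard regression trained with a censorship-aware negative log-likelihood (as in MCAT/HIPT-style pipelines); the sketch below shows that standard loss in schematic form. It is an assumption about the general recipe, not a copy of this repository's implementation (see get_started_for_tcga_survival_prediction.md for the actual code).

```python
import torch

def nll_survival_loss(hazard_logits, bin_labels, censored, eps=1e-7):
    """Censorship-aware discrete-time survival NLL (schematic, not the repo's code).

    hazard_logits: (batch, num_bins) raw scores; sigmoid gives per-bin hazards.
    bin_labels:    (batch,) long tensor, index of the bin with the event/censor time.
    censored:      (batch,) 1.0 if censored, 0.0 if the event was observed.
    """
    hazards = torch.sigmoid(hazard_logits)                   # h(t) per bin
    survival = torch.cumprod(1.0 - hazards, dim=1)           # S(t) per bin
    # Prepend S=1 so we can look up survival just before the labelled bin.
    padded = torch.cat([torch.ones_like(survival[:, :1]), survival], dim=1)

    idx = bin_labels.unsqueeze(1)
    censored = censored.unsqueeze(1)
    s_prev = padded.gather(1, idx).clamp(min=eps)            # S(t-1)
    h_t = hazards.gather(1, idx).clamp(min=eps)              # h(t)
    s_t = survival.gather(1, idx).clamp(min=eps)             # S(t)

    # Observed events contribute log S(t-1) + log h(t); censored cases log S(t).
    loss = -((1.0 - censored) * (torch.log(s_prev) + torch.log(h_t))
             + censored * torch.log(s_t))
    return loss.mean()
```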
If you find this repository useful, please consider citing our work:
@article{longvit,
  title={When an Image is Worth 1,024 x 1,024 Words: A Case Study in Computational Pathology},
  author={Wang, Wenhui and Ma, Shuming and Xu, Hanwen and Usuyama, Naoto and Ding, Jiayu and Poon, Hoifung and Wei, Furu},
  journal={arXiv preprint arXiv:2312.03558},
  year={2023}
}

@article{longnet,
  title={LongNet: Scaling Transformers to 1,000,000,000 Tokens},
  author={Ding, Jiayu and Ma, Shuming and Dong, Li and Zhang, Xingxing and Huang, Shaohan and Wang, Wenhui and Zheng, Nanning and Wei, Furu},
  journal={arXiv preprint arXiv:2307.02486},
  year={2023}
}

@article{torchscale,
  title={TorchScale: Transformers at Scale},
  author={Ma, Shuming and Wang, Hongyu and Huang, Shaohan and Wang, Wenhui and Chi, Zewen and Dong, Li and Benhaim, Alon and Patra, Barun and Chaudhary, Vishrav and Song, Xia and others},
  journal={arXiv preprint arXiv:2211.13184},
  year={2022}
}
This repository is built using the BEiT-3, MCAT, DINO, and HIPT repositories and the timm library.
This project is licensed under the license found in the LICENSE file in the root directory of this source tree.
This project has adopted the Microsoft Open Source Code of Conduct.
For help or issues using LongViT models, please submit a GitHub issue.