AIRFold, built on AlphaFold2, aims to provide a scalable, systematic solution for protein structure prediction in the life sciences. Its Homology Miner module automatically extracts, analyzes, and processes the co-evolutionary information contained in homologous protein sequences (multiple sequence alignments, MSAs). AIRFold also offers a systematic structure prediction pipeline: it integrates leading structure prediction models such as AlphaFold2 and RoseTTAFold2 alongside single-sequence models like OmegaFold and ESMFold, then ranks and screens all predicted structures with a model quality estimation (MQE) module. To tie these modules together, we provide a microservices architecture with user-friendly APIs and a web-based graphical interface, so that developers and biochemical researchers can conveniently use the platform for structure prediction.
AIRFold adopts a microservices architecture and uses Docker to manage all modules and their respective runtime environments, making deployment and startup of AIRFold very simple. Additionally, AIRFold provides an easy-to-use web interface and API, allowing users to submit and manage structure prediction requests conveniently. Follow the steps provided in the documentation below to quickly deploy AIRFold and start your structure prediction journey.
[Figures: AIRFold Framework; AIRFold Web Interface]
Please follow these steps:
- Install Docker.
  - Install NVIDIA Container Toolkit for GPU support.
  - Set up running Docker as a non-root user.
- Clone the AIRFold repository:
    git clone https://github.com/health-air/AIRFold
    cd ./AIRFold
- Download and prepare the databases (see the Databases section).
- Launch AIRFold with a single command:
    docker compose up
- Interact with AIRFold via the web page or the RESTful API:
  - Submit page: http://127.0.0.1
  - API documentation page: http://127.0.0.1:8081/docs
  - Task monitor page: http://127.0.0.1:5555
Note: change the IP address and ports as needed; they are specified in docker-compose.yml.
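After `docker compose up`, it can be handy to confirm that the three pages listed above respond. The following is a minimal sketch using only the Python standard library; the helper names `service_urls` and `is_reachable` are illustrative and not part of AIRFold, and the default host and ports are simply the ones quoted above, so adjust them if your docker-compose.yml differs.

```python
import urllib.error
import urllib.request


def service_urls(host="127.0.0.1", api_port=8081, monitor_port=5555):
    """Build the three URLs quoted in the README (hypothetical helper)."""
    return {
        "submit page": f"http://{host}",
        "API documentation": f"http://{host}:{api_port}/docs",
        "task monitor": f"http://{host}:{monitor_port}",
    }


def is_reachable(url, timeout=5.0):
    """Return True if an HTTP server answers at `url`, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except urllib.error.HTTPError:
        # A server answered, but with an error status (e.g. 401 or 404).
        # We still count the service as running.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar.
        return False


def check_services(urls):
    """Map each service name to whether its URL is reachable."""
    return {name: is_reachable(url) for name, url in urls.items()}
```

Calling `check_services(service_urls())` on the machine running the containers returns a dictionary of booleans, one per page. A `False` entry usually means the container is still starting or the port mapping was changed.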
AIRFold searches several databases for MSAs and templates. All databases used are listed below.
Genomics and metagenomics sequence databases
- BFD
- MGnify
- UniRef90
- NR database for BLAST
- Genomics and metagenomics sequence databases for DeepMSA2
- ColabFold dataset for MMseqs2
- Small BFD sequence database
- UniProt sequence database
Structure databases
Data structure
├── model_params (models and parameters for AlphaFold2, RoseTTAFold2, etc.)
├── bfd
├── blast_dbs
├── JGIclust
├── metaclust
├── mgnify
├── pdb70
├── pdb_mmcif
├── small_bfd
├── uniclust30
├── uniref30
└── uniref90
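Before launching the services, it may help to verify that the database root contains every directory shown in the tree above. This is a small sketch, not an AIRFold utility: the function name `missing_subdirs` is hypothetical, and the location of the database root on your machine is whatever you configured during download.

```python
from pathlib import Path

# Directory names copied from the data structure tree in this README.
EXPECTED_SUBDIRS = (
    "model_params",
    "bfd",
    "blast_dbs",
    "JGIclust",
    "metaclust",
    "mgnify",
    "pdb70",
    "pdb_mmcif",
    "small_bfd",
    "uniclust30",
    "uniref30",
    "uniref90",
)


def missing_subdirs(root):
    """Return the expected subdirectories that are absent under `root`."""
    root = Path(root)
    return [name for name in EXPECTED_SUBDIRS if not (root / name).is_dir()]
```

An empty list means the layout matches the tree. The check only looks at directory names, so it does not confirm that the downloads inside each directory are complete.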
MSA-based structure prediction
Single sequence-based structure prediction
Multiple sequence alignment generation
Multiple sequence alignment selection
Protein model quality assessment
In addition to the web interface and API, AIRFold also provides convenient scripts for the following four functions: multiple sequence alignment generation, pretrained embedding generation, protein contact map prediction, and protein structure prediction.
Multiple sequence alignment generation
- Input: protein sequences in FASTA format.
- Output: multiple sequence alignment results in a3m format.
    python ./scripts/run_mode.py --input_path example.fasta --mode msa
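To work with the a3m output downstream, a small parser is often enough. The sketch below relies on the general a3m convention (FASTA-like headers, with lowercase letters marking insertions relative to the query) rather than on anything specific to AIRFold; `parse_a3m` is a hypothetical helper, and where `run_mode.py` writes its output is not stated here, so pass whatever path it produces on your system.

```python
import string

# Translation table that deletes lowercase letters, which in a3m mark
# insertions relative to the query sequence.
_DROP_LOWERCASE = str.maketrans("", "", string.ascii_lowercase)


def parse_a3m(text):
    """Return a list of (header, aligned_sequence) pairs from a3m text.

    Insertions (lowercase letters) are removed so that every returned
    sequence has the same number of columns as the query.
    """
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        if line.startswith(">"):
            if header is not None:
                sequence = "".join(chunks).translate(_DROP_LOWERCASE)
                records.append((header, sequence))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        sequence = "".join(chunks).translate(_DROP_LOWERCASE)
        records.append((header, sequence))
    return records
```

For example, `parse_a3m(open(path).read())` gives the query as the first record, and `len(records)` gives the alignment depth.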
Pretrained embedding generation
- Input: protein sequences in FASTA format.
- Output: generated sequence embeddings in pickle format.
    python ./scripts/run_mode.py --input_path example.fasta --mode feature
Protein contact map prediction
- Input: protein sequences in FASTA format.
- Output: generated contact map in pickle format.
    python ./scripts/run_mode.py --input_path example.fasta --mode disgram
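The embedding and contact map outputs are pickle files, and their internal layout is not documented here. A cautious first step is to load one and summarize its structure. The sketch below assumes nothing about the contents: `describe` and `describe_pickle` are hypothetical helpers that report dictionary keys, type names, and a `shape` attribute when one exists (as on array-like objects). Unpickling can execute arbitrary code, so only load files you generated yourself.

```python
import pickle


def describe(obj, depth=0, max_depth=2):
    """Return summary lines for a (possibly nested) Python object."""
    indent = "  " * depth
    shape = getattr(obj, "shape", None)
    if shape is not None:
        # Array-like objects (e.g. NumPy arrays) expose a shape.
        return [f"{indent}{type(obj).__name__} shape={tuple(shape)}"]
    if isinstance(obj, dict) and depth < max_depth:
        lines = [f"{indent}dict with {len(obj)} keys"]
        for key, value in obj.items():
            lines.append(f"{indent}  {key!r}:")
            lines.extend(describe(value, depth + 2, max_depth))
        return lines
    return [f"{indent}{type(obj).__name__}"]


def describe_pickle(path):
    """Load a trusted pickle file and summarize what it contains."""
    with open(path, "rb") as handle:
        return describe(pickle.load(handle))
```

Printing `"\n".join(describe_pickle(path))` shows at a glance whether the file holds a single array or a dictionary of named arrays.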
Protein structure prediction
- Input: protein sequences in FASTA format.
- Output: protein structure in PDB format.
    python ./scripts/run_mode.py --input_path example.fasta --mode pipline
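The four commands above differ only in the `--mode` value, so they are easy to drive from a short Python wrapper. The sketch below is an illustration, not part of AIRFold: `build_command` and `run_modes` are hypothetical names, the mode strings are copied verbatim from the commands above, and the script path assumes you run from the repository root. Whether the modes depend on each other's outputs is not stated here, so the order simply follows this README.

```python
import subprocess
import sys

# Mode names exactly as they appear in the commands in this README.
MODES = ("msa", "feature", "disgram", "pipline")


def build_command(input_path, mode, script="./scripts/run_mode.py"):
    """Build the argument list for one run_mode.py invocation."""
    if mode not in MODES:
        raise ValueError(f"unknown mode {mode!r}; expected one of {MODES}")
    return [
        sys.executable,  # use the same interpreter that runs this wrapper
        str(script),
        "--input_path",
        str(input_path),
        "--mode",
        mode,
    ]


def run_modes(input_path, modes=MODES):
    """Run the selected modes one after another, stopping on failure."""
    for mode in modes:
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(build_command(input_path, mode), check=True)
```

For instance, `run_modes("example.fasta", modes=("msa",))` is equivalent to the first command above, and passing the argument list directly to `subprocess.run` avoids shell quoting problems with unusual file names.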
If you find our open-source code and models helpful to your research, please consider starring 🌟 and citing 📑 this repo. Thank you for your support!
@misc{airfold,
author={Xin, Hong and Hongliang, Li and Jingjing, Gong and Yuxuan, Song and Yinjun, Jia and Keyue, Qiu and Han, Tang and Haichuan, Tan and Yanyan, Lan},
title={AIRFold},
year={2024},
howpublished = {\url{https://github.com/health-air/AIRFold}}
}
Please also reference the third-party tools (listed above) you use.
We gratefully acknowledge the financial support of the National Key R&D Program of China under grant No. 2021YFF1201600. This support has been crucial in enabling our research and development activities.
Copyright 2024 AIR.
AIRFold is extended from AlphaFold and is licensed under the permissive Apache License, Version 2.0.
If you encounter problems using AIRFold, feel free to create an issue! We also welcome pull requests from the community.
For help or issues using this repository, please submit a GitHub issue.
For other communications, please contact Yanyan Lan ([email protected]).