Skip to content

Deep learning prediction of protein complex structure quality

License

Notifications You must be signed in to change notification settings

biocad/DProQ

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

46 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

DProQ: A Gated-Graph Transformer for Protein Complex Structure Assessment

DProQ, is a Gated-Graph Transformer model for end-to-end protein complex structure's quality evaluation. DProQ achieves significant speed-ups and better quality compared to current baseline method. If you have any questions or suggestions, please contact us by [email protected] . We are happy to help!

pipeline.png

gated_graph_transformer.png

Citation

If you think our work is helpful, please cite our work by:

@article {Chen2022.05.19.492741,
    author = {Chen, Xiao and Morehead, Alex and Liu, Jian and Cheng, Jianlin},
    title = {DProQ: A Gated-Graph Transformer for Protein Complex Structure Assessment},
    elocation-id = {2022.05.19.492741},
    year = {2022},
    doi = {10.1101/2022.05.19.492741},
    publisher = {Cold Spring Harbor Laboratory},
    URL = {https://www.biorxiv.org/content/early/2022/05/20/2022.05.19.492741},
    eprint = {https://www.biorxiv.org/content/early/2022/05/20/2022.05.19.492741.full.pdf},
    journal = {bioRxiv}
}

Dataset

Benchmark sets

We provide our benchmark tests HAF2 and DBM55-AF2 for download by:

wget https://zenodo.org/record/6569837/files/DproQ_benchmark.tgz

Each dataset contains:

  1. decoy folder: decoys files
  2. native folder: native structure files
  3. label_info.csv: DockQ scores and CAPRI class label

Installation

  1. Download this repository

    git clone https://github.com/BioinfoMachineLearning/DProQ.git
  2. Set up conda environment locally

    cd DProQ
    conda env create --name DProQ -f environment.yml
  3. Activate conda environment

    conda activate DPRoQ

Usage

Here is the inference.py script parameters' introduction.

python inference.py
-c --complex_folder     Raw protien complex complex_folder
-w --work_dir           Working directory to save all intermedia files and folders, it will created if it is not exits
-r --result_folder      Result folder to save two ranking results, it will created if it is not exits
-r --threads            Number of threads for parallel feature generation and dataloader, default=10
-s --delete_tmp         Set False to save work_dir and intermedia files, otherwise set True, default=False

Use provided model weights to predict protein complex structures' quality

DProQ requires GPU. We provide few protein complexes in example folder for test. The evaluation result Ranking.csv is stored in result_folder.

python ./inference.py -c ./examples/6AL0/ -w ./examples/work/ -r ./examples/result

You can build you onw dataset for evaluation, the data folder should look like:

customer_data_folder
├── decoy_1.pdb
├── decoy_2.pdb
├── decoy_3.pdb
├── decoy_4.pdb
└── decoy_5.pdb

Main results

Following four tables show DProQ's consistent best result on HAF2 and DBM55-AF2 test sets in terms of hit rate and ranking loss. The best result is highlighted on bold.

HAF2 test set

Table 1: Hit rate performance on the HAF2 dataset. The BEST column represents each target’s best-possible Top-10 result. The SUMMARY row lists the results when all targets are taken into consideration.

ID DPROQ DPROQ_GT DPROQ_GTE DPROQ_GTN GNN_DOVE BEST
7AOH 10/10/10 10/10/10 10/10/10 10/10/10 9/9/0 10/10/10
7D7F 0/0/0 0/0/0 0/0/0 0/0/0 0/0/0 5/0/0
7AMV 10/10/10 10/10/10 10/10/10 10/10/10 10/10/6 10/10/10
7OEL 10/10/0 10/9/0 10/10/0 10/10/0 10/10/0 10/10/0
7O28 10/10/0 10/10/0 10/10/0 10/10/0 10/10/0 10/10/0
7ALA 0/0/0 0/0/0 0/0/0 0/0/0 0/0/0 1/0/0
7MRW 5/4/0 0/0/0 0/0/0 0/0/0 0/0/0 10/10/0
7OZN 0/0/0 0/0/0 0/0/0 0/0/0 0/0/0 10/2/0
7D3Y 2/0/0 5/0/0 6/0/0 8/0/0 0/0/0 10/0/0
7NKZ 10/10/2 10/10/1 10/10/1 10/010/4 10/9/9 10/10/10
7LXT 1/1/0 0/0/0 0/0/0 0/0/0 1/0/0 10/10/0
7KBR 10/10/10 10/10/10 10/10/10 10/10/10 10/10/9 10/10/10
7O27 10/10/0 10/10/0 10/10/0 10/10/0 10/4/0 10/10/0
SUMMARY 10/9/4 8/7/4 8/7/4 8/7/4 8/7/3 13/10/4

Table 2: Ranking loss performance on the HAF2 dataset. The BEST row represents the mean and standard deviation of the ranking losses for all targets.

Target DPROQ DProQ_GT DPROQ_GTE DPROQ_GTN GNN_DOVE
7AOH 0.066 0.026 0.026 0.058 0.928
7D7F 0.471 0.471 0.47 0.471 0.003
7AMV 0.01 0.021 0.017 0.019 0.342
7OEL 0.062 0.063 0.135 0.135 0.21
7O28 0.029 0.021 0.027 0.034 0.244
7ALA 0.232 0.226 0.226 0.226 0.226
7MRW 0.085 0.603 0.555 0.555 0.598
7OZN 0.409 0.409 0.49 0.281 0.457
7D3Y 0.326 0.33 0.012 0.326 0.295
7NKZ 0.164 0.175 0.175 0.164 0.459
7LXT 0.586 0.586 0.586 0.586 0.295
7KBR 0.068 0.152 0.152 0.17 0.068
7O27 0.03 0.079 0.079 0.079 0.334
BEST 0.195 ± 0.185 0.243 ± 0.206 0.227 ± 0.21 0.239 ± 0.187 0.343 ± 0.228

DBM55-AF2 test set

Table 3: Hit rate performance on DBM55-AF2 dataset. The BEST column represents each target’s best-possible Top-10 result. The SUMMARY row lists the results when all targets are taken into consideration.

Target DPROQ DPROQ_GT DPROQ_GTE DPROQ_GTN GNN_DOVE BEST
6AL0 9/2/0 10/0/0 10/0/0 10/2/0 6/0/0 10/2/0
3SE8 8/8/0 9/9/0 8/8/0 8/8/0 3/0/0 10/10/0
5GRJ 10/10/0 9/9/0 10/10/0 9/9/0 3/2/0 10/10/0
6A77 7/7/0 7/7/0 8/8/0 8/8/0 0/0/0 8/8/0
4M5Z 10/10/1 10/10/0 10/10/0 10/10/0 10/10/0 10/10/1
4ETQ 1/1/0 1/1/0 1/1/0 1/1/0 0/0/0 1/1/0
5CBA 10/10/1 10/10/0 10/10/0 10/10/1 10/10/3 10/10/6
5WK3 0/0/0 0/0/0 0/0/0 0/0/0 1/0/0 3/0/0
5Y9J 4/0/0 6/0/0 5/0/0 4/0/0 0/0/0 8/0/0
6BOS 10/10/0 10/10/0 10/10/0 10/10/0 10/10/0 10/10/0
5HGG 8/0/0 8/0/0 8/0/0 8/0/0 8/0/0 10/0/0
6A0Z 0/0/0 0/0/0 0/0/0 0/0/0 2/0/0 3/0/0
3U7Y 2/2/1 2/2/1 2/2/1 2/1/0 2/2/1 2/2/1
3WD5 10/8/0 9/8/0 9/8/0 9/8/0 0/0/0 10/10/0
5KOV 0/0/0 0/0/0 0/0/0 0/0/0 1/0/0 2/0/0
SUMMARY 12/10/3 12/9/1 12/9/1 12/10/1 10/4/1 15/10/3

Table 4: Ranking loss performance on the DBM55-AF2 dataset. The BEST row represents the mean and standard deviation of the ranking losses for all targets.

Target DPROQ DPROQ_GT DPROQ_GTE DPROQ_GTN GNN_DOVE
6AL0 0.0 0.156 0.156 0.0 0.424
3SE8 0.079 0.041 0.041 0.079 0.735
5GRJ 0.024 0.012 0.095 0.012 0.776
6A77 0.037 0.062 0.0 0.037 0.591
4M5Z 0.015 0.026 0.026 0.015 0.221
4ETQ 0.0 0.76 0.0 0.748 0.759
5CBA 0.052 0.038 0.052 0.058 0.019
5WK3 0.114 0.114 0.114 0.186 0.087
5Y9J 0.0 0.0 0.0 0.0 0.382
6BOS 0.081 0.081 0.0 0.0 0.081
5HGG 0.051 0.051 0.121 0.051 0.121
6A0Z 0.207 0.207 0.207 0.207 0.062
3U7Y 0.0 0.021 0.0 0.0 0.756
3WD5 0.011 0.011 0.011 0.0 0.672
5KOV 0.065 0.08 0.085 0.087 0.0
BEST 0.049 ± 0.054 0.111 ± 0.182 0.061 ± 0.064 0.099 ± 0.185 0.379 ± 0.298

About

Deep learning prediction of protein complex structure quality

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 100.0%