Energy Graph Neural Networks (EGNN)

This project contains the code and the training/test data for Energy Graph Neural Networks (EGNN). The code is a modified version of the Graph Neural Network code by masashitsubaki, which is available under the Apache license.

Requirements

CANDOCK

A copy of the CANDOCK docking program is provided in the candock submodule directory. To build CANDOCK, you need a modern C++ compiler, the Boost libraries, and CMake. Other dependencies, such as GSL and OpenMM, are installed during the CMake build process. CANDOCK is only required to reproduce the docking results reported in the associated paper; it is not needed to run the Python scripts.

EGNN

PyTorch
scikit-learn
RDKit
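A quick way to confirm that the Python requirements above are installed is a check like the following. This is a minimal sketch, not part of the repository: it assumes the standard import names (torch for PyTorch, sklearn for scikit-learn, rdkit for RDKit) and checks no versions, since none are stated here.

```python
import importlib.util

# Standard import names assumed for each requirement listed above.
REQUIRED = {"PyTorch": "torch", "scikit-learn": "sklearn", "RDKit": "rdkit"}


def missing_packages(required=REQUIRED):
    """Return the display names of packages whose module cannot be found."""
    return [name for name, module in required.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All EGNN Python requirements found.")
```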

Usage

Obtain the code

git clone https://github.com/chopralab/egnn.git
cd egnn
git submodule update --init --recursive

The above commands will download both the EGNN Python scripts required for training a model and the C++ docking code for CANDOCK v0.6.0.

Building CANDOCK

After cloning, build CANDOCK with the following commands, where N is the number of processor cores on your machine:

cd candock
mkdir build
cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
make -j N

Note that the above commands have been tested on Ubuntu 16.04 and 18.04.

Running the EGNN code

All files in the input directory are prepared for the compounds discussed in the paper. Each line of data.txt (training set) and test.txt (test set) corresponds to one compound. The first column is the compound's SMILES string, and the next two columns are the two best of the 96 scores obtained by docking the compound with CANDOCK; the best scoring functions were selected using kappa statistics. data.txt has an additional column indicating whether the compound is active or inactive (0 or 1, respectively).
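The column layout described above can be read with a small parser like this one. It is a sketch under assumptions: the columns are taken to be whitespace-separated, and the names Compound, parse_line and read_compounds are hypothetical helpers, not functions from the repository.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Compound:
    smiles: str
    score_1: float
    score_2: float
    label: Optional[int]  # None for test.txt rows, which have no activity column


def parse_line(line: str) -> Compound:
    """Parse one row of data.txt (4 columns) or test.txt (3 columns).

    Assumes whitespace-separated columns: SMILES, two CANDOCK scores and,
    for data.txt only, a 0/1 activity label.
    """
    fields = line.split()
    if len(fields) not in (3, 4):
        raise ValueError(f"expected 3 or 4 columns, got {len(fields)}: {line!r}")
    smiles, s1, s2 = fields[0], float(fields[1]), float(fields[2])
    label = int(fields[3]) if len(fields) == 4 else None
    if label is not None and label not in (0, 1):
        raise ValueError(f"activity label must be 0 or 1, got {label}")
    return Compound(smiles, s1, s2, label)


def read_compounds(path: str) -> List[Compound]:
    """Read every non-blank line of an input file into Compound records."""
    with open(path) as handle:
        return [parse_line(line) for line in handle if line.strip()]
```

For example, a hypothetical training row such as `CCO -5.2 -4.1 1` parses to a Compound with label 1, while a three-column test row gets label None.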

After creating or editing these two files, run the preprocess_data.sh script. The script must be rerun whenever compounds are changed in either data.txt or test.txt. It requires a dataset name and a fingerprinting radius, which default to PDL1 and 2, respectively. The script populates the train and test directories with the data required to run the model.
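To illustrate what the fingerprinting radius controls, the sketch below applies generic Weisfeiler-Lehman-style relabeling to a toy molecular graph: after `radius` iterations, each atom's integer ID encodes its neighborhood up to that many bonds away. This is not the repository's implementation (the original GNN code builds its fingerprints from RDKit molecules and may also account for bond types); radius_fingerprints is a hypothetical name used only for illustration.

```python
from typing import Dict, Hashable, List


def radius_fingerprints(atoms: List[str], adjacency: Dict[int, List[int]],
                        radius: int) -> List[int]:
    """Assign each atom an integer ID encoding its radius-r neighborhood.

    At every iteration an atom's new label is its current label paired with
    the sorted labels of its neighbors (Weisfeiler-Lehman relabeling).
    """
    vocab: Dict[Hashable, int] = {}

    def to_id(key: Hashable) -> int:
        # Give each distinct substructure key a stable integer ID.
        return vocab.setdefault(key, len(vocab))

    labels = [to_id(symbol) for symbol in atoms]
    for _ in range(radius):
        # The comprehension reads the previous iteration's labels throughout.
        labels = [to_id((labels[i], tuple(sorted(labels[j] for j in adjacency[i]))))
                  for i in range(len(atoms))]
    return labels


# Heavy atoms of propane (C-C-C): with radius 1 the two terminal carbons
# share an ID and the central carbon gets its own.
print(radius_fingerprints(["C", "C", "C"], {0: [1], 1: [0, 2], 2: [1]}, 1))  # [1, 2, 1]
```

Because the set of substructure IDs (and hence the size of any embedding table built over them) depends on the radius, data preprocessed with one radius would not line up with a model configured for another, which is presumably why the radius must match between preprocessing and training.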

Once the training and test sets have been prepared, run the train_full.sh script to train the model; the trained weights are saved in the fullmodel directory. Hyperparameters can be edited in train_full.sh, and the radius parameter must match the one used during preprocessing.
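A mismatched radius is easy to miss when editing hyperparameters by hand. The helper below is a hypothetical sketch of a pre-flight check: it assumes train_full.sh sets the radius with a plain shell assignment such as `radius=2`, which is an assumption about the script's contents, and the names shell_assignments and check_radius are illustrative only.

```python
import re
from typing import Dict

# Matches simple shell assignments such as `radius=2` at the start of a line.
ASSIGNMENT = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_]*)=(\S+)")


def shell_assignments(script_text: str) -> Dict[str, str]:
    """Collect NAME=value assignments from shell script text (last one wins)."""
    values: Dict[str, str] = {}
    for line in script_text.splitlines():
        match = ASSIGNMENT.match(line)
        if match:
            values[match.group(1)] = match.group(2).strip("'\"")
    return values


def check_radius(script_text: str, preprocess_radius: int, key: str = "radius") -> None:
    """Raise if the training script's radius differs from the preprocessing radius."""
    value = shell_assignments(script_text).get(key)
    if value is None:
        raise KeyError(f"no `{key}=` assignment found")
    if int(value) != preprocess_radius:
        raise ValueError(f"{key}={value} in training script, but data was "
                         f"preprocessed with radius {preprocess_radius}")
```

Usage would look like `check_radius(open("train_full.sh").read(), 2)`; commented-out lines and `export` statements are deliberately not matched by this simple pattern.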

After the model finishes training, run the run_test.sh script to predict the activity of the compounds in test.txt. The hyperparameters must match those used during training so that the correct weights are loaded. The script automatically bootstraps the results and places them in the bootstrapping_results directory. The final results can then be combined with the combining_smiles_bootstrapping_average_countsover0.5.py script.
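Judging only from its file name, the combining script appears to average each compound's predictions across the bootstrapped models and count how many models score it above 0.5. The sketch below shows that kind of aggregation under those assumptions; it is not the repository's script, the input is assumed to be (SMILES, probability) pairs, and the strict `>` comparison is a guess. Whether a probability above 0.5 means "active" also depends on how labels are encoded in data.txt.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def combine_bootstrap_predictions(
    predictions: Iterable[Tuple[str, float]], threshold: float = 0.5
) -> Dict[str, Dict[str, float]]:
    """Aggregate (SMILES, probability) pairs from many bootstrapped models.

    For each SMILES, report the mean predicted probability, the number of
    models whose probability is strictly above `threshold`, and the number
    of models that scored it.
    """
    by_smiles: Dict[str, List[float]] = defaultdict(list)
    for smiles, prob in predictions:
        by_smiles[smiles].append(prob)
    return {
        smiles: {
            "average": mean(probs),
            "count_over_threshold": sum(p > threshold for p in probs),
            "n_models": len(probs),
        }
        for smiles, probs in by_smiles.items()
    }
```

Keeping the count alongside the average makes it possible to see agreement between models: a compound averaging near 0.5 can be split evenly or sit near the threshold in every model.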
