EquiCPI

A PyTorch implementation of: EquiCPI: SE(3)-Equivariant Geometric Deep Learning for Structure-Aware Prediction of Compound-Protein Interactions

🔬 Overview

EquiCPI is a structure-aware model for compound-protein interaction (CPI) prediction that estimates binding affinity from 3D molecular structures. It combines several Euclidean neural networks (e3nn) that exploit the SE(3) group: features are kept equivariant or invariant under rigid motions, so predictions stay consistent when an input structure is rotated or translated, and under reflections where parity is taken into account (reflections belong to the full Euclidean group E(3), of which SE(3) is the rotation-translation subgroup). The model is trained and validated on predicted 3D structures generated from sequence-level data.

To achieve this, we use:

DiffDock-L for predicting the 3D structures of compounds.
ESMFold for predicting protein 3D folds from sequences.

🚀 Key Advantages Over Traditional Models

Traditional sequence-based CPI prediction models rely on molecular fingerprints, descriptors, or graph representations, and often overlook critical 3D structural information. EquiCPI, built on Euclidean neural networks (e3nn), uses SE(3) symmetry to process 3D structures directly, with the aim of more accurate, structure-aware CPI predictions.
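
The difference between invariant and equivariant quantities can be illustrated without any deep-learning library. The following is a minimal sketch, not code from this repository: it applies a rotation plus a translation to a random point cloud and checks that pairwise distances (an invariant feature) are unchanged even though the coordinates themselves move with the transform. The function names `rotation_z`, `transform`, and `pairwise_distances` are hypothetical helpers introduced only for this illustration.

```python
import math
import random

def rotation_z(theta):
    """3x3 rotation matrix about the z-axis (a proper rotation, det = +1)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def transform(points, rot, shift):
    """Apply the rigid motion x -> R x + t to every 3D point."""
    return [
        tuple(sum(rot[i][j] * p[j] for j in range(3)) + shift[i] for i in range(3))
        for p in points
    ]

def pairwise_distances(points):
    """Distances between all unordered pairs, in a fixed order."""
    return [
        math.dist(points[a], points[b])
        for a in range(len(points))
        for b in range(a + 1, len(points))
    ]

random.seed(0)
atoms = [tuple(random.uniform(-5, 5) for _ in range(3)) for _ in range(6)]
moved = transform(atoms, rotation_z(0.7), shift=(1.0, -2.0, 3.5))

before = pairwise_distances(atoms)
after = pairwise_distances(moved)
max_diff = max(abs(x - y) for x, y in zip(before, after))
print(f"max change in pairwise distance: {max_diff:.2e}")
```

The printed difference should sit at floating-point noise level. A model that consumes only such invariant quantities, or that propagates equivariant features as e3nn layers do, cannot change its prediction just because the input complex was placed in a different orientation.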

⚙️ Setup & Installation

Prerequisites

Python 3.9
PyTorch 2.1.2 + CUDA 11.8

Installation

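# Clone the repository
git clone https://github.com/dmis-lab/EquiCPI.git
cd EquiCPI

# Create the environment, then activate it
# (replace <env_name> with the environment name defined in environment.yml)
conda env create -f environment.yml
conda activate <env_name>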

🔄 Data Preprocessing & Graph Generation

🏗️ Generating a 3D Graph for a Protein from a .pdb File

To convert a .pdb file into a .pt file containing a 3D protein graph, run the command below, replacing each `<...>` placeholder with your own value:

python generate_graph_for_protein.py <output_ESM> <file_protein_name.csv> <processed_dir> <name_of_file.pt>
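
To make the idea of a "3D protein graph" concrete, here is a minimal, standard-library sketch of one common construction: one node per alpha-carbon (CA) atom and an edge between residues closer than a distance cutoff. This is an assumption-laden illustration, not the logic of `generate_graph_for_protein.py`; the choice of CA atoms as nodes, the 8.0 Å cutoff, and the helper names `read_ca_coords`, `radius_edges`, and `fake_atom` are all hypothetical, and saving the result as a .pt file is omitted.

```python
import math

def read_ca_coords(pdb_lines):
    """Collect (residue_name, (x, y, z)) for every alpha-carbon ATOM record."""
    nodes = []
    for line in pdb_lines:
        # Fixed-width PDB columns (1-indexed): atom name 13-16,
        # residue name 18-20, x 31-38, y 39-46, z 47-54.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            nodes.append((line[17:20].strip(), xyz))
    return nodes

def radius_edges(nodes, cutoff=8.0):
    """Directed edges (i, j, distance) for residue pairs closer than `cutoff` angstroms."""
    edges = []
    for i, (_, a) in enumerate(nodes):
        for j, (_, b) in enumerate(nodes):
            if i != j:
                d = math.dist(a, b)
                if d < cutoff:
                    edges.append((i, j, d))
    return edges

def fake_atom(serial, name, res, x, y, z):
    """Format one ATOM record with standard PDB column widths (demo data only)."""
    return (
        f"ATOM  {serial:5d} {name:<4s} {res:3s} A{serial:4d}    "
        f"{x:8.3f}{y:8.3f}{z:8.3f}"
    )

# Made-up coordinates: two neighbouring residues and one far away.
demo = [
    fake_atom(1, " N", "MET", 0.0, 1.4, 0.0),
    fake_atom(2, " CA", "MET", 0.0, 0.0, 0.0),
    fake_atom(3, " CA", "ALA", 3.8, 0.0, 0.0),
    fake_atom(4, " CA", "GLY", 20.0, 0.0, 0.0),
]
nodes = read_ca_coords(demo)
edges = radius_edges(nodes)
print(len(nodes), "nodes;", len(edges), "directed edges")
```

Edge distances are invariant under rotation and translation, which is why they are a natural edge feature for an SE(3)-aware model. The quadratic loop is fine for a sketch; a real pipeline would typically use a spatial index or a library routine for neighbour search.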

📊 Re-ranking Complexes with Vina Docking Score

Our workflow starts from sequence-level inputs and ends with scored docking poses:

SMILES strings representing compounds.
Amino acid sequences defining proteins.
DiffDock-L and ESMFold generate 3D structures of the compounds and proteins.
AutoDock Vina predicts binding affinities and identifies the optimal docking poses.

To re-rank predicted complexes based on Vina docking scores, run:

python ./vina_score/vina_function_rerank_regu.py <prediction_output_diffdock> <dataset.csv>

Note: dataset.csv must reference the compound.sdf and protein.pdb files of each complex.
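
The re-ranking step itself reduces to sorting candidate poses by their docking score. The sketch below shows that idea only and is not the implementation in `vina_function_rerank_regu.py`: it assumes the Vina scores have already been computed, and the function name `rerank_by_vina`, the tuple layout, the complex IDs, the file names, and the score values are all hypothetical.

```python
from collections import defaultdict

def rerank_by_vina(poses):
    """Group poses by complex and sort each group by Vina score, best first.

    `poses` is an iterable of (complex_id, pose_file, vina_score) tuples.
    AutoDock Vina reports scores in kcal/mol, and a more negative score
    means stronger predicted binding, so the sort is ascending.
    """
    by_complex = defaultdict(list)
    for complex_id, pose_file, score in poses:
        by_complex[complex_id].append((score, pose_file))
    return {cid: sorted(group) for cid, group in by_complex.items()}

# Hypothetical scores: three candidate poses for one complex, two for another.
poses = [
    ("c1", "rank1.sdf", -6.2),
    ("c1", "rank2.sdf", -7.9),
    ("c1", "rank3.sdf", -5.1),
    ("c2", "rank1.sdf", -8.4),
    ("c2", "rank2.sdf", -8.0),
]
ranked = rerank_by_vina(poses)
best = {cid: group[0][1] for cid, group in ranked.items()}
print(best)
```

In this made-up example the pose that the docking model ranked second for `c1` becomes the top pose after re-ranking, which is the situation the Vina-based step is meant to catch.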

🏗️ Converting Compounds & Proteins into 3D Graph Representations

python generate_pt_dataset.py <machine_learning_task> <data_name> <data_csv_file.csv>

🎯 Training & Evaluation

🏋️ Training the Model

bash run_class.sh

📂 Datasets

We utilize several datasets for training and evaluation:

📌 Dependencies & Related Work

EquiCPI builds upon the source code and data from the following projects:

DiffDock-L – Deep confident steps to new pockets.
ESMFold – Atomic-level protein structure prediction.
AutoDock-Vina – Molecular docking software.

We sincerely appreciate all contributors and maintainers for their efforts! 🙌

📜 License

This repository follows the license terms of the EquiCPI project: MIT License.

✅ TO DO

| Task | Status | Notes |
|---|---|---|
| Improve CLI usability | ✅ Done | Add YAML/argparse defaults |
| Add structured W&B logging | ✅ Done | Already integrated |
| Clean folder structure | ✅ Done | Organize into `scripts/`, `models/`, `data/` |
| Add Jupyter notebooks | 🔧 In Progress | Demos for training, testing, and visualizing graphs |
| Detailed API documentation | 🔧 In Progress | Add docstrings and auto-generated docs |
| How to train on your own dataset | 🔧 In Progress | Support custom datasets |

📖 Citation

If you use this code or dataset in your research, please cite:

@misc{nguyen2025equicpise3equivariantgeometricdeep,
      title={EquiCPI: SE(3)-Equivariant Geometric Deep Learning for Structure-Aware Prediction of Compound-Protein Interactions},
      author={Ngoc-Quang Nguyen},
      year={2025},
      eprint={2504.04654},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2504.04654},
}
