Preprint: "Data-centric training enables meaningful interaction learning in protein–ligand binding affinity prediction." ChemRxiv.
Tip
Always use a virtual environment to manage dependencies.
python -m venv .venv
source .venv/bin/activate
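To confirm that the environment is actually active before installing anything, a quick check like the one below can help. It relies only on the standard `sys` module: inside a venv created with `python -m venv`, `sys.prefix` points at the environment while `sys.base_prefix` still points at the base installation. The function name `in_virtualenv` is illustrative and not part of DockTDeep.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment.

    In a venv, sys.prefix is the environment directory and sys.base_prefix is
    the base Python installation, so the two differ.
    """
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
```

Run it with the interpreter you intend to use (for example `python check_venv.py` after `source .venv/bin/activate`).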
Quick setup for inference. Install the package directly from PyPI:
pip install docktdeep
Predict binding affinities for protein-ligand pairs (predictions are given in kcal/mol).
# single protein-ligand pair
docktdeep predict --proteins protein.pdb --ligands ligand.pdb --output-csv results.csv
# multiple pairs
docktdeep predict \
--proteins protein1.pdb protein2.pdb \
--ligands ligand1.pdb ligand2.pdb \
--output-csv results.csv \
--max-batch-size 16
# options available in help
docktdeep predict --help
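Predictions are reported in kcal/mol. To compare them with experimental Kd or pKd values, the standard thermodynamic relation ΔG = RT ln Kd can be applied. The sketch below assumes that the predicted value is a binding free energy referenced to a 1 M standard state at 298.15 K; both points are assumptions, and the helpers `dg_to_kd` and `dg_to_pkd` are hypothetical. Because the column layout of `results.csv` is not documented here, the functions operate on plain numbers.

```python
import math

# Gas constant in kcal/(mol*K)
R_KCAL = 1.987204e-3


def dg_to_kd(dg_kcal: float, temperature: float = 298.15) -> float:
    """Dissociation constant (M) from a binding free energy in kcal/mol.

    ASSUMPTION: the input is dG of binding with a 1 M standard state,
    so dG = RT ln(Kd) and Kd = exp(dG / RT).
    """
    return math.exp(dg_kcal / (R_KCAL * temperature))


def dg_to_pkd(dg_kcal: float, temperature: float = 298.15) -> float:
    """pKd = -log10(Kd) = -dG / (RT ln 10)."""
    return -dg_kcal / (R_KCAL * temperature * math.log(10))


if __name__ == "__main__":
    for dg in (-6.0, -8.0, -10.0):
        print(f"dG = {dg:6.1f} kcal/mol -> pKd = {dg_to_pkd(dg):.2f}")
```

At 298.15 K, RT ln 10 is roughly 1.364 kcal/mol, so each additional 1.364 kcal/mol of predicted affinity corresponds to about one pKd unit.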
Tip
Use shell globbing patterns to process multiple files efficiently.
# using shell glob expansion
docktdeep predict \
--proteins path/to/proteins/*_protein.pdb \
--ligands path/to/ligands/*_ligand.pdb
# alternatively, use find for more complex patterns
docktdeep predict \
--proteins $(find /data/complexes -name "*_protein_prep.pdb" | sort) \
--ligands $(find /data/complexes -name "*_ligand_rnum.pdb" | sort)
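Both the glob and `find` approaches rely on two independently sorted lists lining up, which silently mispairs files if one protein or ligand is missing. A more defensive option, sketched below, matches files by their shared ID before building the command. It assumes files are named `<id>_protein.pdb` and `<id>_ligand.pdb`, and that `docktdeep predict` pairs the i-th protein with the i-th ligand; `pair_by_id` and `build_predict_command` are hypothetical helpers, and only the `--proteins`, `--ligands` and `--output-csv` flags come from the examples above.

```python
import shlex
from pathlib import Path


def pair_by_id(protein_dir, ligand_dir,
               protein_suffix="_protein.pdb", ligand_suffix="_ligand.pdb"):
    """Match protein and ligand files that share the same ID prefix.

    ASSUMPTION: the ID is the filename with the suffix removed.
    Returns a sorted list of (id, protein_path, ligand_path); IDs found in
    only one directory are skipped instead of being mispaired.
    """
    proteins = {p.name[: -len(protein_suffix)]: p
                for p in Path(protein_dir).glob(f"*{protein_suffix}")}
    ligands = {p.name[: -len(ligand_suffix)]: p
               for p in Path(ligand_dir).glob(f"*{ligand_suffix}")}
    common = sorted(proteins.keys() & ligands.keys())
    return [(cid, proteins[cid], ligands[cid]) for cid in common]


def build_predict_command(pairs, output_csv="results.csv"):
    """Build a `docktdeep predict` argument list with aligned file lists.

    ASSUMPTION: the CLI pairs proteins and ligands by position.
    """
    cmd = ["docktdeep", "predict", "--proteins"]
    cmd += [str(prot) for _, prot, _ in pairs]
    cmd += ["--ligands"]
    cmd += [str(lig) for _, _, lig in pairs]
    cmd += ["--output-csv", output_csv]
    return cmd


if __name__ == "__main__":
    pairs = pair_by_id("path/to/proteins", "path/to/ligands")
    cmd = build_predict_command(pairs)
    print(shlex.join(cmd))
    # To execute instead of printing: subprocess.run(cmd, check=True)
```

Passing the list to `subprocess.run` directly (rather than through a shell string) also avoids problems with spaces in paths.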
For development and for training custom models:
# clone the repository
git clone https://github.com/gmmsb-lncc/docktdeep.git
cd docktdeep
# create and activate a virtual environment
python -m venv .venv
source .venv/bin/activate
# install deps
python -m pip install -r requirements.txt
# run tests to verify installation
python -m pytest tests/
Initialize a new Aim repository for tracking experiments:
aim init
# to start the aim server
aim server
To see all available training options:
python train.py --help
Train a model with optimized hyperparameters:
python train.py \
--model Baseline \
--experiment experiment-name \
--depthwise-convs \
--adaptive-pooling \
--optim AdamW \
--max-epochs 1500 \
--batch-size 64 \
--lr 0.00087469 \
--beta1 0.25693012 \
--eps 0.00032933 \
--dropout 0.25348994 \
--wdecay 0.0000169 \
--molecular-dropout 0.06 \
--molecular-dropout-unit complex \
--random-rotation \
--dataframe-path path/to/dataframe.csv \
--root-dir path/to/data/PDBbind2020 \
--ligand-path-pattern "{c}/{c}_ligand_rnum.pdb" \
--protein-path-pattern "{c}/{c}_protein_prep.pdb" \
--split-column random_split
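The `--ligand-path-pattern` and `--protein-path-pattern` values contain a `{c}` placeholder. A plausible reading, assumed here rather than confirmed from the training code, is that `{c}` is replaced by each complex ID listed in the dataframe, the result is joined to `--root-dir`, and `--split-column` names the column holding each complex's train/validation/test assignment. The sketch below illustrates that interpretation with the standard `csv` module; the ID column name `complex` and the function `resolve_complexes` are hypothetical.

```python
import csv
from pathlib import Path


def resolve_complexes(dataframe_path, root_dir,
                      ligand_pattern="{c}/{c}_ligand_rnum.pdb",
                      protein_pattern="{c}/{c}_protein_prep.pdb",
                      split_column="random_split",
                      id_column="complex"):  # HYPOTHETICAL column name
    """Group (protein_path, ligand_path) tuples by the value in split_column.

    ASSUMPTION: '{c}' in each pattern is substituted with the complex ID and
    the resulting relative path is resolved against root_dir.
    """
    root = Path(root_dir)
    splits = {}
    with open(dataframe_path, newline="") as fh:
        for row in csv.DictReader(fh):
            cid = row[id_column]
            protein = root / protein_pattern.format(c=cid)
            ligand = root / ligand_pattern.format(c=cid)
            splits.setdefault(row[split_column], []).append((protein, ligand))
    return splits
```

For a complex `1abc` under `path/to/data/PDBbind2020`, this interpretation yields `path/to/data/PDBbind2020/1abc/1abc_protein_prep.pdb` and `path/to/data/PDBbind2020/1abc/1abc_ligand_rnum.pdb`, which is a quick way to check that your directory layout matches the patterns before launching a long run.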
If you use DockTDeep in your research, please cite:
@article{dasilva2025docktdeep,
title={Data-centric training enables meaningful interaction learning in protein--ligand binding affinity prediction},
author={da Silva, Matheus M. P. and Vidal, Lincon and Guedes, Isabella and de Magalh{\~a}es, Camila and Cust{\'o}dio, F{\'a}bio and Dardenne, Laurent},
journal={ChemRxiv},
year={2025}
}
- DockTGrid: a Python package for generating deep learning-ready voxel grids of molecular complexes. GitHub.