This pipeline processes .vtk-based structural biology datasets, trains a multi-stage graph neural network (GNN) model, and performs test-time inference to generate predictions suitable for submission.
The pipeline is structured in modular stages:
- Installation: Setup environment and dependencies.
- Data Preparation: Download and extract `.tar.gz` files containing `.vtk` files.
- Stage 1 - Confusion Analysis: Initial training with error diagnosis.
- Stage 2 - Pretraining: Retraining using relabeled examples.
- Stage 3 - Finetuning: Final training stage using pretrained weights.
- Inference (Submission): Run inference on test data with fully trained weights.

macOS / Linux

    python -m venv .venv
    source .venv/bin/activate
    pip install -r requirements.txt

Windows

    python -m venv .venv
    .venv\Scripts\activate
    pip install -r requirements.txt

Download example `.vtk` datasets using the provided helper script. On Windows, use Git Bash to execute it:

    sh scripts/download_data.sh

By default:

- `DATA_PATH = ./sample_data`
- `OUTPUT_PATH = ./output`

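
If the archives ever need to be unpacked by hand rather than through the helper script, the extraction step can be approximated as follows. This is a minimal sketch, not the contents of `scripts/download_data.sh`: the function name `extract_archives` is hypothetical, and it assumes the `.tar.gz` files already sit somewhere under the data directory.

```python
import tarfile
from pathlib import Path


def extract_archives(data_path: str) -> list[Path]:
    """Unpack every .tar.gz under data_path next to the archive and list the .vtk files.

    Hypothetical helper for illustration; the repository's own script may differ.
    """
    root = Path(data_path)
    # Use the safe "data" extraction filter when this Python version provides it.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    for archive in sorted(root.rglob("*.tar.gz")):
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(path=archive.parent, **kwargs)
    return sorted(root.rglob("*.vtk"))


if __name__ == "__main__":
    # ./sample_data is the default DATA_PATH mentioned above.
    vtk_files = extract_archives("./sample_data")
    print(f"Found {len(vtk_files)} .vtk files")
```

Extracting next to each archive keeps the directory layout of the download intact, so a later stage can locate the `.vtk` files with a single recursive search.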
To run the full multi-stage pipeline:

    sh scripts/pipeline.sh

For experiments on the full dataset, replace `./sample_data` with your full dataset path (e.g., `./data`).

If you have trained models already (e.g., trained on the full dataset), you can skip training and run inference directly:

    sh scripts/inference.sh

This uses pretrained weights located in `./weights/`, including:

- `local_gcn_weights_stage_3.pth`
- `global_gcn_weights_stage_3.pth`

These are expected to be trained on the entire training dataset, not the sample. The script:
- Preprocesses the test data.
- Runs inference using the final GNN.
- Saves predictions to `output/submission.csv`.

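
Because inference depends on both stage 3 weight files being present, a quick check before launching the script can save a failed run. The sketch below is illustrative only: the helper `missing_weights` is hypothetical and not part of the repository, while the file names and the `./weights` directory come from the description above.

```python
from pathlib import Path

# File names as listed above; the helper itself is a hypothetical illustration.
EXPECTED_WEIGHTS = (
    "local_gcn_weights_stage_3.pth",
    "global_gcn_weights_stage_3.pth",
)


def missing_weights(weights_dir: str = "./weights") -> list[str]:
    """Return the expected weight files that are absent from weights_dir."""
    root = Path(weights_dir)
    return [name for name in EXPECTED_WEIGHTS if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_weights()
    if missing:
        print("Missing weight files in ./weights:", ", ".join(missing))
    else:
        print("All stage 3 weights found")
```

An empty list means both files are in place and `sh scripts/inference.sh` can be run.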
The final model was evaluated on the held-out test set, achieving the following results:
| Metric | Score |
|---|---|
| Accuracy | 0.71090 |
| Balanced Accuracy | 0.40483 |
| F1-Score | 0.66623 |
| Recall | 0.71090 |
| Precision | 0.69556 |
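
For readers who want to reproduce this kind of table on their own predictions, the sketch below computes the same five metrics in plain Python. The averaging mode is an assumption: the table does not state it, but Accuracy and Recall are identical (0.71090), which is what support-weighted averaging of per-class recall always produces. The function name `classification_report` is hypothetical and unrelated to the repository's evaluation code.

```python
from collections import Counter


def classification_report(y_true, y_pred):
    """Accuracy, balanced accuracy, and support-weighted precision/recall/F1.

    Assumes weighted averaging; the source does not state which mode was used.
    """
    n = len(y_true)
    support = Counter(y_true)    # number of true samples per class
    predicted = Counter(y_pred)  # number of predictions per class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct per class

    recall = {c: hits[c] / support[c] for c in support}
    precision = {c: hits[c] / predicted[c] if predicted[c] else 0.0 for c in support}
    f1 = {
        c: (2 * precision[c] * recall[c] / (precision[c] + recall[c])
            if precision[c] + recall[c] else 0.0)
        for c in support
    }

    def weighted(per_class):
        # Average per-class scores, weighting each class by its true sample count.
        return sum(per_class[c] * support[c] for c in support) / n

    return {
        "accuracy": sum(hits.values()) / n,
        # Unweighted mean of per-class recall: every class counts equally.
        "balanced_accuracy": sum(recall.values()) / len(recall),
        "precision": weighted(precision),
        "recall": weighted(recall),  # algebraically equal to accuracy
        "f1": weighted(f1),
    }


if __name__ == "__main__":
    report = classification_report(
        ["a", "a", "a", "a", "b", "b"],
        ["a", "a", "a", "b", "b", "a"],
    )
    for name, value in report.items():
        print(f"{name}: {value:.5f}")
```

Balanced accuracy weights every class equally regardless of size, so a value well below plain accuracy, as in the table, suggests that some classes are recognized much less reliably than others.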

    @article{Yacoub2025,
      title = {Shrec 2025: Protein Surface Shape Retrieval Including Electrostatic Potential},
      url = {http://dx.doi.org/10.2139/ssrn.5258950},
      DOI = {10.2139/ssrn.5258950},
      publisher = {Elsevier BV},
      author = {Yacoub, Taher and Depenveiller, Camille and Tatsuma, Atsushi and Barisin, Tin and Rusakov, Eugen and G\"{o}bel, Udo and Peng, Yuxu and Deng, Shiqiang and Kagaya, Yuki and Park, Joon Hong and Kihara, Daisuke and Guerra, Marco and Palmieri, Giorgio and Ranieri, Andrea and Fugacci, Ulderico and Biasotti, Silvia and He, Ruiwen and Benhabiles, Halim and Cabani, Adnane and Hammoudi, Karim and Li, Haotian and Huang, Hao and Li, Chunyan and Tehrani, Alireza and Meng, Fanwang and Heidar-Zadeh, Farnaz and Yang, Tuan-Anh and Montes, Matthieu},
      year = {2025}
    }
