This repository corresponds to the DeepReaction project, designed for accurate prediction of chemical reaction properties using graph neural networks.
- Installation
- Data Format
- Dataset
- Dataset Preparation
- Training
- Evaluation
- Advanced Usage
- Citation
- Acknowledgements
- License
# Clone this repository:
git clone https://github.com/chimie-paristech-CTM/DeepReaction.git
cd DeepReaction
# Install in development mode
pip install -e .
# (Optional) For Jupyter notebook support
pip install jupyterlab
# Clone this repository:
git clone https://github.com/chimie-paristech-CTM/DeepReaction.git
cd DeepReaction
# Create the conda environment from the environment.yml file
conda env create -f environment.yml
# Activate the environment
conda activate reaction
# (Optional) For Jupyter notebook support
pip install jupyterlab
⚠️ Note: The version of PyTorch Geometric (PyG) and its related packages must be selected according to your hardware configuration (e.g., CUDA version). Visit the official PyG installation guide to find the correct command for your system.
⚠️ Note: Graph neural networks built with PyTorch Geometric (PyG) are computationally demanding, so running them on a GPU is recommended.
DeepReaction requires a specific data format for training and prediction. The key components are:
Your main dataset file should be a CSV with the following essential columns:
Column | Description |
---|---|
ID | Unique identifier for each reaction |
R_dir | Directory name containing XYZ files (e.g., "reaction_R0") |
smiles | SMILES representation of the reaction |
DG_act | Target property: Gibbs free activation energy (kcal/mol) |
DrG | Target property: Gibbs free reaction energy (kcal/mol) |
DG_act_xtb | Input feature: XTB-computed approximation of DG_act |
DrG_xtb | Input feature: XTB-computed approximation of DrG |
ID63623,reaction_R0,[C:1](=[C:2]([C:3](=[C:4]([H:11])[H:12])[H:10])[H:9])([H:7])[H:8].[C:5](=[C:6]([H:15])[H:16])([H:13])[H:14]>>[C:1]1([H:7])([H:8])[C:2]([H:9])=[C:3]([H:10])[C:4]([H:11])([H:12])[C:5]([H:13])([H:14])[C:6]1([H:15])[H:16],35.16,-22.54,21.70,-44.40
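As a rough illustration of the CSV layout, the sketch below reads such a file with Python's standard `csv` module and converts the energy columns to floats. It assumes the file has a header row using the column names from the table, in the order suggested by the sample row. The helper `load_reactions` is hypothetical and not part of DeepReaction, and the SMILES is a shortened, unmapped stand-in for the sample reaction.

```python
import csv
import io

# Column names from the table above; the order is assumed from the sample row.
COLUMNS = ["ID", "R_dir", "smiles", "DG_act", "DrG", "DG_act_xtb", "DrG_xtb"]
NUMERIC = ["DG_act", "DrG", "DG_act_xtb", "DrG_xtb"]

# In-memory stand-in for a dataset file (header row is an assumption).
SAMPLE_CSV = (
    "ID,R_dir,smiles,DG_act,DrG,DG_act_xtb,DrG_xtb\n"
    "ID63623,reaction_R0,C=CC=C.C=C>>C1=CCCCC1,35.16,-22.54,21.70,-44.40\n"
)


def load_reactions(handle):
    """Read reaction rows and convert the energy columns to float."""
    reader = csv.DictReader(handle)
    missing = [c for c in COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    rows = []
    for row in reader:
        for name in NUMERIC:
            row[name] = float(row[name])
        rows.append(row)
    return rows


reactions = load_reactions(io.StringIO(SAMPLE_CSV))
print(reactions[0]["ID"], reactions[0]["DG_act"], reactions[0]["DrG"])
```

To read a real file, pass `open(path, newline="")` instead of the `io.StringIO` object.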
For each reaction in your dataset, you need to provide three XYZ files representing the:
- Reactant(s)
- Transition state (TS)
- Product(s)
The XYZ files should be organized in directories named according to the R_dir
column in your CSV:
dataset_root/
├── reaction_R0/
│   ├── R0_reactant.xyz
│   ├── R0_ts.xyz
│   └── R0_product.xyz
├── reaction_R1/
│   ├── R1_reactant.xyz
│   ├── R1_ts.xyz
│   └── R1_product.xyz
...
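Before training, it can help to confirm that every reaction directory actually contains the three expected files. The following is a minimal sketch, not part of DeepReaction: `missing_xyz_files` is a hypothetical helper that uses the default file patterns from this README and is demonstrated on a throwaway directory.

```python
import tempfile
from pathlib import Path

# Default patterns quoted in this README.
FILE_KEYWORDS = ["*_reactant.xyz", "*_ts.xyz", "*_product.xyz"]


def missing_xyz_files(dataset_root, r_dir, patterns=FILE_KEYWORDS):
    """Return the patterns that match no file in dataset_root/r_dir."""
    reaction_dir = Path(dataset_root) / r_dir
    return [p for p in patterns if not any(reaction_dir.glob(p))]


# Demo on a temporary directory: reaction_R0 is complete, reaction_R1 is not.
with tempfile.TemporaryDirectory() as root:
    complete = Path(root) / "reaction_R0"
    complete.mkdir()
    for name in ("R0_reactant.xyz", "R0_ts.xyz", "R0_product.xyz"):
        (complete / name).write_text("0\n\n")

    incomplete = Path(root) / "reaction_R1"
    incomplete.mkdir()
    (incomplete / "R1_reactant.xyz").write_text("0\n\n")

    complete_missing = missing_xyz_files(root, "reaction_R0")
    incomplete_missing = missing_xyz_files(root, "reaction_R1")

print(complete_missing)
print(incomplete_missing)
```

On a real dataset, loop over the `R_dir` values from the CSV and report every directory for which the returned list is non-empty.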
[Number of atoms]
[Optional comment line]
[Element] [X coordinate] [Y coordinate] [Z coordinate]
[Element] [X coordinate] [Y coordinate] [Z coordinate]
...
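The XYZ layout above can be read with a few lines of plain Python. This is an illustrative sketch only: `parse_xyz` is a hypothetical helper, not DeepReaction's own loader, and the water coordinates are made up for the example.

```python
def parse_xyz(text):
    """Parse XYZ text into (comment, [(element, x, y, z), ...])."""
    lines = text.splitlines()
    n_atoms = int(lines[0].split()[0])          # line 1: number of atoms
    comment = lines[1] if len(lines) > 1 else ""  # line 2: optional comment
    atoms = []
    for line in lines[2:2 + n_atoms]:
        element, x, y, z = line.split()[:4]
        atoms.append((element, float(x), float(y), float(z)))
    if len(atoms) != n_atoms:
        raise ValueError(f"expected {n_atoms} atoms, found {len(atoms)}")
    return comment, atoms


# Illustrative geometry; the coordinates are not taken from the dataset.
WATER = """3
water, illustrative coordinates
O 0.000 0.000 0.117
H 0.000 0.757 -0.469
H 0.000 -0.757 -0.469
"""

comment, atoms = parse_xyz(WATER)
print(comment)
print(atoms)
```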
When setting up your configuration, make sure to specify:
- file_keywords: Patterns to identify XYZ files (default: ['*_reactant.xyz', '*_ts.xyz', '*_product.xyz'])
- target_fields: Target properties to predict (default: ['DG_act', 'DrG'])
- input_features: Features used as input (default: ['DG_act_xtb', 'DrG_xtb'])
- id_field: Column name for reaction IDs (default: 'ID')
- dir_field: Column name for directory names (default: 'R_dir')
- reaction_field: Column name for reaction SMILES (default: 'reaction')
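One way to sanity-check these settings is to compare the configured column names against the CSV header before training. The sketch below stores the listed defaults in a plain dictionary, which is an assumption and not necessarily how DeepReaction represents its configuration; `columns_missing_from_header` is a hypothetical helper. With the defaults as listed, the check flags `reaction`, because the column table in this README names that column `smiles`; setting `reaction_field` to match your CSV resolves it.

```python
# Defaults as listed in this README, held in a plain dict for illustration.
DEFAULTS = {
    "file_keywords": ["*_reactant.xyz", "*_ts.xyz", "*_product.xyz"],
    "target_fields": ["DG_act", "DrG"],
    "input_features": ["DG_act_xtb", "DrG_xtb"],
    "id_field": "ID",
    "dir_field": "R_dir",
    "reaction_field": "reaction",
}


def columns_missing_from_header(config, header):
    """Return configured column names that do not appear in the CSV header."""
    needed = [config["id_field"], config["dir_field"], config["reaction_field"]]
    needed += config["target_fields"] + config["input_features"]
    return [name for name in needed if name not in header]


# Header built from the column table in this README.
header = ["ID", "R_dir", "smiles", "DG_act", "DrG", "DG_act_xtb", "DrG_xtb"]

missing_with_defaults = columns_missing_from_header(DEFAULTS, header)

# Point reaction_field at the column name actually used in the CSV.
adjusted = {**DEFAULTS, "reaction_field": "smiles"}
missing_adjusted = columns_missing_from_header(adjusted, header)

print(missing_with_defaults)
print(missing_adjusted)
```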
The models in DeepReaction were developed and tested using a comprehensive Diels-Alder reaction dataset:
Dataset link: Diels-Alder Reaction Space for Self-Healing Polymer
This dataset contains:
- 1,580 Diels-Alder reactions with complete 3D structures
- Quantum chemical calculations (DFT and XTB) for transition states and energetics
- Reaction energies, activation energies, and structural information
- Computed properties including DG_act and DrG values
- Download the dataset archive from the Figshare link above
- Extract the contents to your desired location (recommended: ./dataset/DATASET_DA_F/)
- Ensure the dataset has the correct structure as described in the XYZ File Structure section
- Update the dataset paths in your configuration if needed
Place your reaction dataset in the appropriate location:
./dataset
Alternatively, modify the paths in the configuration file or command-line arguments.
To train the model with the dataset using our specialized training script:
# Basic training with default parameters
python example/train.py
- --readout: Readout function type (set_transformer, sum, mean, max, attention)
- --batch: Batch size for training
- --epochs: Maximum number of training epochs
- --lr: Learning rate
- --node-dim: Dimension of node latent representations
- --output: Output directory for results
- --reaction-root: Custom path to the reaction dataset root, i.e., the location of the XYZ files of reactants, products, and TSs
- --reaction-csv: Custom path to the reaction dataset CSV
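To combine several of these options in one run, the command line can be assembled programmatically. The sketch below only builds and prints the command without executing it. The flag names come from the list above; every value, including the output directory and the CSV path `./dataset/my_reactions.csv`, is a hypothetical placeholder, and `build_train_command` is not part of DeepReaction.

```python
import shlex


def build_train_command(options, script="example/train.py"):
    """Turn {flag: value} pairs into an argument list for the training script."""
    command = ["python", script]
    for name, value in options.items():
        command += [f"--{name}", str(value)]
    return command


# Illustrative values only; choose settings that suit your data and hardware.
options = {
    "readout": "mean",
    "batch": 16,
    "epochs": 100,
    "lr": 0.0005,
    "node-dim": 128,
    "output": "./results/run1",                  # hypothetical output directory
    "reaction-root": "./dataset/DATASET_DA_F",
    "reaction-csv": "./dataset/my_reactions.csv",  # hypothetical CSV path
}

command = build_train_command(options)
print(shlex.join(command))
```

The printed line can be pasted into a shell, or the list can be passed to `subprocess.run(command, check=True)` from the repository root.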
To evaluate a trained model (Checkpoint link):
python example/predict.py
The prediction notebook allows you to:
- Load a trained model checkpoint
- Make predictions on new data
- Visualize prediction results
- Compare predictions with actual values (if available)
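When reference values are available, a simple comparison is the mean absolute error (MAE) and root-mean-square error (RMSE) between predicted and actual energies. This README does not specify the output format of the prediction script, so the sketch below assumes the values have already been collected into two Python lists. Apart from the first actual value, which is the DG_act of the sample CSV row, the numbers are made up for illustration.

```python
import math


def mae(actual, predicted):
    """Mean absolute error between two equally long sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root-mean-square error between two equally long sequences."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


# Illustrative DG_act values in kcal/mol; not real model output.
actual = [35.16, 28.00, 31.50]
predicted = [34.16, 29.00, 33.50]

print(f"MAE:  {mae(actual, predicted):.2f} kcal/mol")
print(f"RMSE: {rmse(actual, predicted):.2f} kcal/mol")
```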
# Run hyperparameter optimization
python example/hyper.py
If you use DeepReaction or the Diels-Alder dataset in your research, please cite:
[Placeholder]
This implementation is built upon several open-source projects:
This project is licensed under the MIT License - see the LICENSE file for details.