This repository contains the code for creating the virtual chemical space for the dearomative cycloaddition to heterocycles and for running predictions on it, as described in this paper.
For installation, run:
git clone --recurse-submodules https://github.com/le-schlo/EnT_Substrate_Mapping.git
cd EnT_Substrate_Mapping/
pip install -r requirements.txt
# Additionally, go to the EasyChemML directory and install its dependencies separately.
cd EnTdecker/EasyChemML
pip install ./
cd ../../
pip install -r requirements.txt
The installation on a standard Linux machine takes approx. 2-3 minutes.
The scripts to run the different parts of the workflow are located in the examples directory.
A typical workflow would involve the following steps:
- Create the virtual chemical space: `examples/space_creation.py`
  - Define the core molecule (and matching SMARTS for decoration) for which the virtual library should be created. You can use the functional groups in `CombinatorialSpace/functional_groups.txt` or define your own.
  - Save your virtual library as a `.csv` file to the `Data/virtual_libraries` directory. The naming convention should be `mol_substitution.csv`, where `mol` is the name of the core molecule (e.g., thiophene) and `substitution` is the degree of substitution (e.g., monosubstituted).
- Obtain predictions: `examples/predict_et.py` and `examples/predict_sp.py`
  - Download the pre-trained EnTdecker models from the Zenodo repository and place them in the respective directories in the `Models` directory.
  - Alternatively you can also train / retrain your own models. For this please refer to the instructions in the EnTdecker repository.
  - Run the scripts for prediction. The predictions should be saved to `Data/et_predictions/predictions_mol_substitution.csv` for triplet energy prediction and `Data/sp_predictions/predictions_mol_substitution.csv` for spin population prediction.
- Create images for the interactive analysis (optional): `examples/image_creation.py`
  - This script creates images. The `type_of_image` variable selects one of two options:
    - `structure`: The images are the structures of the molecules generated with RDKit.
    - `spin_population`: The images are heat maps of the predicted spin population of the molecules.
  - The images should be saved to `Data/images/ring_class_substitution/idx.jpeg`, where `ring_class` is the name you used to create the space, `substitution` is the degree of substitution, and `idx` is the index of the molecule in the virtual library. A dummy image named `default.png` is provided in the `Data/images` directory for use when no image was created for a molecule.
- Choose your analysis method:
  - Option A: Run the interactive analysis: `examples/interactive_analysis.py`
    - Using the previously generated files, an interactive 3D plot can be generated.
    - The representation of the molecules can be chosen with the `representation` variable:
      - `ECFP`: The molecules are represented by their ECFP fingerprints generated with RDKit
      - `MACCS`: The molecules are represented by their MACCS fingerprints generated with RDKit
      - `MACCS+ECFP`: The molecules are represented by the concatenation of their ECFP and MACCS fingerprints
    - The parameters for the dimensionality reduction with UMAP can be adjusted:
      - `n_neighbors` (default: 60)
      - `min_dist` (default: 0.15)
      - `metric` (default: `euclidean`)
  - Option B: Alternatively, run an automated analysis: `examples/automated_analysis.py`
    - This script screens the virtual libraries for promising candidates (e.g., molecules below a triplet energy threshold and with spin population on a defined core substructure)
    - Set the triplet energy threshold using the `threshold` variable (default: 62 kcal/mol)
    - Define `core_smiles` to filter for spin population localization
    - Filtered results are saved to `Data/potential_candidates_ring_class_substitution.csv`
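The screening idea behind Option B can be sketched with the standard library alone. This is a minimal illustration, not the code in `examples/automated_analysis.py`: the column names `id`, `triplet_energy`, and `core_spin_population`, the `min_core_spin` cutoff, and the function name `filter_candidates` are assumptions made for this example, and the rows are made-up placeholder values. Only the 62 kcal/mol default threshold is taken from the description above.

```python
import csv
import io


def filter_candidates(rows, threshold=62.0, min_core_spin=0.5):
    """Keep rows below the triplet energy threshold whose spin population
    on the core is at least `min_core_spin`.

    `rows` is an iterable of dicts. The keys used here are hypothetical
    column names, not necessarily those written by the prediction scripts.
    """
    selected = []
    for row in rows:
        energy = float(row["triplet_energy"])  # kcal/mol
        # Assumed to be precomputed per molecule (summed over core atoms).
        core_spin = float(row["core_spin_population"])
        if energy < threshold and core_spin >= min_core_spin:
            selected.append(row)
    return selected


# In-memory stand-in for a predictions file; the values are placeholders.
example_csv = io.StringIO(
    "id,triplet_energy,core_spin_population\n"
    "mol_0,65.0,0.9\n"
    "mol_1,58.5,0.8\n"
    "mol_2,55.0,0.2\n"
)

candidates = filter_candidates(csv.DictReader(example_csv))
for row in candidates:
    print(row["id"], row["triplet_energy"])
```

In this sketch both conditions must hold, so a molecule with a low triplet energy but little spin population on the core is still rejected.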
A typical run time for a substrate space with ~1000 molecules is approx. 3-5 minutes when using GPU support, and 10-15 minutes when run on CPU only.
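The file locations named in the workflow steps follow one naming pattern, which the sketch below collects in a single helper. The helper names `workflow_paths` and `image_or_default` are hypothetical and not part of the repository. The sketch also assumes that the `ring_class` used for images and candidate files is the same string as `mol`, and that the dummy image serves as a fallback when a molecule has no image.

```python
from pathlib import Path

DATA_DIR = Path("Data")


def workflow_paths(mol, substitution, idx=None):
    """Return the file locations described in the workflow for one library.

    Assumes `ring_class` equals `mol`; adjust if the two names differ.
    """
    name = f"{mol}_{substitution}"
    paths = {
        "library": DATA_DIR / "virtual_libraries" / f"{name}.csv",
        "et_predictions": DATA_DIR / "et_predictions" / f"predictions_{name}.csv",
        "sp_predictions": DATA_DIR / "sp_predictions" / f"predictions_{name}.csv",
        "candidates": DATA_DIR / f"potential_candidates_{name}.csv",
    }
    if idx is not None:
        # One image per molecule, named after its index in the library.
        paths["image"] = DATA_DIR / "images" / name / f"{idx}.jpeg"
    return paths


def image_or_default(mol, substitution, idx):
    """Fall back to the dummy image when no image exists (assumed behaviour)."""
    image = workflow_paths(mol, substitution, idx)["image"]
    return image if image.exists() else DATA_DIR / "images" / "default.png"


paths = workflow_paths("thiophene", "monosubstituted", idx=0)
for key, path in paths.items():
    print(f"{key}: {path.as_posix()}")
```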
We provide an example workflow as a Jupyter notebook here.
The notebook can also be run on Google Colab without a local installation using this link.
@article{doi:10.1021/jacs.5c09249,
author = {Rana, Debanjan and H{\"u}mpel, Carla and Laskar, Ranjini and Schlosser, Leon and Korgitzsch, Sophie and Dutta, Subhabrata and Daniliuc, Constantin G. and Glorius, Frank},
title = {Accelerated Discovery of Energy Transfer-Catalyzed Dearomative Cycloadditions through a Data-Driven Three-Layer Screening Strategy},
journal = {Journal of the American Chemical Society},
volume = {147},
number = {31},
pages = {28359-28369},
year = {2025},
doi = {10.1021/jacs.5c09249},
note ={PMID: 40700715},
URL = {https://doi.org/10.1021/jacs.5c09249},
eprint = {https://doi.org/10.1021/jacs.5c09249}
}