Substrate mapping for energy transfer catalysis

This repository contains the code for creating the virtual chemical space for the dearomative cycloaddition to heterocycles and running predictions on it, as described in the paper listed under Citation.

Installation

To install, run:

git clone --recurse-submodules https://github.com/le-schlo/EnT_Substrate_Mapping.git
cd EnT_Substrate_Mapping/
pip install -r requirements.txt

# Additionally, go to the EasyChemML directory and install its dependencies separately.
cd EnTdecker/EasyChemML
pip install ./

cd ../../
pip install -r requirements.txt

The installation on a standard Linux machine takes approx. 2-3 minutes.

Usage

The scripts to run the different parts of the workflow are located in the examples directory. A typical workflow would involve the following steps:

Create the virtual chemical space: examples/space_creation.py
- Define the core molecule (and matching SMARTS for decoration) for which the virtual library should be created. You can use the functional_groups in CombinatorialSpace/functional_groups.txt or define your own.
- Save your virtual library as a .csv file in the Data/virtual_libraries directory. Name the file mol_substitution.csv, where mol is the name of the core molecule (e.g., thiophene) and substitution is the degree of substitution (e.g., monosubstituted).
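The enumeration and naming steps above can be illustrated with a minimal, standard-library-only sketch. It is not the repository's implementation: examples/space_creation.py decorates the core via SMARTS matching, whereas this stand-in simply fills `{}` placeholders in a SMILES template. The function names, the template, the three substituents, and the `smiles` column header are assumptions made for illustration.

```python
import csv
from itertools import product
from pathlib import Path


def enumerate_library(template, groups):
    """Fill every '{}' placeholder in a SMILES template with each combination of groups."""
    n_sites = template.count("{}")
    return [template.format(*combo) for combo in product(groups, repeat=n_sites)]


def library_path(mol, substitution, root="Data/virtual_libraries"):
    """Build the file path following the mol_substitution.csv naming convention."""
    return Path(root) / f"{mol}_{substitution}.csv"


def write_library(smiles_list, path):
    """Write one SMILES per row; the 'smiles' header is a hypothetical column name."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["smiles"])
        writer.writerows([smiles] for smiles in smiles_list)


# Illustrative substituents (methyl, methoxy, cyano) on the 2-position of thiophene.
library = enumerate_library("c1ccc({})s1", ["C", "OC", "C#N"])
target = library_path("thiophene", "monosubstituted")
print(target, library)
```

Calling `write_library(library, target)` would then save the enumerated molecules under the expected file name.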
Obtain predictions: examples/predict_et.py and examples/predict_sp.py
- Download the pre-trained EnTdecker models from the Zenodo repository and place them in the respective directories in the Models directory.
- Alternatively, you can train or retrain your own models. For this, please refer to the instructions in the EnTdecker repository.
- Run the scripts for prediction. The predictions should be saved to Data/et_predictions/predictions_mol_substitution.csv for triplet energy prediction and Data/sp_predictions/predictions_mol_substitution.csv for spin population prediction.
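Because the later steps rely on these file locations, it can help to check them before starting an analysis. The following is a small sketch under the path conventions stated above; the helper names `workflow_files` and `missing_files` are hypothetical and not part of the repository.

```python
from pathlib import Path


def workflow_files(mol, substitution, data_root="Data"):
    """Return the paths the workflow expects for one virtual library."""
    name = f"{mol}_{substitution}.csv"
    root = Path(data_root)
    return {
        "library": root / "virtual_libraries" / name,
        "et_predictions": root / "et_predictions" / f"predictions_{name}",
        "sp_predictions": root / "sp_predictions" / f"predictions_{name}",
    }


def missing_files(mol, substitution, data_root="Data"):
    """List the keys whose files do not exist yet, e.g. before starting the analysis."""
    files = workflow_files(mol, substitution, data_root)
    return [key for key, path in files.items() if not path.is_file()]
```

For example, `missing_files("thiophene", "monosubstituted")` returns an empty list once the library and both prediction files are in place.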
Create images for the interactive analysis (Optional): examples/image_creation.py
- This script creates the images. Choose the image type with the type_of_image variable:
  - structure: The images will be the structures of the molecules generated with RDKit
  - spin_population: The images will be a heat map of predicted spin population of the molecules
- Save the images as Data/images/ring_class_substitution/idx.jpeg, where ring_class is the name you used to create the space, substitution is the degree of substitution, and idx is the index of the molecule in the virtual library. A dummy image named default.png is provided in the Data/images directory for molecules for which no image was created.
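The image naming scheme and the default.png fallback can be sketched as a small lookup. How the interactive script actually resolves missing images is not documented here, so the fallback logic and the function name `image_for` are assumptions.

```python
from pathlib import Path


def image_for(ring_class, substitution, idx, image_root="Data/images"):
    """Return the molecule's image path, or default.png if that image does not exist."""
    root = Path(image_root)
    candidate = root / f"{ring_class}_{substitution}" / f"{idx}.jpeg"
    return candidate if candidate.is_file() else root / "default.png"
```

For instance, `image_for("thiophene", "monosubstituted", 0)` points to Data/images/thiophene_monosubstituted/0.jpeg when that file exists and to Data/images/default.png otherwise.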
Choose your analysis method:
- Option A: Run the interactive analysis: examples/interactive_analysis.py
  - Using the previously generated files, an interactive 3D plot can be generated.
  - The representation of the molecules can be chosen with the representation variable:
    - ECFP: The molecules are represented by their ECFP fingerprints generated with RDKit
    - MACCS: The molecules are represented by their MACCS fingerprints generated with RDKit
    - MACCS+ECFP: The molecules are represented by the concatenation of their ECFP and MACCS fingerprints
  - The parameters for the dimensionality reduction with UMAP can be adjusted:
    - n_neighbors (default: 60)
    - min_dist (default: 0.15)
    - metric (default: euclidean)
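The three representation choices and the UMAP defaults listed above can be summarized in a short sketch. It operates on precomputed fingerprint bit lists rather than calling RDKit, and the function name, the concatenation order (ECFP first), and the use of three output dimensions for the 3D plot are assumptions rather than details taken from examples/interactive_analysis.py.

```python
# Defaults listed above; in practice they could be passed as
# umap.UMAP(n_components=3, **UMAP_DEFAULTS) from the umap-learn package.
UMAP_DEFAULTS = {"n_neighbors": 60, "min_dist": 0.15, "metric": "euclidean"}


def build_representation(ecfp_bits, maccs_bits, representation="ECFP"):
    """Select or concatenate precomputed fingerprint bit lists."""
    if representation == "ECFP":
        return list(ecfp_bits)
    if representation == "MACCS":
        return list(maccs_bits)
    if representation == "MACCS+ECFP":
        # Assumed order: ECFP bits first, then MACCS bits.
        return list(ecfp_bits) + list(maccs_bits)
    raise ValueError(f"Unknown representation: {representation!r}")
```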
- Option B: Alternatively, run an automated analysis: examples/automated_analysis.py
  - This script screens the virtual libraries for promising candidates (e.g., molecules below a triplet energy threshold and with spin population on a defined core substructure)
  - Set the triplet energy threshold using the threshold variable (default: 62 kcal/mol)
  - Define core_smiles to filter for spin population localization
  - Filtered results are saved to Data/potential_candidates_ring_class_substitution.csv
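The screening logic of Option B can be sketched as a two-part filter. This is a simplified illustration, not the code in examples/automated_analysis.py: it assumes per-atom spin populations are already available and that the indices of the core atoms have been determined beforehand (in the real workflow they would follow from core_smiles). The `min_core_fraction` cutoff and both function names are hypothetical.

```python
def core_spin_fraction(spin_populations, core_atoms):
    """Fraction of the total absolute spin population located on the core atoms."""
    total = sum(abs(value) for value in spin_populations)
    if total == 0:
        return 0.0
    on_core = sum(abs(spin_populations[i]) for i in core_atoms)
    return on_core / total


def is_candidate(triplet_energy, spin_populations, core_atoms,
                 threshold=62.0, min_core_fraction=0.5):
    """Keep molecules below the energy threshold whose spin sits mostly on the core."""
    if triplet_energy >= threshold:
        return False
    return core_spin_fraction(spin_populations, core_atoms) >= min_core_fraction


# Made-up numbers: four atoms, atoms 0 and 1 form the core.
print(is_candidate(58.0, [0.9, 0.8, 0.1, 0.2], core_atoms={0, 1}))
```

The default of 62.0 mirrors the documented threshold in kcal/mol; the 0.5 cutoff is only a placeholder to make the second criterion concrete.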

A typical run time for a substrate space with ~1000 molecules is approx. 3-5 minutes when using GPU support, and 10-15 minutes when run on CPU only.

We provide an example workflow as a Jupyter notebook here.

Moreover, a Jupyter notebook can be run without a local installation on Google Colab using this link.

Citation

@article{doi:10.1021/jacs.5c09249,
author = {Rana, Debanjan and H{\"u}mpel, Carla and Laskar, Ranjini and Schlosser, Leon and Korgitzsch, Sophie and Dutta, Subhabrata and Daniliuc, Constantin G. and Glorius, Frank},
title = {Accelerated Discovery of Energy Transfer-Catalyzed Dearomative Cycloadditions through a Data-Driven Three-Layer Screening Strategy},
journal = {Journal of the American Chemical Society},
volume = {147},
number = {31},
pages = {28359-28369},
year = {2025},
doi = {10.1021/jacs.5c09249},
note ={PMID: 40700715},
URL = {https://doi.org/10.1021/jacs.5c09249},
eprint = {https://doi.org/10.1021/jacs.5c09249}
}
