Accelerated Discovery in Dearomative Energy Transfer Catalysis through Data-Guided Reaction Screening

SPP2363/EnT_Substrate_Mapping

 
 


Substrate mapping for energy transfer catalysis

This repository contains the code for creating a virtual chemical space for the dearomative cycloaddition to heterocycles and for running predictions on it, as described in the paper cited below.

Installation

To install, run:

git clone --recurse-submodules https://github.com/le-schlo/EnT_Substrate_Mapping.git
cd EnT_Substrate_Mapping/
pip install -r requirements.txt

# Additionally, go to the EasyChemML directory and install its dependencies separately.
cd EnTdecker/EasyChemML
pip install ./

cd ../../
pip install -r requirements.txt

The installation on a standard Linux machine takes approx. 2-3 minutes.

Usage

The scripts to run the different parts of the workflow are located in the examples directory. A typical workflow would involve the following steps:

  1. Create the virtual chemical space: examples/space_creation.py

    • Define the core molecule (and matching SMARTS for decoration) for which the virtual library should be created. You can use the functional_groups in CombinatorialSpace/functional_groups.txt or define your own.
    • Save your virtual library as a .csv file in the Data/virtual_libraries directory. Name the file mol_substitution.csv, where mol is the name of the core molecule (e.g., thiophene) and substitution is the degree of substitution (e.g., monosubstituted).
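The naming convention above can be captured in a small helper. This is a minimal sketch, not code from the repository: the function names `library_path` and `parse_library_name` are hypothetical, and parsing assumes that the degree of substitution itself contains no underscore.

```python
from pathlib import PurePosixPath

# Directory named in the README. The helper names below are hypothetical.
LIBRARY_DIR = PurePosixPath("Data/virtual_libraries")


def library_path(mol: str, substitution: str) -> PurePosixPath:
    """Build the library path following the mol_substitution.csv convention."""
    return LIBRARY_DIR / f"{mol}_{substitution}.csv"


def parse_library_name(path) -> tuple:
    """Split a library file name back into (mol, substitution).

    Assumption: the degree of substitution contains no underscore,
    so the name is split at the last underscore.
    """
    stem = PurePosixPath(path).stem
    mol, _, substitution = stem.rpartition("_")
    if not mol or not substitution:
        raise ValueError(f"{path!s} does not follow mol_substitution.csv")
    return mol, substitution


path = library_path("thiophene", "monosubstituted")
print(path)
print(parse_library_name(path))
```

Keeping the name construction in one place makes it easier for the later steps to find the matching prediction and image files.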

  2. Obtain predictions: examples/predict_et.py and examples/predict_sp.py

    • Download the pre-trained EnTdecker models from the Zenodo repository and place them in the respective directories in the Models directory.
    • Alternatively, you can train or retrain your own models. For instructions, refer to the EnTdecker repository.
    • Run the scripts for prediction. The predictions should be saved to Data/et_predictions/predictions_mol_substitution.csv for triplet energy prediction and Data/sp_predictions/predictions_mol_substitution.csv for spin population prediction.
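Before moving on, it can help to check that every virtual library has both prediction files in the expected places. The following sketch is an illustration only: `missing_predictions` is a hypothetical helper, and it assumes the directory layout and file names described above.

```python
from pathlib import Path


def missing_predictions(data_root) -> dict:
    """Map each library name to the prediction files that are still missing.

    Assumes the layout described in the README:
    virtual_libraries/<name>.csv
    et_predictions/predictions_<name>.csv
    sp_predictions/predictions_<name>.csv
    """
    root = Path(data_root)
    missing = {}
    for library in sorted((root / "virtual_libraries").glob("*.csv")):
        expected = [
            root / "et_predictions" / f"predictions_{library.name}",
            root / "sp_predictions" / f"predictions_{library.name}",
        ]
        absent = [p.relative_to(root).as_posix() for p in expected if not p.is_file()]
        if absent:
            missing[library.stem] = absent
    return missing


if __name__ == "__main__":
    # Prints an empty dict when all predictions exist (or no library is found).
    print(missing_predictions("Data"))
```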

  3. Create images for the interactive analysis (Optional): examples/image_creation.py

    • This script creates one image per molecule. The type_of_image variable selects one of two options:
      • structure: The images will be the structures of the molecules generated with RDKit
      • spin_population: The images will be a heat map of predicted spin population of the molecules
    • The images should be saved to Data/images/ring_class_substitution/idx.jpeg, where ring_class is the name you used to create the space, substitution is the degree of substitution, and idx is the index of the molecule in the virtual library. A dummy image named default.png is provided in the Data/images directory as a fallback for molecules without an image.
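The image lookup with its fallback can be sketched as follows. This is an assumption about how a consumer of the images might resolve paths, not the repository's implementation; `image_for` is a hypothetical name.

```python
from pathlib import Path


def image_for(images_root, ring_class: str, substitution: str, idx: int) -> Path:
    """Return the image for molecule idx, or default.png if none was created."""
    root = Path(images_root)
    candidate = root / f"{ring_class}_{substitution}" / f"{idx}.jpeg"
    # Fall back to the dummy image shipped in the images directory.
    return candidate if candidate.is_file() else root / "default.png"


if __name__ == "__main__":
    print(image_for("Data/images", "thiophene", "monosubstituted", 0))
```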

  4. Choose your analysis method:

    • Option A: Run the interactive analysis: examples/interactive_analysis.py

      • Using the previously generated files, an interactive 3D plot can be generated.
      • The representation of the molecules can be chosen with the representation variable:
        • ECFP: The molecules are represented by their ECFP fingerprints generated with RDKit
        • MACCS: The molecules are represented by their MACCS fingerprints generated with RDKit
        • MACCS+ECFP: The molecules are represented by the concatenation of their ECFP and MACCS fingerprints
      • The parameters for the dimensionality reduction with UMAP can be adjusted:
        • n_neighbors (default: 60)
        • min_dist (default: 0.15)
        • metric (default: euclidean)
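As an illustration of how these defaults map onto the umap-learn API, the sketch below collects them in a dictionary and passes them to `umap.UMAP`. The helper names are hypothetical, and `n_components=3` is an assumption based on the plot being three-dimensional. The script in the repository may configure UMAP differently.

```python
# Defaults listed above for the UMAP projection.
UMAP_DEFAULTS = {"n_neighbors": 60, "min_dist": 0.15, "metric": "euclidean"}


def umap_settings(**overrides) -> dict:
    """Merge user overrides into the defaults, rejecting unknown keys."""
    unknown = set(overrides) - set(UMAP_DEFAULTS)
    if unknown:
        raise KeyError(f"unsupported UMAP parameter(s): {sorted(unknown)}")
    return {**UMAP_DEFAULTS, **overrides}


def build_reducer(**overrides):
    """Create a UMAP reducer; requires the umap-learn package."""
    import umap  # imported lazily so the settings helper works without it

    # Assumption: three output dimensions for the 3D plot.
    return umap.UMAP(n_components=3, **umap_settings(**overrides))


print(umap_settings(n_neighbors=15))
```

Smaller n_neighbors values emphasize local structure, while larger values give a more global view of the substrate space.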
    • Option B: Alternatively, run an automated analysis: examples/automated_analysis.py

      • This script screens the virtual libraries for promising candidates (e.g., molecules below a triplet energy threshold and with spin population on a defined core substructure)
      • Set the triplet energy threshold using the threshold variable (default: 62 kcal/mol)
      • Define core_smiles to filter for spin population localization
      • Filtered results are saved to Data/potential_candidates_ring_class_substitution.csv
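The threshold part of the automated screening can be sketched with the standard library alone. The column names `idx` and `triplet_energy` and the example rows are placeholders chosen for illustration, not the repository's actual file format or real predictions. The spin population check against core_smiles is omitted because it needs substructure matching.

```python
import csv
import io

DEFAULT_THRESHOLD = 62.0  # kcal/mol, the default stated above


def below_threshold(rows, threshold=DEFAULT_THRESHOLD, energy_column="triplet_energy"):
    """Keep rows whose predicted triplet energy is strictly below the threshold."""
    return [row for row in rows if float(row[energy_column]) < threshold]


# Placeholder data: indices and energies are made up for this example.
EXAMPLE_CSV = "idx,triplet_energy\n0,58.3\n1,62.0\n2,70.1\n"

candidates = below_threshold(csv.DictReader(io.StringIO(EXAMPLE_CSV)))
print([row["idx"] for row in candidates])
```

Here a molecule predicted at exactly the threshold is excluded. Whether the repository uses a strict or inclusive comparison is not stated above.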

A typical run time for a substrate space with ~1000 molecules is approx. 3-5 minutes when using GPU support, and 10-15 minutes when run on CPU only.

We provide an example workflow as a Jupyter notebook.

The notebook can also be run on Google Colab without a local installation.

Citation

@article{doi:10.1021/jacs.5c09249,
author = {Rana, Debanjan and H{\"u}mpel, Carla and Laskar, Ranjini and Schlosser, Leon and Korgitzsch, Sophie and Dutta, Subhabrata and Daniliuc, Constantin G. and Glorius, Frank},
title = {Accelerated Discovery of Energy Transfer-Catalyzed Dearomative Cycloadditions through a Data-Driven Three-Layer Screening Strategy},
journal = {Journal of the American Chemical Society},
volume = {147},
number = {31},
pages = {28359-28369},
year = {2025},
doi = {10.1021/jacs.5c09249},
note ={PMID: 40700715},
URL = {https://doi.org/10.1021/jacs.5c09249},
eprint = {https://doi.org/10.1021/jacs.5c09249}
}
