yimqiy/biodeg-ml

biodeg-ml

Raw code for processing chemical biodegradability data using SMILES and RDKit parameters, and for predicting ready biodegradability values with regression and GNN-based methods.

This is the cumulative product of research that began at IHPC in mid-2022. The notebook contains scripts and functions for:

  • Loading and saving .sdf and .csv databases, which contain molecular conformers and SMILES/RDKit parameters respectively.
  • Automated extraction of the above-mentioned RDKit parameters as identified.
  • Pre-processing for scikit-learn models.
  • Implementation of group-based cross-validation, grouped by shared SMILES so that multiple conformers of one molecule stay in the same fold.
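
The group-based cross-validation idea can be illustrated with a minimal, dependency-free sketch. It is not the notebook's implementation: the function name `group_k_fold`, the greedy fold assignment, and the toy SMILES list are all assumptions made for illustration. The point it shows is that rows sharing a SMILES string (conformers of one molecule) are always assigned to the same fold, so no molecule appears in both the training and the test set.

```python
from collections import defaultdict


def group_k_fold(groups, n_splits=3):
    """Yield (train_idx, test_idx) pairs in which no group spans both sides.

    Hypothetical helper for illustration only.
    """
    # Collect the row indices that belong to each group (here: each SMILES).
    members = defaultdict(list)
    for idx, group in enumerate(groups):
        members[group].append(idx)
    if n_splits > len(members):
        raise ValueError("n_splits cannot exceed the number of distinct groups")

    # Greedily assign the largest groups first to the currently smallest fold.
    folds = [[] for _ in range(n_splits)]
    for group in sorted(members, key=lambda g: (-len(members[g]), g)):
        smallest = min(folds, key=len)
        smallest.extend(members[group])

    for k in range(n_splits):
        test_idx = sorted(folds[k])
        train_idx = sorted(
            i for j, fold in enumerate(folds) if j != k for i in fold
        )
        yield train_idx, test_idx


# Hypothetical toy data: one row per conformer, keyed by its parent SMILES.
smiles = ["CCO", "CCO", "CCO", "c1ccccc1", "c1ccccc1", "CC(=O)O", "CCN", "CCN"]

splits = list(group_k_fold(smiles, n_splits=3))
for train_idx, test_idx in splits:
    train_smiles = {smiles[i] for i in train_idx}
    test_smiles = {smiles[i] for i in test_idx}
    # Conformers of one molecule never appear on both sides of a split.
    assert train_smiles.isdisjoint(test_smiles)
```

When working with scikit-learn models, `sklearn.model_selection.GroupKFold` provides the same guarantee: pass the SMILES strings as the `groups` argument to its `split` method.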

References:

  1. Predicting ready biodegradability in the Japanese Ministry of International Trade and Industry test
  2. Chemprop