Skip to content

SMaRt: A Toxicity Classification System through SMILES and Multimodal Representation to Cross-check Hazardous Chemicals

Notifications You must be signed in to change notification settings

joycebyang/SMaRt

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

SMaRt

SMaRt: A Toxicity Classification System through SMILES and Multimodal Representation to Cross-check Hazardous Chemicals

Predicting toxicity of molecules has many real-world applications. It’s used in not only medicine, but also the personal care industry as an alternative method to animal testing when screening cosmetic ingredients, and the agrochemical discovery field when testing pesticides. However, currently computational toxicology is reliant on human-driven feature engineering and isolated to singular domains. This is often driven by either commercial interest – the pharmaceutical industry spent $83 billion dollars on R&D in 2019, or regulatory pressure – animal testing for cosmetic products has been prohibited in the EU since 2009. As of now, there does not exist a general-purpose toxicity classification system.

The objective of this project is to develop a toxicity classification system that uses multiple molecular representations as input to learn the “secret” of each molecule that works well in the seemingly disjoint domains of drugs, cosmetics, and pesticides. Specifically, SMaRt (SMILES and Multimodal Representation) investigates the viability of learning a low-dimensional domain agnostic feature representation through pretraining variational autoencoders (VAE) using large unlabeled dataset of SMILES sequences and 2D molecular drawings. The final product must fulfill the need for a multi-domain computational toxicology tool, such that new domains that are currently constrained by resources and expert knowledge can be explored using SMaRt encoders.

Performance of toxicity classification models built upon SMaRt encoders is consistent with existing research and comparable to other state-of-the-art baselines in drugs and cosmetics. 2D representation of molecules has been used independently to train toxicity classification models. SMaRt combines both representations in pretraining VAE and shows improvement over using SMILES alone. The SMaRt UI dashboard is a one-stop portal that allows users to compare the predictions of multiple domain specific classifiers. It reduces the friction for experts from different domains to compare notes and conduct statistical analysis.

Knowledge fragmentation has long been a problem in computational toxicology. By developing a toxicity classification system that can predict the toxicity of multi-domain chemicals without relying on specialized chemistry knowledge, this project can provide a novel and sustainable toxicity screening method and allow cross-domain analysis of chemicals. The impact of the project is not only to protect animals by eliminating empirical toxicity testing, but also to benefit humans by improving the efficiency of preclinical testing and producing safer industrial chemicals.

Results

SMaRt UI Dashboard

About

SMaRt: A Toxicity Classification System through SMILES and Multimodal Representation to Cross-check Hazardous Chemicals

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages