


Chemical Function (CheF) Dataset and Model


Mining Patents with Large Language Models Elucidates the Chemical Function Landscape

Clayton W. Kosonocky, Claus O. Wilke, Edward M. Marcotte, and Andrew D. Ellington

(Under review)

Dataset

The CheF dataset contains just under 100K molecules and their patent-derived functional labels, summarized by ChatGPT.

The CheF dataset can be found in /results/CheF_100K_final.csv.
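A minimal sketch for reading the file with Python's standard csv module follows. The column names cid, smiles, and labels, and the assumption that labels are stored as one comma-separated string per molecule, are hypothetical; check the header of CheF_100K_final.csv and adjust the constants before use. The sample row (aspirin, PubChem CID 2244) is illustrative only and is not taken from the dataset.

```python
import csv
import io
from typing import Dict, List, TextIO

# Hypothetical column names: check the header of CheF_100K_final.csv
# and adjust these constants to match the actual file.
CID_COL = "cid"
SMILES_COL = "smiles"
LABEL_COL = "labels"


def parse_labels(raw: str) -> List[str]:
    """Split an (assumed) comma-separated label string into clean lowercase labels."""
    return [label.strip().lower() for label in raw.split(",") if label.strip()]


def load_chef(handle: TextIO) -> List[Dict[str, object]]:
    """Read CheF-style rows from an open text handle into dicts with a parsed label list."""
    records = []
    for row in csv.DictReader(handle):
        records.append({
            "cid": row[CID_COL],
            "smiles": row[SMILES_COL],
            "labels": parse_labels(row[LABEL_COL]),
        })
    return records


if __name__ == "__main__":
    # Tiny in-memory stand-in for the real file (illustrative values, not from CheF).
    sample = io.StringIO(
        "cid,smiles,labels\n"
        '2244,CC(=O)OC1=CC=CC=C1C(=O)O,"analgesic, anti-inflammatory"\n'
    )
    rows = load_chef(sample)
    print(rows[0]["labels"])  # ['analgesic', 'anti-inflammatory']
```

To read the real file, pass an open handle instead, e.g. `with open("results/CheF_100K_final.csv", newline="") as f: rows = load_chef(f)`.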

Visualizer

A visualization of the 100K molecule CheF dataset can be found at chefdb.app.

This visualization is a t-SNE projection of Daylight-style fingerprints computed with RDKit. Each point is colored according to whether or not that molecule carries the selected label.
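The coloring step can be sketched as a boolean mask over the molecules. This assumes the 2D t-SNE coordinates have already been computed (for example from RDKit's Daylight-like `Chem.RDKFingerprint` fed into a t-SNE implementation) and that only the point colors change when a new label is selected. The color values and the case-insensitive matching are assumptions, not details confirmed for chefdb.app.

```python
from typing import Iterable, List, Sequence

HIGHLIGHT = "#d62728"   # hypothetical color for molecules carrying the label
BACKGROUND = "#d3d3d3"  # hypothetical color for all other molecules


def label_mask(label_sets: Sequence[Iterable[str]], label: str) -> List[bool]:
    """Return True for each molecule whose labels contain `label` (case-insensitive)."""
    target = label.lower()
    return [target in {lab.lower() for lab in labels} for labels in label_sets]


def point_colors(label_sets: Sequence[Iterable[str]], label: str) -> List[str]:
    """Map the mask to one color per point, in the same order as the coordinates."""
    return [HIGHLIGHT if hit else BACKGROUND for hit in label_mask(label_sets, label)]


if __name__ == "__main__":
    sets = [["antiviral", "protease inhibitor"], ["analgesic"], []]
    print(point_colors(sets, "Antiviral"))
```

Keeping the projection fixed and recomputing only the mask means switching labels never moves the points, so clusters stay comparable across labels.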

The current features of the app are as follows:

  • Highlight points based on a selected label. Highlighted points display the PubChem CID, SMILES, associated CheF labels, and molecular structure
  • Zoom in/out
  • Drag plot to change location (if zoomed in)
  • Toggle tooltips between highlighted points only and all points
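The tooltip toggle could work roughly as below. This is a hypothetical sketch, not the app's code: records are assumed to be dicts with cid, smiles, and labels keys, and rendering the molecular structure (which would need a drawing library) is left out.

```python
from typing import Dict, List, Sequence


def tooltip(record: Dict[str, object]) -> str:
    """Format the text fields the app shows for a point: CID, SMILES and CheF labels."""
    labels = ", ".join(record["labels"])
    return f"CID {record['cid']} | {record['smiles']} | {labels}"


def visible_tooltips(records: Sequence[Dict[str, object]], label: str,
                     highlighted_only: bool = True) -> List[str]:
    """Return tooltips for highlighted points only, or for every point when toggled off."""
    target = label.lower()
    shown = []
    for rec in records:
        is_highlighted = target in (lab.lower() for lab in rec["labels"])
        if is_highlighted or not highlighted_only:
            shown.append(tooltip(rec))
    return shown


if __name__ == "__main__":
    recs = [
        {"cid": "1", "smiles": "C", "labels": ["antiviral"]},
        {"cid": "2", "smiles": "CC", "labels": ["analgesic"]},
    ]
    print(visible_tooltips(recs, "antiviral"))
    print(visible_tooltips(recs, "antiviral", highlighted_only=False))
```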
