This repository contains the official implementation of the results shown in:
- "From Spectra to Structure: AI-Powered 31P-NMR Interpretation"
- "Setting New Benchmarks in AI-driven Infrared Structure Elucidation"
- "Automated Structure Elucidation at Human-Level Accuracy via a Multimodal Multitask Language Model"
- "Language Model Enabled Structure Prediction from Infrared Spectra of Mixtures"
- "IR–NMR Multimodal Computational Spectra Dataset for 177K Patent-Extracted Organic Molecules"
It provides the complete codebase needed to reproduce our results and to train models on spectra obtained via IR and NMR spectroscopy. The framework is built on PyTorch, PyTorch Lightning and Hugging Face.
To install the codebase, ensure that you have Python 3.10 or later installed, then follow the steps below. Installation typically takes less than two minutes.
pip install uv
uv pip install -e .
uv pip install -e ".[dev]"
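The Python version requirement stated above can be checked before installing. The following is a minimal sketch, not part of the repository; the helper name `meets_minimum` is hypothetical and only the standard library is used.

```python
import sys

# Minimum Python version stated in the installation instructions.
MIN_VERSION = (3, 10)


def meets_minimum(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the (major, minor) version is at least `minimum`.

    `meets_minimum` is a hypothetical helper for illustration only.
    """
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if meets_minimum():
        print(f"Python {current} is new enough.")
    else:
        print(f"Python {current} is too old; 3.10 or newer is required.")
```

Comparing tuples rather than version strings avoids the pitfall where "3.9" sorts after "3.10" lexicographically.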
An example of how to train a model is provided in scripts/train_model.sh. The parameters in that script need to be changed according to the desired settings. To change the training data, adjust the config, column, modality, ... parameters. We recommend following the instructions in the paper replication guides; as a standalone example, to change the column of the data file from which the IR spectra are drawn, change the following parameter:
data.IR.column=ir_spectra \
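The `data.IR.column=ir_spectra` form reads like a dotted command-line override that sets one value inside a nested configuration. The sketch below illustrates that idea in plain Python. It is an assumption about how such overrides behave in general, not the repository's actual configuration code; the function `apply_override` and the starting config are hypothetical.

```python
def apply_override(config: dict, override: str) -> dict:
    """Apply one `dotted.key=value` override to a nested dict in place.

    Hypothetical helper for illustration; values are kept as strings.
    """
    dotted_key, sep, value = override.partition("=")
    if not sep or not dotted_key:
        raise ValueError(f"Expected 'key=value', got {override!r}")

    # Walk down to the parent of the final key, creating levels as needed.
    *parents, leaf = dotted_key.split(".")
    node = config
    for name in parents:
        node = node.setdefault(name, {})
    node[leaf] = value
    return config


# Hypothetical starting config; the real layout is defined by the repository.
config = {"data": {"IR": {"column": "some_default_column"}}}
apply_override(config, "data.IR.column=ir_spectra")
print(config["data"]["IR"]["column"])
```

Under this reading, each dot-separated segment selects one level of the configuration, so `data.IR.column` targets the `column` entry of the IR modality within the data settings.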
Complete instructions for reproducing the results presented in our papers are provided in the papers folder. These documents contain step-by-step guidance, including data preparation, model training parameters, and evaluation procedures, to replicate our experiments.