SynLlama is a fine-tuned version of Meta's Llama 3 large language models that generates synthesizable analogs of small molecules by constructing full synthetic pathways from commonly accessible building blocks and robust organic reaction templates. It offers a valuable tool for drug discovery, with strong performance in bottom-up synthesis, synthesizable analog generation, and hit expansion.
Ensure you have conda installed on your system. All additional dependencies will be managed via the environment.yml file.
To get started with SynLlama, follow these steps:
git clone https://github.com/THGLab/SynLlama
cd SynLlama
conda env create -f environment.yml
conda activate synllama
pip install -e .
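After the steps above, a quick sanity check can confirm the environment was created and the package is importable. This is a minimal sketch, assuming the editable install exposes a Python package named synllama (inferred from the repository name, not confirmed by the source):

```shell
# Activate the environment created from environment.yml
conda activate synllama

# Assumed package name "synllama"; adjust if the installed package differs
python -c "import synllama; print('SynLlama environment OK')"
```

If the import fails, re-run `pip install -e .` from the repository root inside the activated environment.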
To perform inference with the already trained SynLlama, download the trained models and relevant files from here, then follow the instructions in the Inference Guide.
If you are interested in retraining the model, please refer to the Retraining Guide for detailed instructions.
This project is licensed under the MIT License; see the LICENSE file for details.
This project is built on top of the ChemProjector Repo. We thank the authors for building such a user-friendly GitHub repository!
If you use this code in your research, please cite:
@misc{sun_synllama_2025,
  title = {SynLlama: Generating Synthesizable Molecules and Their Analogs with Large Language Models},
  url = {http://arxiv.org/abs/2503.12602},
  doi = {10.48550/arXiv.2503.12602},
  publisher = {arXiv},
  author = {Sun, Kunyang and Bagni, Dorian and Cavanagh, Joseph M. and Wang, Yingze and Sawyer, Jacob M. and Gritsevskiy, Andrew and Head-Gordon, Teresa},
  month = mar,
  year = {2025}
}