Reference implementation of Machine Learning Force Fields with Data Cost Aware Training, accepted to ICML 2023. This codebase is based on GemNet.
Create a directory called raw_data, then download RMD17 and the relevant CCSD(T) data from SGDML into it.
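As a rough aid for the step above, the following sketch creates the data directory if it does not exist and reports which files are still missing. The helper name `missing_files` is hypothetical and not part of this codebase, and the file names you pass in are your own choice; check process.py for the names it actually reads.

```python
from pathlib import Path


def missing_files(data_dir, expected_names):
    """Create data_dir if needed and return the expected files not yet in it.

    Hypothetical helper for illustration only; it is not part of this repository.
    """
    root = Path(data_dir)
    # parents=True / exist_ok=True make the call safe to repeat.
    root.mkdir(parents=True, exist_ok=True)
    return [name for name in expected_names if not (root / name).is_file()]
```

For example, `missing_files("raw_data", ["some_download.npz"])` returns `["some_download.npz"]` until that file has been placed in raw_data. The file name here is a placeholder, not a name the repository is known to require.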
To process the data, we provide helper functions in process.py:
python process.py
We also provide data that we generated through MD simulations using empirical force field methods. These datasets (containing data for each of the MD17 molecules) can be downloaded from here.
To pre-train GemNet with the simplest version of ASTEROID, we can run:
mkdir model_dir
bash scripts/aspirin_pretrain_simple.sh
The model_path argument needs to be changed depending on which checkpoint you use. Note that the model_name in get_predictions.py and the load_name in pretrain_asteroid.py must correspond to one another.
To fine-tune a randomly initialized model, run:
bash scripts/aspirin_base_200.sh
To fine-tune GNNs pre-trained with ASTEROID, run:
bash scripts/aspirin_finetune.sh
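The processing, pre-training, and fine-tuning commands above can also be chained from a small driver. The sketch below is a minimal illustration, not part of this codebase: `STEPS` and `run_steps` are hypothetical names, the commands are copied from this README in the order they appear, and it assumes raw_data and model_dir already exist and that model_path has been set in the scripts.

```python
import shlex
import subprocess

# Commands copied from this README, in the order they appear.
STEPS = [
    "python process.py",
    "bash scripts/aspirin_pretrain_simple.sh",
    "bash scripts/aspirin_finetune.sh",
]


def run_steps(steps, dry_run=True):
    """Run each command in order, stopping at the first failure.

    With dry_run=True nothing is executed; the parsed argument lists
    are returned so the plan can be inspected first.
    """
    planned = [shlex.split(step) for step in steps]
    if dry_run:
        return planned
    for argv in planned:
        # check=True raises CalledProcessError on a non-zero exit code,
        # so later steps never run on top of a failed earlier step.
        subprocess.run(argv, check=True)
    return planned
```

Calling `run_steps(STEPS)` only returns the planned commands; pass `dry_run=False` from the repository root to execute them. Defaulting to a dry run is a deliberate choice here, since pre-training and fine-tuning are long-running jobs.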
Please cite our paper and GemNet if you use the model or this code in your own work:
@inproceedings{gasteiger_gemnet_2021,
title = {GemNet: Universal Directional Graph Neural Networks for Molecules},
author = {Gasteiger, Johannes and Becker, Florian and G{\"u}nnemann, Stephan},
booktitle={Conference on Neural Information Processing Systems (NeurIPS)},
year = {2021}
}
@article{bukharin2023machine,
title={Machine Learning Force Fields with Data Cost Aware Training},
author={Bukharin, Alexander and Liu, Tianyi and Wang, Shengjie and Zuo, Simiao and Gao, Weihao and Yan, Wen and Zhao, Tuo},
journal={arXiv preprint arXiv:2306.03109},
year={2023}
}