Potential energy surface datasets with near-complete coverage of the periodic table are used to train foundation potentials (FPs), i.e., machine learning interatomic potentials (MLIPs) with near-complete coverage of the periodic table. MatPES is an initiative by the Materials Virtual Lab and the Materials Project to address critical deficiencies in such PES datasets for materials.
- Accuracy. MatPES is computed using static DFT calculations with stringent convergence criteria. Please refer to the MatPESStaticSet in [pymatgen] for details.
- Comprehensiveness. MatPES structures are sampled using a 2-stage version of DImensionality-Reduced Encoded Clusters with sTratified (DIRECT) sampling from a greatly expanded configuration space of MD structures.
- Quality. MatPES includes computed data from the PBE functional, as well as the high-fidelity r2SCAN meta-GGA functional, which offers an improved description across diverse bonding and chemistries.
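As a rough illustration of the stratified step in the DIRECT sampling mentioned above, the following sketch picks a fixed number of structures from each cluster. It assumes the structures have already been encoded, reduced in dimensionality, and clustered, so that each structure carries an integer cluster label. The function name, the toy labels, and the random choice within a cluster are assumptions made for illustration and are not the MatPES implementation.

```python
import random
from collections import defaultdict


def stratified_sample(cluster_labels, k_per_cluster=1, seed=0):
    """Return indices of up to k_per_cluster items drawn from every cluster.

    cluster_labels[i] is the (hypothetical) cluster id of structure i.
    """
    rng = random.Random(seed)

    # Group structure indices by their cluster label.
    members = defaultdict(list)
    for index, label in enumerate(cluster_labels):
        members[label].append(index)

    # Draw from each cluster so that every region of feature space is covered.
    selected = []
    for label in sorted(members):
        pool = members[label]
        k = min(k_per_cluster, len(pool))
        selected.extend(sorted(rng.sample(pool, k)))
    return selected


# Toy labels: six structures that fall into three clusters.
cluster_labels = [0, 0, 1, 2, 2, 2]
selected = stratified_sample(cluster_labels, k_per_cluster=1)
print(selected)
```

Sampling per cluster rather than uniformly over all structures keeps sparsely populated regions of configuration space represented, which is the motivation for a stratified scheme.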
The initial v2025.1 release comprises ~400,000 structures from 300K MD simulations. This dataset is much smaller than other PES datasets in the literature, yet FPs trained on it achieve comparable or, in some cases, improved performance and reliability.
MatPES is part of the MatML ecosystem, which includes the MatGL (Materials Graph Library) and maml (MAterials Machine Learning) packages, the MatPES (Materials Potential Energy Surface) dataset, and the MatCalc (Materials Calculator) package.
The MatPES dataset is available on Hugging Face. You can use the datasets package to download it.
from datasets import load_dataset

# Download the PBE and r2SCAN subsets from the Hugging Face Hub
pbe = load_dataset("mavrl/matpes", "pbe")
r2scan = load_dataset("mavrl/matpes", "r2scan")
The matpes Python package, which provides tools for working with the MatPES datasets, can be installed via pip:
pip install matpes
Some command line usage examples:
# Download the PBE dataset to the current directory
matpes download pbe
# You should see a MatPES-PBE-20240214.json.gz file in your directory.
# Extract all entries in the Fe-O chemical system
matpes data -i MatPES-PBE-20240214.json.gz --chemsys Fe-O -o Fe-O.json.gz
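To make the idea of a chemical-system filter concrete, here is a minimal Python sketch of the same kind of selection done by hand. It is not the matpes implementation. The record layout (a JSON array of dictionaries with an "elements" key), the helper names, and the toy records are assumptions for illustration. The sketch also matches the element set exactly, which may differ from how the CLI treats sub-systems such as pure Fe.

```python
import gzip
import json


def chemsys_of(elements):
    """Build a chemical-system string: sorted unique symbols joined by '-'."""
    return "-".join(sorted(set(elements)))


def filter_by_chemsys(records, chemsys):
    """Keep records whose element set matches the requested system exactly."""
    target = chemsys_of(chemsys.split("-"))
    return [r for r in records if chemsys_of(r["elements"]) == target]


def load_records(path):
    """Read a gzipped JSON file, assuming it holds a JSON array of entries."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return json.load(handle)


# Toy records standing in for dataset entries (hypothetical schema).
records = [
    {"formula": "Fe2O3", "elements": ["Fe", "O"]},
    {"formula": "Fe", "elements": ["Fe"]},
    {"formula": "LiFeO2", "elements": ["Li", "Fe", "O"]},
    {"formula": "FeO", "elements": ["O", "Fe"]},
]

fe_o = filter_by_chemsys(records, "Fe-O")
print([r["formula"] for r in fe_o])  # ['Fe2O3', 'FeO']
```

Normalizing both sides through chemsys_of makes the comparison independent of the order in which elements are listed.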
The matpes.db module provides functionality to create your own MongoDB database from the downloaded MatPES data, which is useful if you plan to work with the data extensively (e.g., querying or adding entries).
We have released a set of MatPES-trained foundation potentials (FPs) in the M3GNet, CHGNet, and TensorNet architectures in the MatGL package. For example, you can load the TensorNet FP trained on MatPES PBE 2025.1 as follows:
import matgl
potential = matgl.load_model("TensorNet-MatPES-PBE-v2025.1-PES")
The naming of the models follows the format <architecture>-<dataset>-<dataset-version>-PES.
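The naming format can be handled programmatically. The sketch below splits a model name into its parts and rebuilds it. The helper names and the regular expression are hypothetical; they assume that the architecture contains no hyphen and that the version tag starts with "v", as in the example name above. The sketch does not check whether a given name actually exists in MatGL.

```python
import re

# Pattern for <architecture>-<dataset>-<dataset-version>-PES.
# Assumption: no hyphen in the architecture, version tag looks like "v2025.1".
MODEL_NAME = re.compile(
    r"^(?P<architecture>[^-]+)-(?P<dataset>.+)-(?P<version>v[0-9.]+)-PES$"
)


def parse_model_name(name):
    """Split a model name into architecture, dataset, and version."""
    match = MODEL_NAME.match(name)
    if match is None:
        raise ValueError(f"Unrecognized model name: {name!r}")
    return match.groupdict()


def build_model_name(architecture, dataset, version):
    """Assemble a model name from its three parts."""
    return f"{architecture}-{dataset}-{version}-PES"


parts = parse_model_name("TensorNet-MatPES-PBE-v2025.1-PES")
print(parts)
# {'architecture': 'TensorNet', 'dataset': 'MatPES-PBE', 'version': 'v2025.1'}

# Round trip: rebuilding from the parsed parts gives the original name.
print(build_model_name(**parts))  # TensorNet-MatPES-PBE-v2025.1-PES
```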
These FPs can be used easily with the MatCalc package to rapidly compute properties. For example:
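from matcalc.elasticity import ElasticityCalc
from matgl.ext.ase import PESCalculator

# `potential` is the model loaded above; `structure` is a pymatgen Structure that you supply.
ase_calc = PESCalculator(potential)
calculator = ElasticityCalc(ase_calc)
results = calculator.calc(structure)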
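The dictionary returned by calculator.calc(structure) may report moduli in eV/Å^3 rather than GPa. The sketch below shows the unit conversion only. The key names "bulk_modulus_vrh" and "shear_modulus_vrh" and the eV/Å^3 unit are assumptions that should be checked against the MatCalc documentation for your version, and the numbers are placeholders rather than computed results.

```python
# 1 eV/Å^3 = 1.602176634e-19 J / 1e-30 m^3 = 160.2176634 GPa
EV_PER_A3_TO_GPA = 160.2176634


def to_gpa(value_ev_per_a3):
    """Convert a pressure-like quantity from eV/Å^3 to GPa."""
    return value_ev_per_a3 * EV_PER_A3_TO_GPA


def moduli_in_gpa(results, keys=("bulk_modulus_vrh", "shear_modulus_vrh")):
    """Convert the selected entries of a results dict; skip missing keys."""
    return {key: to_gpa(results[key]) for key in keys if key in results}


# Placeholder values standing in for a real results dictionary.
example_results = {"bulk_modulus_vrh": 1.0, "shear_modulus_vrh": 0.5}
converted = moduli_in_gpa(example_results)
for name, value in converted.items():
    print(f"{name}: {value:.1f} GPa")
```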
We have provided Jupyter notebooks demonstrating how to load the MatPES dataset, train a model and perform fine-tuning.
If you use the MatPES dataset, please cite the following work:
Kaplan, A. D.; Liu, R.; Qi, J.; Ko, T. W.; Deng, B.; Riebesell, J.; Ceder, G.; Persson, K. A.; Ong, S. P. A Foundational Potential Energy Surface Dataset for Materials. arXiv 2025. DOI: 10.48550/arXiv.2503.04070.
In addition, if you use any of the pre-trained FPs or architectures, please cite the references provided for the architecture used, as well as MatGL.