materialsvirtuallab/matpes

Aims

Potential energy surface (PES) datasets with near-complete coverage of the periodic table are used to train foundation potentials (FPs), i.e., machine learning interatomic potentials (MLIPs) with near-complete coverage of the periodic table. MatPES is an initiative by the Materials Virtual Lab and the Materials Project to address critical deficiencies in such PES datasets for materials.

  1. Accuracy. MatPES is computed using static DFT calculations with stringent convergence criteria. Please refer to the MatPESStaticSet in [pymatgen] for details.
  2. Comprehensiveness. MatPES structures are sampled using a 2-stage version of DImensionality-Reduced Encoded Clusters with sTratified (DIRECT) sampling from a greatly expanded configuration space of MD structures.
  3. Quality. MatPES includes computed data from the PBE functional, as well as the high-fidelity r2SCAN meta-GGA functional, which gives an improved description across diverse bonding types and chemistries.
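
To make the sampling idea in point 2 concrete, here is a minimal single-stage sketch of a DIRECT-style selection: reduce the feature dimensionality, cluster the reduced points, then draw a fixed number of structures from each cluster. It is an illustration under stated assumptions, not the MatPES pipeline. The feature matrix is random toy data, the function name `direct_sample_sketch` is hypothetical, a simple k-means loop stands in for whatever clustering the DIRECT workflow uses, and the 2-stage variant is not reproduced.

```python
import numpy as np


def direct_sample_sketch(features, n_clusters, per_cluster=1, n_components=2,
                         n_iter=20, seed=0):
    """Pick representative row indices: PCA -> k-means -> per-cluster selection.

    Hypothetical helper for illustration only; k-means is a stand-in clustering.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(features, dtype=float)

    # 1. Dimensionality reduction: PCA via SVD of the centered features.
    xc = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(xc, full_matrices=False)
    z = xc @ vt[:n_components].T

    # 2. Clustering: plain k-means initialised from randomly chosen points.
    centers = z[rng.choice(len(z), size=n_clusters, replace=False)]
    for _ in range(n_iter):
        dist = np.linalg.norm(z[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for k in range(n_clusters):
            members = z[labels == k]
            if len(members):
                centers[k] = members.mean(axis=0)

    # Recompute distances and labels against the final centers.
    dist = np.linalg.norm(z[:, None, :] - centers[None, :, :], axis=2)
    labels = dist.argmin(axis=1)

    # 3. Stratified selection: take the points closest to each cluster center.
    selected = []
    for k in range(n_clusters):
        idx = np.flatnonzero(labels == k)
        if idx.size == 0:
            continue  # an empty cluster contributes nothing
        closest_first = idx[np.argsort(dist[idx, k])]
        selected.extend(closest_first[:per_cluster].tolist())
    return sorted(selected)


# Toy data: three well-separated groups of 20 four-dimensional feature vectors.
toy_rng = np.random.default_rng(1)
toy_features = np.vstack(
    [toy_rng.normal(loc, 0.05, size=(20, 4)) for loc in (0.0, 5.0, 10.0)]
)
picked = direct_sample_sketch(toy_features, n_clusters=3)
print(picked)
```

Selecting per cluster rather than uniformly at random is what keeps sparsely populated regions of the feature space represented in the final sample.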

The initial v2025.1 release comprises ~400,000 structures from 300 K MD simulations. Although this dataset is much smaller than other PES datasets in the literature, FPs trained on it achieve comparable or, in some cases, improved performance and reliability.

MatPES is part of the MatML ecosystem, which includes the MatGL (Materials Graph Library) and maml (MAterials Machine Learning) packages, the MatPES (Materials Potential Energy Surface) dataset, and the MatCalc (Materials Calculator).

Getting the Dataset

Hugging Face

The MatPES dataset is available on Hugging Face. You can use the datasets package to download it.

from datasets import load_dataset

load_dataset("mavrl/matpes", "pbe")

load_dataset("mavrl/matpes", "r2scan")

MatPES Package

The matpes Python package, which provides tools for working with the MatPES datasets, can be installed via pip:

pip install matpes

Some command line usage examples:

# Download the PBE dataset to the current directory
matpes download pbe

# You should see a MatPES-PBE-20240214.json.gz file in your directory.

# Extract all entries in the Fe-O chemical system
matpes data -i MatPES-PBE-20240214.json.gz --chemsys Fe-O -o Fe-O.json.gz
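
For readers who prefer to filter in Python, the following is a minimal sketch of the same idea using only the standard library. It assumes the downloaded file is a gzipped JSON list of records and that each record carries a `chemsys` string such as "Fe-O"; both are assumptions about the file layout, not documented facts. The helper names and the toy records (including the `id` field) are hypothetical, and the exact-match semantics may differ from what `matpes data --chemsys` does.

```python
import gzip
import json


def normalize_chemsys(chemsys):
    """Return the chemical system with element symbols sorted alphabetically."""
    return "-".join(sorted(chemsys.split("-")))


def filter_by_chemsys(entries, chemsys, key="chemsys"):
    """Keep entries whose chemical system matches `chemsys` exactly.

    `key` is an assumed field name; adjust it to the actual record schema.
    """
    target = normalize_chemsys(chemsys)
    return [e for e in entries if normalize_chemsys(e.get(key, "")) == target]


def load_entries(path):
    """Load a gzipped JSON file, assumed to contain a list of records."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return json.load(f)


# Toy records standing in for real data (all values are made up).
toy_entries = [
    {"id": "toy-1", "chemsys": "Fe-O"},
    {"id": "toy-2", "chemsys": "Fe"},
    {"id": "toy-3", "chemsys": "Fe-O"},
    {"id": "toy-4", "chemsys": "Fe-Li-O"},
]
fe_o = filter_by_chemsys(toy_entries, "O-Fe")
print([e["id"] for e in fe_o])

# With a real download, one might instead run:
# entries = load_entries("MatPES-PBE-20240214.json.gz")
# fe_o = filter_by_chemsys(entries, "Fe-O")
```

Normalizing the element order means "O-Fe" and "Fe-O" select the same records.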

The matpes.db module provides functionality to create your own MongoDB database from the downloaded MatPES data, which is extremely useful if you will be working with the data frequently (e.g., querying, adding entries, etc.).

MatPES-trained Models

We have released a set of MatPES-trained foundation potentials (FPs) in the M3GNet, CHGNet, and TensorNet architectures in the MatGL package. For example, you can load the TensorNet FP trained on MatPES PBE 2025.1 as follows:

import matgl

potential = matgl.load_model("TensorNet-MatPES-PBE-v2025.1-PES")

The naming of the models follows the format <architecture>-<dataset>-<dataset-version>-PES.
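
As an illustration of this naming format, the sketch below builds and parses model names with two hypothetical helpers (`build_model_name` and `parse_model_name` are not part of MatGL). It assumes that the architecture and version parts contain no hyphens, while the dataset part may (as in "MatPES-PBE"). Whether a given combination is actually published should be checked against the models available in MatGL.

```python
from typing import NamedTuple


class ModelName(NamedTuple):
    """The three variable parts of a model name."""
    architecture: str
    dataset: str
    version: str


def build_model_name(architecture, dataset, version):
    """Assemble <architecture>-<dataset>-<dataset-version>-PES."""
    return f"{architecture}-{dataset}-{version}-PES"


def parse_model_name(name):
    """Split a model name into its parts.

    Assumes the first token is the architecture, the last is the literal
    "PES", the second-to-last is the version, and the rest is the dataset.
    """
    parts = name.split("-")
    if len(parts) < 4 or parts[-1] != "PES":
        raise ValueError(f"Unexpected model name format: {name!r}")
    return ModelName(parts[0], "-".join(parts[1:-2]), parts[-2])


name = build_model_name("TensorNet", "MatPES-PBE", "v2025.1")
parsed = parse_model_name(name)
print(name)
print(parsed)
```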

These FPs can be used easily with the MatCalc package to rapidly compute properties. For example:

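from matcalc.elasticity import ElasticityCalc
from matgl.ext.ase import PESCalculator

# `potential` is the model loaded with matgl.load_model above;
# `structure` is the structure (e.g., a pymatgen Structure) to evaluate.
ase_calc = PESCalculator(potential)
calculator = ElasticityCalc(ase_calc)
calculator.calc(structure)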

Tutorials

We have provided Jupyter notebooks demonstrating how to load the MatPES dataset, train a model and perform fine-tuning.

Citing

If you use the MatPES dataset, please cite the following work:

Kaplan, A. D.; Liu, R.; Qi, J.; Ko, T. W.; Deng, B.; Riebesell, J.; Ceder, G.; Persson, K. A.; Ong, S. P. A Foundational Potential Energy Surface Dataset for Materials. arXiv 2025. DOI: 10.48550/arXiv.2503.04070.

In addition, if you use any of the pre-trained FPs or architectures, please cite the references provided for the architecture used, as well as MatGL.
