diff --git a/lambench/metrics/results/README.md b/lambench/metrics/results/README.md
index 0741370..1f26f2e 100644
--- a/lambench/metrics/results/README.md
+++ b/lambench/metrics/results/README.md
@@ -15,6 +15,14 @@ Large atomistic models (LAM), also known as machine learning interatomic potenti
 - **Extensible**: Easily add new benchmarks and metrics.
 - **Detailed Reports**: Generates detailed performance reports and visualizations.
 
+## Updates
+The following changes have been made compared to the previously released version v0.3.1:
+- Added new models: MACE-MH-1, DPA-3.2-5M
+- Updated `Force Field Prediction` tasks; for the `Molecules` domain, two sets of labels are provided to support OMol25-trained models.
+- Added new `Property Calculation` tasks: oxygen vacancy formation energy prediction, protein-ligand binding energy prediction, and reaction energy barrier prediction.
+
+<span style="color:red">⚠️ Note: To assess full LAM capacity, we use OMat24-trained task heads for *Force Field Prediction* in Inorganic Materials and Catalysis, and OMol25-trained task heads for Molecules, when available. As for *Property Calculation*, we follow a similar approach, but use OC20-trained task heads for Catalysis when available, as this tends to yield better performance.</span>
+
 # LAMBench Leaderboard
 
 The LAMBench Leaderboard.
@@ -34,9 +42,10 @@ Figure 2: Accuracy-Efficiency Trade-off, $\bar{M}^m_{FF}$ vs $M_E^m$.
 
 We categorize all force-field prediction tasks into 3 domains:
 
-- **Inorganic Materials**: `Torres2019Analysis`, `Batzner2022equivariant`, `Sours2023Applications`, `Lopanitsyna2023Modeling`, `Mazitov2024Surface`, `Gao2025Spontaneous`
-- **Molecules**: `ANI-1x`, `MD22`, `AIMD-Chig`
-- **Catalysis**: `Vandermause2022Active`, `Zhang2019Bridging`, `Villanueva2024Water`
+- **Inorganic Materials**: `Torres2019Analysis`, `Batzner2022equivariant`, `Sours2023Applications`, `Lopanitsyna2023Modeling`, `Mazitov2024Surface`, `Gao2025Spontaneous`, `Gao2025Mechanism`
+- **Molecules**: `Sandonas2024Dataset`, `Guan2022Benchmark`, `AIMD-Chig`
+- **Catalysis**: `Vandermause2022Active`, `Zhang2019Bridging`, `Villanueva2024Water`,
+`Schaaf2023Accurate`, `Liu2025Generalized`
 
 To assess model performance across these domains, we use zero-shot inference with energy-bias term adjustments based on test dataset statistics. Performance metrics are aggregated as follows:
 
@@ -46,7 +55,7 @@ To assess model performance across these domains, we use zero-shot inference wit
 
     where $M^m_{k,p,i}$ is the original error metric, $m$ indicates the model, $k$ denotes the domain index, $p$ signifies the prediction index, and $i$ represents the test set index. For a model with worse accuracy than a dummy model, the error metric is set to 1.
     For instance, in force field tasks, the domains include Molecules, Inorganic Materials, and Catalysis, such that $k \in \{\text{Molecules, Inorganic Materials, Catalysis}\}$. The prediction types are categorized as energy ($E$), force ($F$), or virial ($V$), with $p \in \{E, F, V\}$.
-    For the specific domain of Molecules, the test sets are indexed as $i \in \{\text{ANI-1x, MD22, AIMD-Chig}\}$. This baseline model predicts energy based solely on the chemical formula, disregarding any structural details, thereby providing a reference point for evaluating the improvement offered by more sophisticated models.
+    For the specific domain of Molecules, the test sets are indexed as $i \in \{\text{Sandonas2024Dataset, Guan2022Benchmark, AIMD-Chig}\}$. This baseline model predicts energy based solely on the chemical formula, disregarding any structural details, thereby providing a reference point for evaluating the improvement offered by more sophisticated models.
 
 2. For each domain, we compute the log-average of normalized metrics across all datasets  within this domain by
 
@@ -83,12 +92,11 @@ In contrast, an ideal model that perfectly matches Density Functional Theory (DF
 
 For the domain-specific property calculation tasks, we adopt the MAE as the primary error metric.
 
-In the Inorganic Materials domain, the MDR phonon benchmark predicts the maximum phonon frequency, entropy, free energy, and heat capacity at constant volume, while the elasticity benchmark evaluates the shear and bulk moduli. Each prediction type
-is assigned an equal weight of $\frac{1}{6}$.
+In the Inorganic Materials domain, the MDR phonon benchmark predicts maximum phonon frequency, entropy, free energy, and constant-volume heat capacity; the elasticity benchmark evaluates shear and bulk moduli; and the oxygen vacancy benchmark evaluates oxygen vacancy formation energies. Each prediction type is equally weighted.
 
-In the Molecules domain, the TorsionNet500 benchmark evaluates the torsion profile energy, torsional barrier height, and the number of molecules for which the predicted torsional barrier height error exceeds 1 kcal/mol. The Wiggle150 benchmark assesses the relative conformer energy profile. Each prediction type in this domain is assigned a weight of 0.25.
+In the Molecules domain, the TorsionNet500 benchmark evaluates torsion profile energy, torsional barrier height, and the number of molecules with barrier height errors exceeding 1 kcal/mol. The Wiggle150 benchmark assesses relative conformer energy profiles. The protein–ligand binding benchmark evaluates binding energies across multiple sites for a given protein. The reaction barrier benchmark assesses forward and reverse barriers for nine reaction types common in organic chemistry and biochemistry. Each prediction type is equally weighted.
 
-In the Catalysis domain, the OC20NEB-OOD benchmark evaluates the energy barrier, reaction energy change (delta energy), and the percentage of reactions with predicted energy barrier errors exceeding 0.1 eV for three reaction types: transfer, dissociation, and desorption. Each prediction type in this domain is assigned a weight of 0.2.
+In the Catalysis domain, the OC20NEB-OOD benchmark evaluates the energy barrier, reaction energy change (delta energy), and the percentage of reactions with predicted energy barrier errors exceeding 0.1 eV for three reaction types: transfer, dissociation, and desorption. Each prediction type is equally weighted.
 
 The resulting error metric after averaging over all domains is denoted as $\bar M^{m}_{PC}$.
 
diff --git a/lambench/metrics/results/metadata.json b/lambench/metrics/results/metadata.json
index 53f20a5..ca25d73 100644
--- a/lambench/metrics/results/metadata.json
+++ b/lambench/metrics/results/metadata.json
@@ -2,10 +2,10 @@
   "generalizability_force_field_results": {
     "DISPLAY_NAME": "Generalizability Tests (Force Field Prediction)",
     "DESCRIPTION": "Energy, force, and virial prediction accuracy of the LAM on the test sets across multiple domains. Given that energy labels calculated via DFT can vary by an arbitrary constant due to variations in pseudopotential selection and software implementations, LAMs are consistently used to predict the energy difference between the label and a dummy LAM that estimates potential energy solely based on the chemical formula. In contrast, force and virial predictions are directly obtained from the models.",
-    "HPt_NC_2022": {
-      "DISPLAY_NAME": "Vandermause2022Active",
-      "DESCRIPTION": "A direct two-phase simulation of heterogeneous hydrogen turnover on the Pt(111) catalyst surface at chemical accuracy. Calculations were performed using VASP with PBE/PAW and an energy cutoff of 450 eV. [https://www.nature.com/articles/s41467-022-32294-0]",
-      "domain": "Catalysis",
+    "AQM":{
+      "DISPLAY_NAME": "Sandonas2024Dataset",
+      "DESCRIPTION": "An extensive dataset containing low- and high-energy conformers of 1,653 molecules, each with 2 to 92 atoms in total and up to 54 non-hydrogen atoms. The dataset was deduplicated against OMol25 to prevent data leakage. [https://www.nature.com/articles/s41597-024-03521-8]. We provide two sets of labels for this dataset: PBE/6-31G(d) with Gaussian (better aligned with OMat24) and ωB97M-V/def2-TZVPD with ORCA (compatible with OMol25).",
+      "domain": "Molecules",
       "energy_rmse": {
         "DISPLAY_NAME": "E RMSE (meV)",
         "DESCRIPTION": "The root mean squared error of the energy prediction.",
@@ -51,9 +51,9 @@
         "DESCRIPTION": "The mean absolute error of the virial prediction per atom."
       }
     },
-    "MD22": {
-      "DISPLAY_NAME": "MD22",
-      "DESCRIPTION": "Dataset containing MD trajectories of the 42-atom tetrapeptide Ac-Ala3-NHMe from the MD22 benchmark set. Calculations were performed using FHI-aims and i-Pi software at the DFT-PBE+MBD level of theory. The dataset was relabeled using Gaussian with PBE/6-31G(d). Trajectories were sampled at temperatures between 400-500 K at 1 fs resolution. [https://www.science.org/doi/10.1126/sciadv.adf0873]",
+    "H_nature_2022":{
+      "DISPLAY_NAME": "Guan2022Benchmark",
+      "DESCRIPTION": "A downsampled dataset containing AIMD conformations of 19 reaction channels for hydrogen combustion. [https://www.nature.com/articles/s41597-022-01330-5]. We provide two sets of labels for this dataset: PBE/6-31G(d) with Gaussian (better aligned with OMat24) and ωB97M-V/def2-TZVPD with ORCA (compatible with OMol25).",
       "domain": "Molecules",
       "energy_rmse": {
         "DISPLAY_NAME": "E RMSE (meV)",
@@ -100,9 +100,58 @@
         "DESCRIPTION": "The mean absolute error of the virial prediction per atom."
       }
     },
-    "Ca_batteries_CM2021": {
-      "DISPLAY_NAME": "Torres2019Analysis",
-      "DESCRIPTION": "A dataset of Ca-bearing minerals, focusing on silicates and carbonates such as olivine, pyroxene, garnet, amphibole, and double carbonates. Calculations were performed using VASP with spin-polarized PBE/PAW and an energy cutoff of 600 eV. [https://www.nature.com/articles/s41598-019-46002-4]",
+    "AIMD_Chig": {
+      "DISPLAY_NAME": "AIMD-Chig",
+      "DESCRIPTION": "A downsampled dataset containing MD conformations of the 166-atom protein Chignolin. The original ab initio simulations were driven by M062X/6-31G* with a Berendsen thermostat at 340 K. [https://www.nature.com/articles/s41597-023-02465-9]. The dataset was relabeled using Gaussian with PBE/6-31G(d).",
+      "domain": "Molecules",
+      "energy_rmse": {
+        "DISPLAY_NAME": "E RMSE (meV)",
+        "DESCRIPTION": "The root mean squared error of the energy prediction.",
+        "hide": true
+      },
+      "energy_mae": {
+        "DISPLAY_NAME": "E MAE (meV)",
+        "DESCRIPTION": "The mean absolute error of the energy prediction.",
+        "hide": true
+      },
+      "energy_rmse_natoms": {
+        "DISPLAY_NAME": "E RMSE (meV/atom)",
+        "DESCRIPTION": "The root mean squared error of the energy prediction per atom."
+      },
+      "energy_mae_natoms": {
+        "DISPLAY_NAME": "E MAE (meV/atom)",
+        "DESCRIPTION": "The mean absolute error of the energy prediction per atom."
+      },
+      "force_rmse": {
+        "DISPLAY_NAME": "F RMSE (meV/\u00c5)",
+        "DESCRIPTION": "The root mean squared error of the force prediction."
+      },
+      "force_mae": {
+        "DISPLAY_NAME": "F MAE (meV/\u00c5)",
+        "DESCRIPTION": "The mean absolute error of the force prediction."
+      },
+      "virial_rmse": {
+        "DISPLAY_NAME": "V RMSE (meV)",
+        "DESCRIPTION": "The root mean squared error of the virial prediction.",
+        "hide": true
+      },
+      "virial_mae": {
+        "DISPLAY_NAME": "V MAE (meV)",
+        "DESCRIPTION": "The mean absolute error of the virial prediction.",
+        "hide": true
+      },
+      "virial_rmse_natoms": {
+        "DISPLAY_NAME": "V RMSE (meV/atom)",
+        "DESCRIPTION": "The root mean squared error of the virial prediction per atom."
+      },
+      "virial_mae_natoms": {
+        "DISPLAY_NAME": "V MAE (meV/atom)",
+        "DESCRIPTION": "The mean absolute error of the virial prediction per atom."
+      }
+    },
+    "HEA25_S": {
+      "DISPLAY_NAME": "Mazitov2024Surface",
+      "DESCRIPTION": "A dataset of high entropy alloy surfaces, focusing on 25 d-block transition metals, excluding Tc, Cd, Re, Os and Hg. The original dataset was calculated using VASP with PBEsol/PAW, 550 eV cutoff, and Γ-centered k-points. [https://arxiv.org/abs/2212.13254]. The dataset was relabeled with VASP at the PBE level.",
       "domain": "Inorganic Materials",
       "energy_rmse": {
         "DISPLAY_NAME": "E RMSE (meV)",
@@ -149,9 +198,9 @@
         "DESCRIPTION": "The mean absolute error of the virial prediction per atom."
       }
     },
-    "NequIP_NC_2022": {
-      "DISPLAY_NAME": "Batzner2022equivariant",
-      "DESCRIPTION": "A downsampled dataset with ~7500 frames. The original dataset contains approximately 57,000 configurations from the evaluation datasets for NequIP graph neural network model for interatomic potentials. Trajectories have been taken from Li-P-S, Li-P-O glass melt-quench simulation. Calculations were performed using VASP with PBE/PAW and an energy cutoff of 400 eV. [https://www.nature.com/articles/s41467-022-29939-5]",
+    "HEA25_bulk": {
+      "DISPLAY_NAME": "Lopanitsyna2023Modeling",
+      "DESCRIPTION": "A dataset of high entropy alloy bulk structures, focusing on 25 d-block transition metals, excluding Tc, Cd, Re, Os and Hg. The original dataset was calculated using VASP with PBEsol/PAW, 550 eV cutoff, and Γ-centered k-points. [https://arxiv.org/abs/2212.13254]. The dataset was relabeled with VASP at the PBE level.",
       "domain": "Inorganic Materials",
       "energy_rmse": {
         "DISPLAY_NAME": "E RMSE (meV)",
@@ -198,10 +247,10 @@
         "DESCRIPTION": "The mean absolute error of the virial prediction per atom."
       }
     },
-    "ANI": {
-      "DISPLAY_NAME": "ANI-1x",
-      "DESCRIPTION": "A downsampled dataset from the training dataset of the ANI-1x model containing 997 frames. The dataset was relabeled using Gaussian with PBE/6-31G(d). [https://doi.org/10.1063/1.5023802]",
-      "domain": "Molecules",
+    "MoS2": {
+      "DISPLAY_NAME": "Gao2025Spontaneous",
+      "DESCRIPTION": "2D MoS2 structures. Calculations were performed using VASP with PBE/PAW and 600 eV cutoff. [https://www.nature.com/articles/s41467-025-56055-x]",
+      "domain": "Inorganic Materials",
       "energy_rmse": {
         "DISPLAY_NAME": "E RMSE (meV)",
         "DESCRIPTION": "The root mean squared error of the energy prediction.",
@@ -247,9 +296,58 @@
         "DESCRIPTION": "The mean absolute error of the virial prediction per atom."
       }
     },
-    "REANN_CO2_Ni100": {
-      "DISPLAY_NAME": "Zhang2019Bridging",
-      "DESCRIPTION": "Interaction of carbon dioxide with a movable Ni(100) surface. Calculations were performed using VASP with PBE/PAW. Example training data of the REANN package. [https://pubs.acs.org/doi/10.1021/acs.jpclett.9b00085]",
+    "CompressBi":{
+      "DISPLAY_NAME": "Gao2025Mechanism",
+      "DESCRIPTION": "MoS₂–Bi–MoS₂ heterostructures. Calculations were performed using VASP with PBE-D3/PAW. [https://arxiv.org/pdf/2508.06992]",
+      "domain": "Inorganic Materials",
+      "energy_rmse": {
+        "DISPLAY_NAME": "E RMSE (meV)",
+        "DESCRIPTION": "The root mean squared error of the energy prediction.",
+        "hide": true
+      },
+      "energy_mae": {
+        "DISPLAY_NAME": "E MAE (meV)",
+        "DESCRIPTION": "The mean absolute error of the energy prediction.",
+        "hide": true
+      },
+      "energy_rmse_natoms": {
+        "DISPLAY_NAME": "E RMSE (meV/atom)",
+        "DESCRIPTION": "The root mean squared error of the energy prediction per atom."
+      },
+      "energy_mae_natoms": {
+        "DISPLAY_NAME": "E MAE (meV/atom)",
+        "DESCRIPTION": "The mean absolute error of the energy prediction per atom."
+      },
+      "force_rmse": {
+        "DISPLAY_NAME": "F RMSE (meV/\u00c5)",
+        "DESCRIPTION": "The root mean squared error of the force prediction."
+      },
+      "force_mae": {
+        "DISPLAY_NAME": "F MAE (meV/\u00c5)",
+        "DESCRIPTION": "The mean absolute error of the force prediction."
+      },
+      "virial_rmse": {
+        "DISPLAY_NAME": "V RMSE (meV)",
+        "DESCRIPTION": "The root mean squared error of the virial prediction.",
+        "hide": true
+      },
+      "virial_mae": {
+        "DISPLAY_NAME": "V MAE (meV)",
+        "DESCRIPTION": "The mean absolute error of the virial prediction.",
+        "hide": true
+      },
+      "virial_rmse_natoms": {
+        "DISPLAY_NAME": "V RMSE (meV/atom)",
+        "DESCRIPTION": "The root mean squared error of the virial prediction per atom."
+      },
+      "virial_mae_natoms": {
+        "DISPLAY_NAME": "V MAE (meV/atom)",
+        "DESCRIPTION": "The mean absolute error of the virial prediction per atom."
+      }
+    },
+    "Carbon_growth":{
+      "DISPLAY_NAME": "Liu2025Generalized",
+      "DESCRIPTION": "A downsampled dataset containing carbon atom deposition growth on various substrates, including Si(111), Cu(111), and Al2O3(0001). Calculations were performed using VASP with PBE-D3/PAW. [https://www.nature.com/articles/s41524-025-01781-5]",
       "domain": "Catalysis",
       "energy_rmse": {
         "DISPLAY_NAME": "E RMSE (meV)",
@@ -296,10 +394,10 @@
         "DESCRIPTION": "The mean absolute error of the virial prediction per atom."
       }
     },
-     "HEA25_bulk": {
-      "DISPLAY_NAME": "Lopanitsyna2023Modeling",
-      "DESCRIPTION": "A dataset of high entropy alloy bulk structures, focusing on 25 d-block transition metals, excluding Tc, Cd, Re, Os and Hg. The original dataset were calculated using VASP with PBEsol/PAW, 550 eV cutoff, and Γ-centered k-points. [https://arxiv.org/abs/2212.13254]. The dataset was relabeled with VASP at the PBE level.",
-      "domain": "Inorganic Materials",
+    "In2O3_CO2":{
+      "DISPLAY_NAME": "Schaaf2023Accurate",
+      "DESCRIPTION": "A downsampled dataset containing configurations obtained from the hydrogenation of carbon dioxide to methanol over indium oxide. Calculations were performed using Quantum ESPRESSO with PBE/PAW. [https://www.nature.com/articles/s41524-023-01124-2]",
+      "domain": "Catalysis",
       "energy_rmse": {
         "DISPLAY_NAME": "E RMSE (meV)",
         "DESCRIPTION": "The root mean squared error of the energy prediction.",
@@ -345,9 +443,58 @@
         "DESCRIPTION": "The mean absolute error of the virial prediction per atom."
       }
     },
-    "MoS2": {
-      "DISPLAY_NAME": "Gao2025Spontaneous",
-      "DESCRIPTION": "2D MoS2 structures. Calculations were performed using VASP with PBE/PAW and 600 eV cutoff. [https://www.nature.com/articles/s41467-025-56055-x#Sec3]",
+    "REANN_CO2_Ni100": {
+      "DISPLAY_NAME": "Zhang2019Bridging",
+      "DESCRIPTION": "Interaction of carbon dioxide with a movable Ni(100) surface. Calculations were performed using VASP with PBE/PAW. Example training data of the REANN package. [https://pubs.acs.org/doi/10.1021/acs.jpclett.9b00085]",
+      "domain": "Catalysis",
+      "energy_rmse": {
+        "DISPLAY_NAME": "E RMSE (meV)",
+        "DESCRIPTION": "The root mean squared error of the energy prediction.",
+        "hide": true
+      },
+      "energy_mae": {
+        "DISPLAY_NAME": "E MAE (meV)",
+        "DESCRIPTION": "The mean absolute error of the energy prediction.",
+        "hide": true
+      },
+      "energy_rmse_natoms": {
+        "DISPLAY_NAME": "E RMSE (meV/atom)",
+        "DESCRIPTION": "The root mean squared error of the energy prediction per atom."
+      },
+      "energy_mae_natoms": {
+        "DISPLAY_NAME": "E MAE (meV/atom)",
+        "DESCRIPTION": "The mean absolute error of the energy prediction per atom."
+      },
+      "force_rmse": {
+        "DISPLAY_NAME": "F RMSE (meV/\u00c5)",
+        "DESCRIPTION": "The root mean squared error of the force prediction."
+      },
+      "force_mae": {
+        "DISPLAY_NAME": "F MAE (meV/\u00c5)",
+        "DESCRIPTION": "The mean absolute error of the force prediction."
+      },
+      "virial_rmse": {
+        "DISPLAY_NAME": "V RMSE (meV)",
+        "DESCRIPTION": "The root mean squared error of the virial prediction.",
+        "hide": true
+      },
+      "virial_mae": {
+        "DISPLAY_NAME": "V MAE (meV)",
+        "DESCRIPTION": "The mean absolute error of the virial prediction.",
+        "hide": true
+      },
+      "virial_rmse_natoms": {
+        "DISPLAY_NAME": "V RMSE (meV/atom)",
+        "DESCRIPTION": "The root mean squared error of the virial prediction per atom."
+      },
+      "virial_mae_natoms": {
+        "DISPLAY_NAME": "V MAE (meV/atom)",
+        "DESCRIPTION": "The mean absolute error of the virial prediction per atom."
+      }
+    },
+    "NequIP_NC_2022": {
+      "DISPLAY_NAME": "Batzner2022equivariant",
+      "DESCRIPTION": "A downsampled dataset with ~7500 frames. The original dataset contains approximately 57,000 configurations from the evaluation datasets for NequIP graph neural network model for interatomic potentials. Trajectories have been taken from Li-P-S, Li-P-O glass melt-quench simulation. Calculations were performed using VASP with PBE/PAW and an energy cutoff of 400 eV. [https://www.nature.com/articles/s41467-022-29939-5]",
       "domain": "Inorganic Materials",
       "energy_rmse": {
         "DISPLAY_NAME": "E RMSE (meV)",
@@ -443,9 +590,9 @@
         "DESCRIPTION": "The mean absolute error of the virial prediction per atom."
       }
     },
-    "Si_ZEO22": {
-      "DISPLAY_NAME": "Sours2023Applications",
-      "DESCRIPTION": "Dataset consisting of 350000 DFT single point energy calculations from 219 different pure silica zeolite topologies. Calculations were performed using VASP with PBE-D3(BJ)/PAW and 400 eV cutoff. [https://github.com/tysours/Si-ZEO22]",
+    "Ca_batteries_CM2021": {
+      "DISPLAY_NAME": "Torres2019Analysis",
+      "DESCRIPTION": "A dataset of Ca-bearing minerals, focusing on silicates and carbonates such as olivine, pyroxene, garnet, amphibole, and double carbonates. Calculations were performed using VASP with spin-polarized PBE/PAW and an energy cutoff of 600 eV. [https://www.nature.com/articles/s41598-019-46002-4]",
       "domain": "Inorganic Materials",
       "energy_rmse": {
         "DISPLAY_NAME": "E RMSE (meV)",
@@ -492,10 +639,10 @@
         "DESCRIPTION": "The mean absolute error of the virial prediction per atom."
       }
     },
-    "AIMD-Chig": {
-      "DISPLAY_NAME": "AIMD-Chig",
-      "DESCRIPTION": "A downsampled dataset containing MD conformations of 166-atom protein Chignolin. The original Ab initio simulations were driven by M062X/6-31G* with a Berendsen thermostat at 340 K. [https://www.nature.com/articles/s41597-023-02465-9]. The dataset was relabeled using Gaussian with PBE/6-31G(d).",
-      "domain": "Molecules",
+    "HPt_NC_2022": {
+      "DISPLAY_NAME": "Vandermause2022Active",
+      "DESCRIPTION": "A direct two-phase simulation of heterogeneous hydrogen turnover on the Pt(111) catalyst surface at chemical accuracy. Calculations were performed using VASP with PBE/PAW and an energy cutoff of 450 eV. [https://www.nature.com/articles/s41467-022-32294-0]",
+      "domain": "Catalysis",
       "energy_rmse": {
         "DISPLAY_NAME": "E RMSE (meV)",
         "DESCRIPTION": "The root mean squared error of the energy prediction.",
@@ -541,9 +688,9 @@
         "DESCRIPTION": "The mean absolute error of the virial prediction per atom."
       }
     },
-    "HEA25_S": {
-      "DISPLAY_NAME": "Mazitov2024Surface",
-      "DESCRIPTION": "A dataset of high entropy alloy surfaces, focusing on 25 d-block transition metals, excluding Tc, Cd, Re, Os and Hg. The original dataset were calculated using VASP with PBEsol/PAW, 550 eV cutoff, and Γ-centered k-points. [https://arxiv.org/abs/2212.13254]. The dataset was relabeled with VASP at the PBE level.",
+    "Si_ZEO22": {
+      "DISPLAY_NAME": "Sours2023Applications",
+      "DESCRIPTION": "Dataset consisting of 350000 DFT single point energy calculations from 219 different pure silica zeolite topologies. Calculations were performed using VASP with PBE-D3(BJ)/PAW and 400 eV cutoff. [https://github.com/tysours/Si-ZEO22]",
       "domain": "Inorganic Materials",
       "energy_rmse": {
         "DISPLAY_NAME": "E RMSE (meV)",
@@ -742,6 +889,50 @@
         "DISPLAY_NAME": "Success Rate",
         "DESCRIPTION": "The success rate of elastic property calculations."
       }
+    },
+    "vacancy": {
+      "DISPLAY_NAME": "Oxygen Vacancy",
+      "DESCRIPTION": "Evaluation of the oxygen vacancy formation energy over 1813 structures. Structures are obtained from `Chem. Mater. 2023, 35, 24, 10619–10634. https://pubs.acs.org/doi/10.1021/acs.chemmater.3c02251`.",
+      "MAE": {
+        "DISPLAY_NAME": "MAE (eV)",
+        "DESCRIPTION": "The mean absolute error of the oxygen vacancy formation energy across all configurations."
+      },
+      "RMSE": {
+        "DISPLAY_NAME": "RMSE (eV)",
+        "DESCRIPTION": "The root mean squared error of the oxygen vacancy formation energy across all configurations."
+      }
+    },
+    "binding_energy": {
+      "DISPLAY_NAME": "Protein-Ligand Binding",
+      "DESCRIPTION": "Evaluation of the protein-ligand binding energy using the PLF547 dataset proposed in `J. Chem. Inf. Model. 2020, 60, 3, 1453–1460. https://pubs.acs.org/doi/10.1021/acs.jcim.9b01171`.",
+      "MAE": {
+        "DISPLAY_NAME": "MAE (kcal/mol)",
+        "DESCRIPTION": "The mean absolute error of the protein-ligand binding energy across all configurations."
+      },
+      "RMSE": {
+        "DISPLAY_NAME": "RMSE (kcal/mol)",
+        "DESCRIPTION": "The root mean squared error of the protein-ligand binding energy across all configurations."
+      },
+      "success_rate":{
+        "DISPLAY_NAME": "Success Rate",
+        "DESCRIPTION": "The success rate of protein-ligand binding energy calculations."
+      }
+    },
+    "rxn_barrier": {
+      "DISPLAY_NAME": "Reaction Barrier",
+      "DESCRIPTION": "Evaluation of the reaction barrier using the BH876 dataset provided in `https://arxiv.org/abs/2508.13468`.",
+      "MAE": {
+        "DISPLAY_NAME": "MAE (kcal/mol)",
+        "DESCRIPTION": "The mean absolute error of the reaction barrier across all configurations."
+      },
+      "RMSE": {
+        "DISPLAY_NAME": "RMSE (kcal/mol)",
+        "DESCRIPTION": "The root mean squared error of the reaction barrier across all configurations."
+      },
+      "success_rate":{
+        "DISPLAY_NAME": "Success Rate",
+        "DESCRIPTION": "The success rate of reaction barrier calculations."
+      }
     }
   },
   "adaptability_results": {
diff --git a/lambench/models/models_config.yml b/lambench/models/models_config.yml
index e433c51..ce5ffc2 100644
--- a/lambench/models/models_config.yml
+++ b/lambench/models/models_config.yml
@@ -89,7 +89,7 @@
   model_metadata:
     pretty_name: MACE-MH-1
     date_added: 2025-11-26
-    model_description: DP 2025 Q2, 16 layers with dynamic nnei.
+    model_description: A multitask-trained model; see https://huggingface.co/mace-foundations/mace-mh-1.
     num_parameters: 6439878
     packages:
       mace-torch: mace-mh-1
@@ -104,7 +104,7 @@
   model_metadata:
     pretty_name: DPA-3.2-5M
     date_added: 2025-11-20
-    model_description: DP 2025 Q2, 16 layers with dynamic nnei.
+    model_description: DP 2025 Q4, 24 layers with dynamic nnei and frame-level parameters to support explicit charge/spin inputs.
     num_parameters: 4816561
     packages:
       deepmd-kit: D0708
diff --git a/lambench/tasks/direct/direct_tasks.yml b/lambench/tasks/direct/direct_tasks.yml
index f4522f1..4405f70 100644
--- a/lambench/tasks/direct/direct_tasks.yml
+++ b/lambench/tasks/direct/direct_tasks.yml
@@ -35,5 +35,6 @@ HPt_NC_2022:
   test_data: "/bohr/lambench-ood-zwtr/v5/LAMBench-TestData-v4/HPt_NC2022"
 Carbon_growth:
   test_data: "/bohr/lambench-ood-zwtr/v5/LAMBench-TestData-v4/carbon_film_growth"
+  dispersion_correction: d3zero
 In2O3_CO2:
   test_data: "/bohr/lambench-ood-zwtr/v5/LAMBench-TestData-v4/In2O3_CO2"
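
The README hunk in this diff describes normalizing each error metric against a dummy baseline, capping it at 1 when the model is worse than the dummy, and log-averaging across the test sets of a domain. A minimal sketch of that aggregation for one domain and one prediction type follows. It assumes the normalization is a plain ratio of model error to dummy-model error (the exact formula is not shown in the visible hunk), the error values are hypothetical, and the function names are illustrative rather than part of LAMBench:

```python
import math
from typing import Dict, Iterable


def normalized_error(model_error: float, dummy_error: float) -> float:
    """Ratio of model error to dummy-baseline error, capped at 1.

    Assumption: the normalization is a simple ratio; the README only
    states that a model worse than the dummy model gets a metric of 1.
    """
    if dummy_error <= 0:
        raise ValueError("dummy_error must be positive")
    return min(model_error / dummy_error, 1.0)


def log_average(values: Iterable[float]) -> float:
    """Log-average (geometric mean): exp of the mean of the logs."""
    vals = list(values)
    if not vals or any(v <= 0 for v in vals):
        raise ValueError("log_average needs a non-empty list of positive values")
    return math.exp(sum(math.log(v) for v in vals) / len(vals))


def domain_score(model_errors: Dict[str, float],
                 dummy_errors: Dict[str, float]) -> float:
    """Log-average of the normalized errors over all test sets in a domain."""
    return log_average(
        normalized_error(model_errors[name], dummy_errors[name])
        for name in model_errors
    )


# Hypothetical force errors for the Molecules test sets listed in the README.
model_errors = {"Sandonas2024Dataset": 10.0, "Guan2022Benchmark": 40.0, "AIMD-Chig": 200.0}
dummy_errors = {"Sandonas2024Dataset": 100.0, "Guan2022Benchmark": 100.0, "AIMD-Chig": 100.0}

score = domain_score(model_errors, dummy_errors)
print(f"Molecules domain score: {score:.3f}")
```

With these made-up numbers the third test set is capped at 1 because the model is worse than the dummy there, so the score is the geometric mean of 0.1, 0.4, and 1.0.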