Hello, I am trying to fine-tune the GRACE-2L-OAM foundation model using this input.yaml file:
```yaml
seed: 1
cutoff: 6.0

data:
  filename: train_data.extxyz
  test_filename: test_data.extxyz
  reference_energy: {Cl: -0.29658376, S: -1.884791, P: -0.88440305, Li: -0.25125556}
  # reference_energy: {Al: -1.23, Li: -3.56}
  save_dataset: False
  stress_units: eV/A3  # eV/A3 (default) or GPa or kbar or -kbar

potential:
  # if elements are not provided, they are determined automatically from the data
  preset: GRACE_1LAYER  # LINEAR, FS, GRACE_1LAYER, GRACE_2LAYER
  finetune_foundation_model: GRACE-2L-OAM
  reduce_elements: False

  # for a custom model from model.py::custom_model
  # custom: model.custom_model

  shift: True  # True/False
  scale: True  # False/True or float

fit:
  loss: {
    energy: { weight: 100, type: huber, delta: 0.01 },
    forces: { weight: 10, type: huber, delta: 0.01 },
    stress: { weight: 0.1, type: huber, delta: 0.01 },
  }

  maxiter: 600  # number of epochs / iterations

  optimizer: Adam
  opt_params: {
    learning_rate: 0.0001,
    amsgrad: True,
    use_ema: True,
    ema_momentum: 0.99,
    weight_decay: null,
    clipvalue: 1.0,
  }

  # for learning-rate reduction
  learning_rate_reduction: { patience: 10, factor: 0.99, min: 5.0e-4, stop_at_min: True, resume_lr: True }

  # optimizer: L-BFGS-B
  # opt_params: { "maxcor": 100, "maxls": 20 }

  # needed for low-energy-tier metrics and for the "convex_hull"-based distance of the energy-based weighting scheme
  compute_convex_hull: False

  batch_size: 32  # important hyperparameter for Adam; irrelevant (but still required) for L-BFGS-B
  test_batch_size: 8  # test batch size (optional)

  jit_compile: True
  eval_init_stats: True  # evaluate initial metrics

  train_max_n_buckets: 10  # max number of buckets (groups of batches of the same shape) in the train set
  test_max_n_buckets: 5  # same for the test set

  checkpoint_freq: 1  # frequency of REGULAR checkpoints
  save_all_regular_checkpoints: True  # store ALL regular checkpoints
  progressbar: True  # show batch-evaluation progress bar
  train_shuffle: True  # shuffle train batches on every epoch
```
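For reference, this is how I sanity-check my reference_energy values: a quick least-squares fit of per-element reference energies from the training set (a minimal sketch using plain ASE + NumPy, nothing gracemaker-specific; it assumes the extxyz frames carry stored energies):

```python
# Sketch: fit per-element reference energies E_ref[s] so that
# E_total ≈ sum_s n_s * E_ref[s], then compare against input.yaml.
from collections import Counter

import numpy as np
from ase.io import read

frames = read("train_data.extxyz", index=":")
species = sorted({s for at in frames for s in at.get_chemical_symbols()})

# Design matrix A: element counts per structure; y: total DFT energies.
A = np.zeros((len(frames), len(species)))
y = np.zeros(len(frames))
for i, at in enumerate(frames):
    counts = Counter(at.get_chemical_symbols())
    A[i] = [counts[s] for s in species]
    y[i] = at.get_potential_energy()  # stored energy from the extxyz file

e_ref, *_ = np.linalg.lstsq(A, y, rcond=None)
for s, e in zip(species, e_ref):
    print(f"{s}: {e:.6f} eV")
```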
but during fine-tuning I keep seeing an energy shift.
My fine-tuning data was generated to be compatible with MPRelaxSet (PBE). Despite setting shift and scale, I don't see any improvement. Could you tell me what is going on?
```
2025/11/27 14:57:52 I - Iteration #42/600 TRAIN(TEST): total_loss: 2.110e-02 (2.728e-02) mae/depa: 1.308e-02 (3.028e-02) rmse/depa: 2.052e-02 (5.360e-02) mae/f_comp: 1.219e-01 (8.690e-02) rmse/f_comp: 1.876e-01 (1.691e-01) mae/stress(GPa): 1.123e+00 (2.502e-01) rmse/stress(GPa): 1.518e+00 (3.833e-01) Time(mcs/at): 71 (21)
2025/11/27 14:57:52 I - Minimum value of learning rate 0.0005 is achieved 11 times - stopping
2025/11/27 14:57:53 I - Regular checkpointing
2025/11/27 14:57:53 I - Sharding callback duration: 42 microseconds
2025/11/27 14:57:54 I - Loading best test loss model
2025/11/27 14:57:56 I - Loaded checkpoint from seed/1/checkpoints/checkpoint.best_test_loss
2025/11/27 14:57:56 I - Saving model to final_model
INFO:tensorflow:Assets written to: seed/1/final_model/assets
2025/11/27 14:58:04 I - Assets written to: seed/1/final_model/assets
```
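To quantify the shift, I compare the fine-tuned model against the DFT labels on the test set. A rough sketch, assuming the TPCalculator ASE interface shipped with the tensorpotential package (the import path and the model= argument are my assumption and may differ between versions):

```python
# Sketch: mean per-atom energy offset of the fine-tuned model vs. DFT labels.
import numpy as np
from ase.io import read
from tensorpotential.calculator import TPCalculator  # assumed ASE calculator from the GRACE package

calc = TPCalculator(model="seed/1/final_model")
frames = read("test_data.extxyz", index=":")

offsets = []
for at in frames:
    e_dft = at.get_potential_energy()  # DFT label stored in the extxyz file
    at.calc = calc
    e_model = at.get_potential_energy()  # model prediction
    offsets.append((e_model - e_dft) / len(at))

offsets = np.array(offsets)
print(f"mean per-atom offset: {offsets.mean():.4f} eV (std: {offsets.std():.4f} eV)")
```

A roughly constant nonzero mean offset here is exactly the kind of shift I am seeing.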