
finetuning problem #20

@Asif-Iqbal-Bhatti


Hello, I am trying to fine-tune GRACE-2L-OAM using this input YAML file:

'''
seed: 1
cutoff: 6.0

data:
  filename: train_data.extxyz
  test_filename: test_data.extxyz
  reference_energy: {Cl: -0.29658376, S: -1.884791, P: -0.88440305, Li: -0.25125556}
  # reference_energy: {Al: -1.23, Li: -3.56}
  save_dataset: False
  stress_units: eV/A3  # eV/A3 (default) or GPa or kbar or -kbar

potential:
  # if elements are not provided, they are determined automatically from the data
  preset: GRACE_1LAYER  # LINEAR, FS, GRACE_1LAYER, GRACE_2LAYER
  finetune_foundation_model: GRACE-2L-OAM
  reduce_elements: False

  # for a custom model from model.py::custom_model
  # custom: model.custom_model

  shift: True   # True/False
  scale: True   # False/True or float

fit:
  loss: {
    energy: { weight: 100, type: huber, delta: 0.01 },
    forces: { weight: 10, type: huber, delta: 0.01 },
    stress: { weight: 0.1, type: huber, delta: 0.01 },
  }

  maxiter: 600  # number of epochs / iterations
  optimizer: Adam
  opt_params: {
    learning_rate: 0.0001,
    amsgrad: True,
    use_ema: True,
    ema_momentum: 0.99,
    weight_decay: null,
    clipvalue: 1.0,
  }

  # for learning-rate reduction
  learning_rate_reduction: { patience: 10, factor: 0.99, min: 5.0e-4, stop_at_min: True, resume_lr: True, }

  # optimizer: L-BFGS-B
  # opt_params: { "maxcor": 100, "maxls": 20 }

  # needed for low-energy-tier metrics and for the "convex_hull"-based distance of the energy-based weighting scheme
  compute_convex_hull: False
  batch_size: 32       # important hyperparameter for Adam; irrelevant (but still required) for L-BFGS-B
  test_batch_size: 8   # test batch size (optional)

  jit_compile: True
  eval_init_stats: True  # evaluate initial metrics

  train_max_n_buckets: 10  # max number of buckets (groups of batches of the same shape) in the train set
  test_max_n_buckets: 5    # same for the test set

  checkpoint_freq: 1  # frequency of REGULAR checkpoints
  save_all_regular_checkpoints: True  # store ALL regular checkpoints

  progressbar: True    # show the batch-evaluation progress bar
  train_shuffle: True  # shuffle train batches every epoch

'''
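
For context on the reference_energy values in the config: a minimal sketch, assuming the DFT total energies are stored in the extxyz frames and readable via ASE, of how per-element reference energies can be estimated by a least-squares fit of composition against total energy. The file name and element set are taken from the config above; the least-squares approach itself is a common convention, not necessarily how these particular numbers were obtained.

'''
# Hedged sketch: estimate per-element reference energies from train_data.extxyz
# by least-squares. Assumes each frame carries its DFT total energy so that
# atoms.get_potential_energy() returns it after ase.io.read.
import numpy as np
from ase.io import read

frames = read("train_data.extxyz", index=":")
elements = sorted({s for atoms in frames for s in atoms.get_chemical_symbols()})

# Composition matrix: number of atoms of each element in every structure
A = np.array(
    [[atoms.get_chemical_symbols().count(el) for el in elements] for atoms in frames],
    dtype=float,
)
E = np.array([atoms.get_potential_energy() for atoms in frames])

ref, *_ = np.linalg.lstsq(A, E, rcond=None)
print(dict(zip(elements, ref.round(6))))
'''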

During fine-tuning, however, I keep seeing an energy shift:

[attached screenshot]

My fine-tuning data are compatible with MPRelaxSet (PBE). Despite setting shift and scale, I don't see any improvement. Could you tell me what is going on?
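
One way to quantify such a shift is to measure the mean per-atom energy offset between model predictions and the DFT reference energies. A minimal sketch, assuming the grace_fm helper from the tensorpotential package is available (as shown in the GRACE documentation) and that test_data.extxyz carries the DFT energies; the calculator for the fine-tuned model could be substituted for the foundation model here:

'''
# Hedged sketch: constant per-atom offset between model predictions and the
# DFT reference energies stored in test_data.extxyz.
# Assumptions: ASE reads the reference energies from the extxyz frames, and
# grace_fm (tensorpotential, import path as in the GRACE docs) provides an
# ASE calculator for GRACE-2L-OAM.
import numpy as np
from ase.io import read
from tensorpotential.calculator import grace_fm  # assumed import path

frames = read("test_data.extxyz", index=":")
calc = grace_fm("GRACE-2L-OAM")

diffs = []
for atoms in frames:
    e_ref = atoms.get_potential_energy() / len(atoms)    # stored DFT energy, eV/atom
    atoms.calc = calc
    e_model = atoms.get_potential_energy() / len(atoms)  # model prediction, eV/atom
    diffs.append(e_model - e_ref)

diffs = np.array(diffs)
print(f"mean offset: {diffs.mean():.4f} eV/atom, scatter: {diffs.std():.4f} eV/atom")
'''

A large mean offset with small scatter would point to a constant shift (e.g. different reference-energy conventions) rather than a genuine accuracy problem.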

'''
2025/11/27 14:57:52 I - Iteration #42/600 TRAIN(TEST): total_loss: 2.110e-02 (2.728e-02) mae/depa: 1.308e-02 (3.028e-02) rmse/depa: 2.052e-02 (5.360e-02) mae/f_comp: 1.219e-01 (8.690e-02) rmse/f_comp: 1.876e-01 (1.691e-01) mae/stress(GPa): 1.123e+00 (2.502e-01) rmse/stress(GPa): 1.518e+00 (3.833e-01) Time(mcs/at): 71 (21)
2025/11/27 14:57:52 I - Minimum value of learning rate 0.0005 is achieved 11 times - stopping
2025/11/27 14:57:53 I - Regular checkpointing
2025/11/27 14:57:53 I - Sharding callback duration: 42 microseconds
2025/11/27 14:57:54 I - Loading best test loss model
2025/11/27 14:57:56 I - Loaded checkpoint from seed/1/checkpoints/checkpoint.best_test_loss
2025/11/27 14:57:56 I - Saving model to final_model
INFO:tensorflow:Assets written to: seed/1/final_model/assets
2025/11/27 14:58:04 I - Assets written to: seed/1/final_model/assets
'''
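
For reference, the "Minimum value of learning rate 0.0005 is achieved ... - stopping" message matches the learning_rate_reduction settings in the config (min: 5.0e-4, stop_at_min: True). A small sketch of the schedule arithmetic, under the assumption (not verified against gracemaker internals) that each reduction event multiplies the rate by factor and that training stops once the rate is at or below min:

'''
# Hedged sketch of the learning-rate reduction arithmetic from the config above.
# Assumption (not verified against gracemaker internals): each reduction event
# multiplies the rate by `factor`; stop_at_min ends training once the rate is
# at or below `min`.
lr = 0.0001       # opt_params.learning_rate
factor = 0.99     # learning_rate_reduction.factor
lr_min = 5.0e-4   # learning_rate_reduction.min

reductions = 0
while lr > lr_min:
    lr *= factor
    reductions += 1

# With the posted values the initial rate (1.0e-4) is already below min (5.0e-4),
# so the loop body never runs and the minimum counts as reached from the start.
print(f"reductions needed to reach min: {reductions}")
'''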
