
finetune results seem not very stable #125

Open

maris205 opened this issue Oct 29, 2024 · 3 comments

Comments

@maris205

maris205 commented Oct 29, 2024

For the first run, the evaluation at steps 200, 400, and 800:

{'eval_loss': 0.6967583894729614, 'eval_accuracy': 0.5025337837837838, 'eval_f1': 0.3344575604272063, 'eval_matthews_correlation': 0.0, 'eval_precision': 0.2512668918918919, 'eval_recall': 0.5, 'eval_runtime': 3.1287, 'eval_samples_per_second': 378.436, 'eval_steps_per_second': 23.652, 'epoch': 0.15}

{'eval_loss': 0.6942448019981384, 'eval_accuracy': 0.49746621621621623, 'eval_f1': 0.332205301748449, 'eval_matthews_correlation': 0.0, 'eval_precision': 0.24873310810810811, 'eval_recall': 0.5, 'eval_runtime': 3.0649, 'eval_samples_per_second': 386.309, 'eval_steps_per_second': 24.144, 'epoch': 0.3}

{'eval_loss': 0.6936447620391846, 'eval_accuracy': 0.5025337837837838, 'eval_f1': 0.3344575604272063, 'eval_matthews_correlation': 0.0, 'eval_precision': 0.2512668918918919, 'eval_recall': 0.5, 'eval_runtime': 3.1124, 'eval_samples_per_second': 380.411, 'eval_steps_per_second': 23.776, 'epoch': 0.9}

At step 3000:
{'eval_loss': 0.6934958696365356, 'eval_accuracy': 0.49746621621621623, 'eval_f1': 0.332205301748449, 'eval_matthews_correlation': 0.0, 'eval_precision': 0.24873310810810811, 'eval_recall': 0.5, 'eval_runtime': 3.0942, 'eval_samples_per_second': 382.652, 'eval_steps_per_second': 23.916, 'epoch': 2.4}


Then I ran it again; the evaluation at steps 200, 400, and 800:

{'eval_loss': 0.6764485239982605, 'eval_accuracy': 0.543918918918919, 'eval_f1': 0.500248562558037, 'eval_matthews_correlation': 0.1047497035448165, 'eval_precision': 0.5646432374866879, 'eval_recall': 0.5424348347148706, 'eval_runtime': 3.1702, 'eval_samples_per_second': 373.483, 'eval_steps_per_second': 23.343, 'epoch': 0.15}

{'eval_loss': 0.6603909730911255, 'eval_accuracy': 0.7170608108108109, 'eval_f1': 0.7006777463594056, 'eval_matthews_correlation': 0.4870947362226591, 'eval_precision': 0.7747432713117492, 'eval_recall': 0.7158936240030817, 'eval_runtime': 3.0877, 'eval_samples_per_second': 383.453, 'eval_steps_per_second': 23.966, 'epoch': 0.45}

{'eval_loss': 0.3846745193004608, 'eval_accuracy': 0.8386824324324325, 'eval_f1': 0.8381642045361026, 'eval_matthews_correlation': 0.6827625873383905, 'eval_precision': 0.8437887048419396, 'eval_recall': 0.8389907406086374, 'eval_runtime': 3.088, 'eval_samples_per_second': 383.423, 'eval_steps_per_second': 23.964, 'epoch': 0.6}

I used the default parameters. Do I need to set a different learning rate (LR)?
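(For reference, a minimal sketch of how one might pin the random seed and try a lower learning rate with the HuggingFace Trainer, which the log format above suggests is in use. `model`, `train_ds`, `eval_ds`, and `compute_metrics` are placeholders, and the specific values are illustrative, not the project's defaults.)

```python
# Hypothetical sketch: pinning the seed and lowering the learning rate
# when runs diverge like the two above. `model`, `train_ds`, `eval_ds`,
# and `compute_metrics` are placeholders for the project's own objects.
from transformers import Trainer, TrainingArguments, set_seed

set_seed(42)  # fixes Python, NumPy, and torch RNGs for reproducible runs

args = TrainingArguments(
    output_dir="out",
    learning_rate=1e-5,              # illustrative: a smaller LR than usual
    warmup_ratio=0.1,                # warmup often helps early-training stability
    num_train_epochs=3,
    per_device_train_batch_size=8,
    evaluation_strategy="steps",
    eval_steps=200,                  # evaluate every 200 steps, as in the logs
    seed=42,                         # also fixes the Trainer's internal seed
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    eval_dataset=eval_ds,
    compute_metrics=compute_metrics,
)
trainer.train()
```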

@2020guotao

(Quoting the evaluation logs from the original post above.)

Hello, has your issue been resolved? I'm not sure why the eval_loss keeps fluctuating around 0.69, and the eval_accuracy remains at 0.5.

{'loss': 0.6966, 'learning_rate': 2.9946319642130948e-05, 'epoch': 0.04}
0%|▎ | 100/24640 [00:25<1:38:50, 4.14it/s]
***** Running Evaluation *****
Num examples = 12988
Batch size = 8
{'eval_loss': 0.6943994760513306, 'eval_accuracy': 0.5, 'eval_f1': 0.4395347531083056, 'eval_matthews_correlation': 0.0, 'eval_precision': 0.5, 'eval_recall': 0.5, 'eval_runtime': 12.4283, 'eval_samples_per_second': 1045.035, 'eval_steps_per_second': 32.667, 'epoch': 0.04}
{'loss': 0.6952, 'learning_rate': 2.9824318828792195e-05, 'epoch': 0.08}
1%|▋ | 200/24640 [01:02<1:44:08, 3.91it/s]
***** Running Evaluation *****
Num examples = 12988
Batch size = 8
{'eval_loss': 0.6953960061073303, 'eval_accuracy': 0.5, 'eval_f1': 0.47613731222845423, 'eval_matthews_correlation': 0.0, 'eval_precision': 0.5, 'eval_recall': 0.5, 'eval_runtime': 13.3936, 'eval_samples_per_second': 969.717, 'eval_steps_per_second': 30.313, 'epoch': 0.08}
{'loss': 0.695, 'learning_rate': 2.9704758031720214e-05, 'epoch': 0.12}
1%|▉ | 300/24640 [01:40<1:35:35, 4.24it/s]
***** Running Evaluation *****
Num examples = 12988
Batch size = 8
{'eval_loss': 0.6941527724266052, 'eval_accuracy': 0.5, 'eval_f1': 0.48044077783907113, 'eval_matthews_correlation': 0.0, 'eval_precision': 0.5, 'eval_recall': 0.5, 'eval_runtime': 12.8027, 'eval_samples_per_second': 1014.474, 'eval_steps_per_second': 31.712, 'epoch': 0.12}
{'loss': 0.6973, 'learning_rate': 2.9582757218381458e-05, 'epoch': 0.16}
2%|█▎ | 400/24640 [02:17<1:37:01, 4.16it/s]
***** Running Evaluation *****
Num examples = 12988
Batch size = 8
{'eval_loss': 0.6983540654182434, 'eval_accuracy': 0.5, 'eval_f1': 0.47727597702653923, 'eval_matthews_correlation': 0.0, 'eval_precision': 0.5, 'eval_recall': 0.5, 'eval_runtime': 12.4288, 'eval_samples_per_second': 1044.996, 'eval_steps_per_second': 32.666, 'epoch': 0.16}
{'loss': 0.6967, 'learning_rate': 2.9460756405042703e-05, 'epoch': 0.2}
2%|█▋ | 500/24640 [02:55<1:42:51, 3.91it/s]
***** Running Evaluation *****
Num examples = 12988
Batch size = 8
{'eval_loss': 0.693518877029419, 'eval_accuracy': 0.5, 'eval_f1': 0.4317208432763909, 'eval_matthews_correlation': 0.0, 'eval_precision': 0.5, 'eval_recall': 0.5, 'eval_runtime': 12.349, 'eval_samples_per_second': 1051.749, 'eval_steps_per_second': 32.877, 'epoch': 0.2}
{'loss': 0.6949, 'learning_rate': 2.9338755591703947e-05, 'epoch': 0.24}
2%|█▉ | 600/24640 [03:32<1:44:46, 3.82it/s]
***** Running Evaluation *****

@Zhihan1996 (Collaborator)

Which datasets are you using? It is possible that the model fails to converge with some random seeds and hyperparameters. We observed the same phenomenon on the COVID dataset before.
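(A minimal sketch of the kind of seed sweep this suggests, assuming the HuggingFace Trainer is used; `build_trainer` is a hypothetical helper that constructs a fresh model and Trainer for each run.)

```python
# Hypothetical seed sweep: when convergence depends on the random seed,
# rerun fine-tuning with a few seeds and keep the best run by eval MCC.
from transformers import set_seed

best_mcc, best_seed = float("-inf"), None
for seed in (0, 1, 2, 42, 1234):           # a few arbitrary seeds
    set_seed(seed)                          # fix Python/NumPy/torch RNGs
    trainer = build_trainer(seed=seed)      # hypothetical: fresh model + Trainer
    trainer.train()
    metrics = trainer.evaluate()
    mcc = metrics["eval_matthews_correlation"]  # key matches the logs above
    if mcc > best_mcc:
        best_mcc, best_seed = mcc, seed

print(f"best seed: {best_seed} (MCC = {best_mcc:.3f})")
```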

@zhujunru854

My fine-tuning results with LoRA are similar to yours, and also very poor. I tried tweaking the learning rate and adding layers to LoRA, but the results didn't improve. I was using an extremely imbalanced CTCF binding-site dataset and the focal loss to compute the loss.
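(A minimal sketch of one way to plug a focal loss into the HuggingFace Trainer for an imbalanced binary task like this; `FocalLossTrainer` and the `gamma`/`alpha` values are illustrative, not the poster's exact setup.)

```python
# Hypothetical sketch (not the poster's exact code): a focal loss inside
# the HuggingFace Trainer, one common choice for heavy class imbalance.
import torch
import torch.nn.functional as F
from transformers import Trainer

class FocalLossTrainer(Trainer):
    def __init__(self, *args, gamma=2.0, alpha=0.25, **kwargs):
        super().__init__(*args, **kwargs)
        self.gamma = gamma  # focusing parameter: down-weights easy examples
        self.alpha = alpha  # weight given to the positive (minority) class

    def compute_loss(self, model, inputs, return_outputs=False):
        labels = inputs.pop("labels")
        outputs = model(**inputs)
        logits = outputs.logits                        # (batch, 2) for binary
        ce = F.cross_entropy(logits, labels, reduction="none")
        pt = torch.exp(-ce)                            # prob. of the true class
        labels_f = labels.float()
        alpha_t = self.alpha * labels_f + (1 - self.alpha) * (1 - labels_f)
        loss = (alpha_t * (1 - pt) ** self.gamma * ce).mean()
        return (loss, outputs) if return_outputs else loss
```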
