
finetune results seem not very stable #125

Open

maris205 opened this issue Oct 29, 2024 · 3 comments

Comments

@maris205

maris205 commented Oct 29, 2024

For the first run, the evaluation at steps 200, 400, and 800:

{'eval_loss': 0.6967583894729614, 'eval_accuracy': 0.5025337837837838, 'eval_f1': 0.3344575604272063, 'eval_matthews_correlation': 0.0, 'eval_precision': 0.2512668918918919, 'eval_recall': 0.5, 'eval_runtime': 3.1287, 'eval_samples_per_second': 378.436, 'eval_steps_per_second': 23.652, 'epoch': 0.15}

{'eval_loss': 0.6942448019981384, 'eval_accuracy': 0.49746621621621623, 'eval_f1': 0.332205301748449, 'eval_matthews_correlation': 0.0, 'eval_precision': 0.24873310810810811, 'eval_recall': 0.5, 'eval_runtime': 3.0649, 'eval_samples_per_second': 386.309, 'eval_steps_per_second': 24.144, 'epoch': 0.3}

{'eval_loss': 0.6936447620391846, 'eval_accuracy': 0.5025337837837838, 'eval_f1': 0.3344575604272063, 'eval_matthews_correlation': 0.0, 'eval_precision': 0.2512668918918919, 'eval_recall': 0.5, 'eval_runtime': 3.1124, 'eval_samples_per_second': 380.411, 'eval_steps_per_second': 23.776, 'epoch': 0.9}

At step 3000:
{'eval_loss': 0.6934958696365356, 'eval_accuracy': 0.49746621621621623, 'eval_f1': 0.332205301748449, 'eval_matthews_correlation': 0.0, 'eval_precision': 0.24873310810810811, 'eval_recall': 0.5, 'eval_runtime': 3.0942, 'eval_samples_per_second': 382.652, 'eval_steps_per_second': 23.916, 'epoch': 2.4}


Then I ran it again; the evaluation at steps 200, 400, and 800:

{'eval_loss': 0.6764485239982605, 'eval_accuracy': 0.543918918918919, 'eval_f1': 0.500248562558037, 'eval_matthews_correlation': 0.1047497035448165, 'eval_precision': 0.5646432374866879, 'eval_recall': 0.5424348347148706, 'eval_runtime': 3.1702, 'eval_samples_per_second': 373.483, 'eval_steps_per_second': 23.343, 'epoch': 0.15}

{'eval_loss': 0.6603909730911255, 'eval_accuracy': 0.7170608108108109, 'eval_f1': 0.7006777463594056, 'eval_matthews_correlation': 0.4870947362226591, 'eval_precision': 0.7747432713117492, 'eval_recall': 0.7158936240030817, 'eval_runtime': 3.0877, 'eval_samples_per_second': 383.453, 'eval_steps_per_second': 23.966, 'epoch': 0.45}

{'eval_loss': 0.3846745193004608, 'eval_accuracy': 0.8386824324324325, 'eval_f1': 0.8381642045361026, 'eval_matthews_correlation': 0.6827625873383905, 'eval_precision': 0.8437887048419396, 'eval_recall': 0.8389907406086374, 'eval_runtime': 3.088, 'eval_samples_per_second': 383.423, 'eval_steps_per_second': 23.964, 'epoch': 0.6}

I used the default parameters. Do I need to set a different learning rate (LR)?
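(For reference, a minimal sketch of how one might pin the random seed and try a lower learning rate with the HuggingFace Trainer, which the log format above suggests is in use. `model`, `train_ds`, `eval_ds`, and `compute_metrics` are placeholders, and the specific values are illustrative, not the project's defaults.)

```python
# Hypothetical sketch: pinning the seed and lowering the learning rate
# when runs diverge like the two above. `model`, `train_ds`, `eval_ds`,
# and `compute_metrics` are placeholders for the project's own objects.
from transformers import Trainer, TrainingArguments, set_seed

set_seed(42)  # fixes Python, NumPy, and torch RNGs for reproducible runs

args = TrainingArguments(
    output_dir="out",
    learning_rate=1e-5,              # illustrative: a smaller LR than usual
    warmup_ratio=0.1,                # warmup often helps early-training stability
    num_train_epochs=3,
    per_device_train_batch_size=8,
    evaluation_strategy="steps",
    eval_steps=200,                  # evaluate every 200 steps, as in the logs
    seed=42,                         # also fixes the Trainer's internal seed
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    eval_dataset=eval_ds,
    compute_metrics=compute_metrics,
)
trainer.train()
```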

@2020guotao

(Quoting the evaluation logs from the original post above.)

Hello, has your issue been resolved? I'm not sure why the eval_loss keeps fluctuating around 0.69, and the eval_accuracy remains at 0.5.

{'loss': 0.6966, 'learning_rate': 2.9946319642130948e-05, 'epoch': 0.04}
0%|▎ | 100/24640 [00:25<1:38:50, 4.14it/s]
***** Running Evaluation *****
Num examples = 12988
Batch size = 8
{'eval_loss': 0.6943994760513306, 'eval_accuracy': 0.5, 'eval_f1': 0.4395347531083056, 'eval_matthews_correlation': 0.0, 'eval_precision': 0.5, 'eval_recall': 0.5, 'eval_runtime': 12.4283, 'eval_samples_per_second': 1045.035, 'eval_steps_per_second': 32.667, 'epoch': 0.04}
{'loss': 0.6952, 'learning_rate': 2.9824318828792195e-05, 'epoch': 0.08}
1%|▋ | 200/24640 [01:02<1:44:08, 3.91it/s]
***** Running Evaluation *****
Num examples = 12988
Batch size = 8
{'eval_loss': 0.6953960061073303, 'eval_accuracy': 0.5, 'eval_f1': 0.47613731222845423, 'eval_matthews_correlation': 0.0, 'eval_precision': 0.5, 'eval_recall': 0.5, 'eval_runtime': 13.3936, 'eval_samples_per_second': 969.717, 'eval_steps_per_second': 30.313, 'epoch': 0.08}
{'loss': 0.695, 'learning_rate': 2.9704758031720214e-05, 'epoch': 0.12}
1%|▉ | 300/24640 [01:40<1:35:35, 4.24it/s]
***** Running Evaluation *****
Num examples = 12988
Batch size = 8
{'eval_loss': 0.6941527724266052, 'eval_accuracy': 0.5, 'eval_f1': 0.48044077783907113, 'eval_matthews_correlation': 0.0, 'eval_precision': 0.5, 'eval_recall': 0.5, 'eval_runtime': 12.8027, 'eval_samples_per_second': 1014.474, 'eval_steps_per_second': 31.712, 'epoch': 0.12}
{'loss': 0.6973, 'learning_rate': 2.9582757218381458e-05, 'epoch': 0.16}
2%|█▎ | 400/24640 [02:17<1:37:01, 4.16it/s]
***** Running Evaluation *****
Num examples = 12988
Batch size = 8
{'eval_loss': 0.6983540654182434, 'eval_accuracy': 0.5, 'eval_f1': 0.47727597702653923, 'eval_matthews_correlation': 0.0, 'eval_precision': 0.5, 'eval_recall': 0.5, 'eval_runtime': 12.4288, 'eval_samples_per_second': 1044.996, 'eval_steps_per_second': 32.666, 'epoch': 0.16}
{'loss': 0.6967, 'learning_rate': 2.9460756405042703e-05, 'epoch': 0.2}
2%|█▋ | 500/24640 [02:55<1:42:51, 3.91it/s]
***** Running Evaluation *****
Num examples = 12988
Batch size = 8
{'eval_loss': 0.693518877029419, 'eval_accuracy': 0.5, 'eval_f1': 0.4317208432763909, 'eval_matthews_correlation': 0.0, 'eval_precision': 0.5, 'eval_recall': 0.5, 'eval_runtime': 12.349, 'eval_samples_per_second': 1051.749, 'eval_steps_per_second': 32.877, 'epoch': 0.2}
{'loss': 0.6949, 'learning_rate': 2.9338755591703947e-05, 'epoch': 0.24}
2%|█▉ | 600/24640 [03:32<1:44:46, 3.82it/s]
***** Running Evaluation *****

@Zhihan1996 (Collaborator)

Which datasets are you using? It is possible that the model fails to converge with some random seeds and hyperparameters. We observed the same phenomenon on the COVID dataset before.
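(A minimal sketch of the kind of seed sweep this suggests, assuming the HuggingFace Trainer is used; `build_trainer` is a hypothetical helper that constructs a fresh model and Trainer for each run.)

```python
# Hypothetical seed sweep: when convergence depends on the random seed,
# rerun fine-tuning with a few seeds and keep the best run by eval MCC.
from transformers import set_seed

best_mcc, best_seed = float("-inf"), None
for seed in (0, 1, 2, 42, 1234):           # a few arbitrary seeds
    set_seed(seed)                          # fix Python/NumPy/torch RNGs
    trainer = build_trainer(seed=seed)      # hypothetical: fresh model + Trainer
    trainer.train()
    metrics = trainer.evaluate()
    mcc = metrics["eval_matthews_correlation"]  # key matches the logs above
    if mcc > best_mcc:
        best_mcc, best_seed = mcc, seed

print(f"best seed: {best_seed} (MCC = {best_mcc:.3f})")
```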

@zhujunru854

My fine-tuning results with LoRA are similar to yours, and also very poor. I tried tweaking the learning rate and adding layers to LoRA, but the results didn't improve. I was using an extremely imbalanced CTCF binding-site dataset and the focal loss to compute the loss.
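(A minimal sketch of one way to plug a focal loss into the HuggingFace Trainer for an imbalanced binary task like this; `FocalLossTrainer` and the `gamma`/`alpha` values are illustrative, not the poster's exact setup.)

```python
# Hypothetical sketch (not the poster's exact code): a focal loss inside
# the HuggingFace Trainer, one common choice for heavy class imbalance.
import torch
import torch.nn.functional as F
from transformers import Trainer

class FocalLossTrainer(Trainer):
    def __init__(self, *args, gamma=2.0, alpha=0.25, **kwargs):
        super().__init__(*args, **kwargs)
        self.gamma = gamma  # focusing parameter: down-weights easy examples
        self.alpha = alpha  # weight given to the positive (minority) class

    def compute_loss(self, model, inputs, return_outputs=False):
        labels = inputs.pop("labels")
        outputs = model(**inputs)
        logits = outputs.logits                        # (batch, 2) for binary
        ce = F.cross_entropy(logits, labels, reduction="none")
        pt = torch.exp(-ce)                            # prob. of the true class
        labels_f = labels.float()
        alpha_t = self.alpha * labels_f + (1 - self.alpha) * (1 - labels_f)
        loss = (alpha_t * (1 - pt) ** self.gamma * ce).mean()
        return (loss, outputs) if return_outputs else loss
```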
