BERT Pre-Training loss does not reduce when upgrading to TF 2.4 #45
…outputs for huggingface >= 3.0.0 (#47). Change `optimizer.loss_scale()` to `optimizer.loss_scale` for TF 2.4. Fixes #44.

I ran pretraining on ALBERT for a few thousand steps with XLA enabled on TF 2.4 and observed the loss drop from 11 to 7.5 in 3k steps with a global batch size of 32. I am not able to replicate #45. It is possible that fixing the named model outputs has solved this issue.
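For context, here is a minimal sketch of the `loss_scale` change described above; the optimizer construction is an assumed example rather than this repository's actual training code:

```python
import tensorflow as tf

# Wrap a base optimizer for mixed-precision training (TF 2.4 stable API).
opt = tf.keras.mixed_precision.LossScaleOptimizer(tf.keras.optimizers.Adam(1e-4))

# TF 2.3 (experimental API): loss_scale was a callable LossScale object, so
# the current scale was read as opt.loss_scale().
# TF 2.4: loss_scale is a property that returns the scalar tensor directly,
# so the trailing parentheses must be dropped:
current_loss_scale = opt.loss_scale
tf.print("loss scale:", current_loss_scale)
```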
I'm not able to replicate these results; I saw the loss drop to 7.5 within 2k steps. Can you try again with the latest version of the codebase? I've updated the Dockerfile to be TF 2.4 with the latest version of …
@rondogency Can you share the env file on which we saw this issue?

@jarednielsen Did you try to reproduce before the …

After the …
When training BERT with TF 2.3, the loss would decrease and `MLM_Acc` would be non-zero. After upgrading to TF 2.4 and using the same script, the loss does not decrease and `MLM_Acc` remains 0.0.

Note: The hyperparameters were unchanged between the runs with TF 2.3 and TF 2.4.
Here are the logs of a 2-node run: