
CUDA 11.0 instead of 11.1 for TF 2.4 compatibility; also named model outputs for huggingface >= 3.0.0 #47

Merged · 8 commits · Jan 14, 2021

Conversation

jarednielsen (Contributor)

Change `optimizer.loss_scale()` to `optimizer.loss_scale` for TF 2.4. Fixes #44.
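For reference, a minimal sketch of the API change (the optimizer construction below is illustrative, not this repo's actual training code):

```python
import tensorflow as tf

# Wrap a Keras optimizer with dynamic loss scaling (TF 2.4 API).
opt = tf.keras.mixed_precision.LossScaleOptimizer(tf.keras.optimizers.Adam())

# TF <= 2.3 exposed a callable LossScale object, so code read:
#   current_scale = opt.loss_scale()
# In TF 2.4, `loss_scale` is a plain property that returns the current
# scale as a tensor, so the trailing parentheses must be dropped.
current_scale = opt.loss_scale
tf.print("current loss scale:", current_scale)
```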

I ran ALBERT pretraining with XLA enabled on TF 2.4 and observed the loss drop from 11 to 7.5 over 3k steps at a global batch size of 32. I am not able to replicate #45; it is possible that fixing the named model outputs resolved that issue.
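To illustrate the named-outputs change, a hypothetical sketch (the model, tokenizer, and field names are examples for a transformers release that supports `return_dict`, not this PR's actual diff):

```python
from transformers import BertTokenizer, TFBertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = TFBertModel.from_pretrained("bert-base-uncased")
batch = tokenizer("BERT pretraining example", return_tensors="tf")

# Older code indexed the output tuple positionally, e.g. outputs[0].
# Fetching the same tensor by field name stays correct even if the
# tuple's ordering or contents change across huggingface versions.
outputs = model(batch, return_dict=True)
hidden_states = outputs.last_hidden_state
```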

jarednielsen merged commit 6cb5700 into aws-samples:master on Jan 14, 2021
Development

Successfully merging this pull request may close these issues:

BERT Pre-Training crashes with TF 2.4