DanqingZ changed the title from "question on stage 2" to "question on stage 2 learning rate" on May 8, 2021.
Hi, thanks for the work! I have a question about the stage 2 implementation:
https://github.com/cliang1453/BOND/blob/master/run_self_training_ner.py#L204-L215
From the code, I can see that stage 1 and stage 2 share the same scheduler, which means the learning rate at the start of stage 2 is already very small. Is this designed deliberately? The alternative (sketched below) would be to first train a baseline teacher model, pass that model to stage 2, and let stage 2 have its own learning rate scheduler.
I am asking because I think the learning rate is very important for BERT model training. Thanks.
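For concreteness, here is a minimal sketch of that alternative, not the repo's actual code: each stage builds a fresh optimizer and scheduler, so stage 2's learning rate restarts from the base value instead of continuing from stage 1's decayed value. The helper name, learning rates, and step counts below are illustrative assumptions, and the tiny `nn.Linear` stands in for the BERT tagger.

```python
import torch
from torch import nn
from torch.optim import AdamW
from transformers import get_linear_schedule_with_warmup


def make_optimizer_and_scheduler(model, lr, num_training_steps, warmup_steps=0):
    """Build one optimizer + linear-decay scheduler per stage (hypothetical helper)."""
    optimizer = AdamW(model.parameters(), lr=lr)
    scheduler = get_linear_schedule_with_warmup(
        optimizer,
        num_warmup_steps=warmup_steps,
        num_training_steps=num_training_steps,
    )
    return optimizer, scheduler


# Stand-in for the BERT model; the real code would load a pretrained tagger.
model = nn.Linear(4, 2)

# Stage 1: train the baseline teacher with its own schedule.
opt1, sched1 = make_optimizer_and_scheduler(model, lr=5e-5, num_training_steps=100)
for _ in range(100):
    opt1.step()      # loss.backward() omitted in this sketch
    sched1.step()    # lr decays linearly toward 0 over stage 1

# Stage 2: a fresh optimizer/scheduler, so the learning rate restarts
# instead of continuing from stage 1's nearly-decayed value.
opt2, sched2 = make_optimizer_and_scheduler(model, lr=5e-5, num_training_steps=100)
print(sched1.get_last_lr())  # ~0 after stage 1 finished decaying
print(sched2.get_last_lr())  # back at the base lr (5e-5, with warmup_steps=0)
```

With the shared-scheduler setup in the linked code, stage 2 would effectively continue from `sched1`'s final (near-zero) learning rate, which is what prompts the question.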