DanqingZ changed the title from "question on stage 2" to "question on stage 2 learning rate" on May 8, 2021.
Hi, thanks for the work! I have a question about the stage 2 implementation:
https://github.com/cliang1453/BOND/blob/master/run_self_training_ner.py#L204-L215
From the code, I can see that stage 1 and stage 2 share the same scheduler, which means the learning rate at the start of stage 2 is already very small. Is this designed deliberately? The alternative (sketched below) would be to first train a baseline teacher model, pass that model to stage 2, and let stage 2 have its own learning rate scheduler.
I am asking because I think the learning rate is very important for BERT model training. Thanks.
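For concreteness, here is a minimal sketch of that alternative, not the repo's actual code: each stage builds a fresh optimizer and scheduler, so stage 2's learning rate restarts from the base value instead of continuing from stage 1's decayed value. The helper name, learning rates, and step counts below are illustrative assumptions, and the tiny `nn.Linear` stands in for the BERT tagger.

```python
import torch
from torch import nn
from torch.optim import AdamW
from transformers import get_linear_schedule_with_warmup


def make_optimizer_and_scheduler(model, lr, num_training_steps, warmup_steps=0):
    """Build one optimizer + linear-decay scheduler per stage (hypothetical helper)."""
    optimizer = AdamW(model.parameters(), lr=lr)
    scheduler = get_linear_schedule_with_warmup(
        optimizer,
        num_warmup_steps=warmup_steps,
        num_training_steps=num_training_steps,
    )
    return optimizer, scheduler


# Stand-in for the BERT model; the real code would load a pretrained tagger.
model = nn.Linear(4, 2)

# Stage 1: train the baseline teacher with its own schedule.
opt1, sched1 = make_optimizer_and_scheduler(model, lr=5e-5, num_training_steps=100)
for _ in range(100):
    opt1.step()      # loss.backward() omitted in this sketch
    sched1.step()    # lr decays linearly toward 0 over stage 1

# Stage 2: a fresh optimizer/scheduler, so the learning rate restarts
# instead of continuing from stage 1's nearly-decayed value.
opt2, sched2 = make_optimizer_and_scheduler(model, lr=5e-5, num_training_steps=100)
print(sched1.get_last_lr())  # ~0 after stage 1 finished decaying
print(sched2.get_last_lr())  # back at the base lr (5e-5, with warmup_steps=0)
```

With the shared-scheduler setup in the linked code, stage 2 would effectively continue from `sched1`'s final (near-zero) learning rate, which is what prompts the question.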