This repository has been archived by the owner on Sep 7, 2023. It is now read-only.

When using a transformer, try learning-rate warm up and/or layer norm inside the residual blocks #43

Open

JackKelly opened this issue Jul 4, 2021 · 0 comments

Labels

machine learning model

Projects

Member

JackKelly commented Jul 4, 2021

https://twitter.com/sytelus/status/1411607820542218245

JackKelly created this issue from a note in ML research (To do)

JackKelly added the machine learning model label

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.