
Improve your state of the art by using the best activation function and the best meta-optimizer #2

Open
LifeIsStrange opened this issue May 30, 2020 · 5 comments

Comments

@LifeIsStrange

LifeIsStrange commented May 30, 2020

You could increase GPT-3 accuracy by using Ranger, which combines state-of-the-art optimizers with gradient centralization:
https://github.com/lessw2020/Ranger-Deep-Learning-Optimizer
You seem to be using the Adam optimizer. It has been succeeded by RAdam (Rectified Adam). Ranger will bring you this improvement, plus several other synergistic ones, for free.
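For illustration, here is a minimal sketch of what the swap might look like in a PyTorch training loop, assuming Ranger is installed and importable as described in that repo's README (the model, learning rate, and loss below are placeholders, not anything from GPT-3's actual setup):

```python
import torch
import torch.nn as nn
from ranger import Ranger  # assumed import path per lessw2020/Ranger-Deep-Learning-Optimizer

model = nn.Linear(512, 512)  # placeholder model

# before: optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
# Ranger = RAdam + Lookahead (+ gradient centralization in recent versions)
optimizer = Ranger(model.parameters(), lr=1e-4)

x = torch.randn(8, 512)
loss = model(x).pow(2).mean()  # dummy loss just to exercise one optimizer step
loss.backward()
optimizer.step()
optimizer.zero_grad()
```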

Orthogonally, you would probably benefit from Mish too, instead of the activation you currently use (ReLU?), but it should be tested after Ranger, since it could regress accuracy (even if that is unlikely):
https://github.com/digantamisra98/Mish
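As a sketch of the drop-in: Mish is defined as x * tanh(softplus(x)), so a plain PyTorch version can replace an nn.ReLU module directly (newer PyTorch releases also ship a built-in torch.nn.Mish; the block shapes here are just an example):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Mish(nn.Module):
    """Mish activation: x * tanh(softplus(x))."""
    def forward(self, x):
        return x * torch.tanh(F.softplus(x))

# before: block = nn.Sequential(nn.Linear(512, 2048), nn.ReLU(), nn.Linear(2048, 512))
block = nn.Sequential(nn.Linear(512, 2048), Mish(), nn.Linear(2048, 512))
```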

@minimaxir

At the scale these models are trained at, using a specific optimizer/activation will not necessarily get you better results.

@digantamisra98

Additionally, considering GPT-3's size, I would suggest not using any optimizer heavier than SGD because of the computational cost. The same goes for Mish.

@LifeIsStrange
Author

@minimaxir It will not necessarily bring gains, but it is still low-hanging fruit that should be tried.

@LifeIsStrange
Author

LifeIsStrange commented Jun 1, 2020

@digantamisra98 RAdam (not the full Ranger package) does not increase computational cost.

I've read somewhere that Mish can be made about as efficient as ReLU, maybe with https://github.com/thomasbrandon/mish-cuda?
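If that package provides a fused CUDA kernel as its README suggests, the swap would look roughly like this. This is only a sketch: the MishCuda import path is assumed from that repo, and it requires building the CUDA extension on a GPU machine.

```python
import torch.nn as nn

# assumed import from thomasbrandon/mish-cuda (CUDA extension must be built)
from mish_cuda import MishCuda

# drop-in replacement for nn.ReLU() or a Python-level Mish, using the fused kernel
block = nn.Sequential(nn.Linear(512, 2048), MishCuda(), nn.Linear(2048, 512)).cuda()
```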

@digantamisra98

@LifeIsStrange everything above SGD is expensive.
