Batch renormalization aims to address a fundamental problem with batchnorm: the model is trained on activations normalized with per-mini-batch statistics, but inference uses (approximations of) population statistics, so training and inference see activations from different distributions. In particular, it helps when:
- mini-batch sizes are small; or
- samples are not IID (this isn't really a problem for our current models).
This is most relevant for CPU codes, where we often have quite a small local mini-batch size. (Batch renorm does not seem to recover the full generalization accuracy of training with a larger mini-batch, but it does get closer.)
Paper: https://arxiv.org/abs/1702.03275
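To make the mechanism concrete, here is a minimal PyTorch sketch of the training-time correction from the paper: the mini-batch statistics are corrected toward the running (population) statistics via clipped factors r and d, which are treated as constants in the backward pass. The class name, hyperparameter names, and default values (momentum, r_max, d_max) are illustrative choices for this sketch, not from any particular codebase.

```python
import torch
import torch.nn as nn

class BatchRenorm1d(nn.Module):
    """Minimal sketch of Batch Renormalization (Ioffe, 2017) for (N, C) inputs."""

    def __init__(self, num_features, eps=1e-5, momentum=0.01,
                 r_max=3.0, d_max=5.0):
        super().__init__()
        self.eps, self.momentum = eps, momentum
        self.r_max, self.d_max = r_max, d_max
        self.weight = nn.Parameter(torch.ones(num_features))   # gamma
        self.bias = nn.Parameter(torch.zeros(num_features))    # beta
        self.register_buffer("running_mean", torch.zeros(num_features))
        self.register_buffer("running_std", torch.ones(num_features))

    def forward(self, x):
        if self.training:
            batch_mean = x.mean(dim=0)
            batch_std = x.var(dim=0, unbiased=False).add(self.eps).sqrt()
            # r and d correct the mini-batch statistics toward the running
            # statistics; no gradient flows through them (stop-gradient).
            with torch.no_grad():
                r = (batch_std / self.running_std).clamp(1.0 / self.r_max,
                                                         self.r_max)
                d = ((batch_mean - self.running_mean)
                     / self.running_std).clamp(-self.d_max, self.d_max)
                # Update running statistics by moving average.
                self.running_mean += self.momentum * (batch_mean - self.running_mean)
                self.running_std += self.momentum * (batch_std - self.running_std)
            x_hat = (x - batch_mean) / batch_std * r + d
        else:
            # Inference is identical to standard batchnorm.
            x_hat = (x - self.running_mean) / self.running_std
        return self.weight * x_hat + self.bias
```

Note that the paper ramps r_max and d_max up from 1 and 0 over the course of training, so early training behaves exactly like standard batchnorm; the fixed clip values above are a simplification.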