Batch renormalization aims to address a fundamental problem with batchnorm: the model is trained on activations normalized with per-mini-batch statistics, but inference uses (approximations of) population statistics, so training and inference see activations from different distributions. In particular, it helps when:
- mini-batch sizes are small; or
- samples are not IID (this isn't really a problem for our current models).
This is most relevant for CPU codes, where we often have quite a small local mini-batch size. (Batch renorm does not seem to recover the full generalization accuracy of training with a larger mini-batch, but it does get closer.)
Paper: https://arxiv.org/abs/1702.03275
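To make the mechanism concrete, here is a minimal PyTorch sketch of the training-time correction from the paper: the mini-batch statistics are corrected toward the running (population) statistics via clipped factors r and d, which are treated as constants in the backward pass. The class name, hyperparameter names, and default values (momentum, r_max, d_max) are illustrative choices for this sketch, not from any particular codebase.

```python
import torch
import torch.nn as nn

class BatchRenorm1d(nn.Module):
    """Minimal sketch of Batch Renormalization (Ioffe, 2017) for (N, C) inputs."""

    def __init__(self, num_features, eps=1e-5, momentum=0.01,
                 r_max=3.0, d_max=5.0):
        super().__init__()
        self.eps, self.momentum = eps, momentum
        self.r_max, self.d_max = r_max, d_max
        self.weight = nn.Parameter(torch.ones(num_features))   # gamma
        self.bias = nn.Parameter(torch.zeros(num_features))    # beta
        self.register_buffer("running_mean", torch.zeros(num_features))
        self.register_buffer("running_std", torch.ones(num_features))

    def forward(self, x):
        if self.training:
            batch_mean = x.mean(dim=0)
            batch_std = x.var(dim=0, unbiased=False).add(self.eps).sqrt()
            # r and d correct the mini-batch statistics toward the running
            # statistics; no gradient flows through them (stop-gradient).
            with torch.no_grad():
                r = (batch_std / self.running_std).clamp(1.0 / self.r_max,
                                                         self.r_max)
                d = ((batch_mean - self.running_mean)
                     / self.running_std).clamp(-self.d_max, self.d_max)
                # Update running statistics by moving average.
                self.running_mean += self.momentum * (batch_mean - self.running_mean)
                self.running_std += self.momentum * (batch_std - self.running_std)
            x_hat = (x - batch_mean) / batch_std * r + d
        else:
            # Inference is identical to standard batchnorm.
            x_hat = (x - self.running_mean) / self.running_std
        return self.weight * x_hat + self.bias
```

Note that the paper ramps r_max and d_max up from 1 and 0 over the course of training, so early training behaves exactly like standard batchnorm; the fixed clip values above are a simplification.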