
replace BatchNorm with LayerNorm #1035

Open · 2 tasks
bkmi opened this issue Mar 19, 2024 · 2 comments
Assignees: bkmi
Labels: API changes (This impacts the public API of the project, e.g. inference class), enhancement (New feature or request), good first issue (Good for newcomers), hackathon

Comments

@bkmi (Contributor) commented Mar 19, 2024

Is your feature request related to a problem? Please describe.
BatchNorm violates the i.i.d. (independent and identically distributed) assumption (see here), which is fundamental to many of the objective functions in this library. This can cause issues when training models that rely on that assumption.

Describe the solution you'd like
To address this, we propose replacing BatchNorm with LayerNorm or GroupNorm wherever feasible. These alternatives do not interfere with the iid assumption and are better suited to the requirements of our library.
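A minimal sketch of the swap, assuming a plain PyTorch feed-forward block (the `make_block` helper below is hypothetical, not sbi's actual net builder). BatchNorm1d normalizes each feature using statistics of the whole batch, coupling the samples, whereas LayerNorm normalizes over the features of each sample independently, leaving the i.i.d. assumption intact:

```python
import torch
from torch import nn

def make_block(dim: int, use_layer_norm: bool = True) -> nn.Module:
    # LayerNorm acts per sample (over features); BatchNorm1d acts per
    # feature (over the batch), so its output depends on the batchmates.
    norm = nn.LayerNorm(dim) if use_layer_norm else nn.BatchNorm1d(dim)
    return nn.Sequential(nn.Linear(dim, dim), norm, nn.ReLU())

x = torch.randn(8, 32)  # batch of 8 samples, 32 features each
print(make_block(32)(x).shape)  # torch.Size([8, 32])
```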

This requires adapting all of the "factories" (e.g. here):

  • Update all net builders where this applies (here).
  • Consider the other nets in here, e.g. embedding nets, which could also benefit from this change.

Describe alternatives you've considered
One alternative would be to let users choose between BatchNorm and LayerNorm via a configurable flag. However, this seems like a niche use case, and given that BatchNorm conflicts with the i.i.d. assumption, there is no compelling reason to keep it aside from legacy compatibility.

Additional context
Many classifiers expose an option like use_batch_norm: bool = False to control BatchNorm usage. We suggest renaming this option to use_layer_norm to align with the new approach.
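A hedged sketch of the renamed flag in a net builder (the `build_classifier` name and signature below are illustrative, not sbi's actual API):

```python
from torch import nn

def build_classifier(
    input_dim: int,
    hidden_dim: int = 50,
    use_layer_norm: bool = False,  # formerly `use_batch_norm`
) -> nn.Module:
    layers = [nn.Linear(input_dim, hidden_dim)]
    if use_layer_norm:
        layers.append(nn.LayerNorm(hidden_dim))
    layers += [nn.ReLU(), nn.Linear(hidden_dim, 1)]
    return nn.Sequential(*layers)
```

Keeping the default at False would preserve current behavior while the old use_batch_norm flag is deprecated.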

@bkmi bkmi added enhancement New feature or request good first issue Good for newcomers API changes This impacts the public API of the project (e.g. inference class). hackathon labels Mar 19, 2024
@bkmi bkmi changed the title replace BatchNorm for LayerNorm replace BatchNorm with LayerNorm Mar 19, 2024
@bkmi bkmi self-assigned this Mar 21, 2024
@bkmi (Contributor, Author) commented Mar 21, 2024

@janfb wanted to see a side-by-side comparison of LayerNorm and BatchNorm before making the change.

@francois-rozet commented Mar 22, 2024

To comment on that issue: with BatchNorm one should be very careful to always feed batches that represent the "modes" in the same proportion as the original distribution. For example, for a binary classifier, it is invalid to feed the positive and negative examples in separate batches during training, as it then becomes enough to identify the class of a single element in the batch to decide the class of the entire batch.
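A minimal sketch of this leakage, assuming a plain PyTorch `BatchNorm1d` in training mode (the mode shifts of ±5 are arbitrary, for illustration): the normalized output of the *same* sample changes with its batchmates, so class-homogeneous batches encode the label in the batch statistics themselves:

```python
import torch
from torch import nn

torch.manual_seed(0)
bn = nn.BatchNorm1d(4).train()  # training mode: normalizes with batch statistics

x = torch.randn(1, 4)                # the sample under scrutiny
positives = torch.randn(7, 4) + 5.0  # a batch of only "positive-mode" samples
negatives = torch.randn(7, 4) - 5.0  # a batch of only "negative-mode" samples

out_with_pos = bn(torch.cat([x, positives]))[0]
out_with_neg = bn(torch.cat([x, negatives]))[0]
print(torch.allclose(out_with_pos, out_with_neg))  # False: same input, different output
```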
