
Group normalization documentation is incorrect #2143

Open
@albertz

Describe the bug

This is purely about the documentation.

The group normalization documentation states:

Relation to Layer Normalization: If the number of groups is set to 1, then this operation becomes identical to Layer Normalization.

However, that is not true.

Assume an input tensor x of shape [B,T,F] (batch, time, feature dim), where the time axis could also be H/W and the feature dim could also be the channels.

In layer normalization, the mean you calculate is:

mean = reduce_mean(x, axis=-1, keepdims=True)  # shape [B,T,1]

You normalize just over the feature axis.
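Here is a minimal sketch of this mean for a toy [B,T,F] input. It assumes plain nested Python lists in place of a TensorFlow tensor, and the helper `layer_norm_mean` is hypothetical (not a TensorFlow API):

```python
def layer_norm_mean(x):
    """Hypothetical helper: mean over the feature axis only.

    Mirrors reduce_mean(x, axis=-1, keepdims=True) for a nested list x of
    shape [B][T][F]; the result has shape [B][T][1].
    """
    return [[[sum(frame) / len(frame)] for frame in seq] for seq in x]


# Toy input with B=1, T=2, F=2.
x = [[[1.0, 3.0],
      [5.0, 7.0]]]
print(layer_norm_mean(x))  # [[[2.0], [6.0]]]
```

Each time step gets its own mean.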

In group normalization with G=1 (so the group axis can be ignored), the mean you calculate is:

mean = reduce_mean(x, axis=[1,2], keepdims=True)  # shape [B,1,1]

You normalize over all axes except the batch axis and the newly added group axis (which makes no difference for G=1).
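The same kind of sketch for group normalization with G=1 follows. It again uses plain nested lists, and `group_norm_mean_g1` is a hypothetical helper. It produces a single mean per batch entry, shared by all time steps:

```python
def group_norm_mean_g1(x):
    """Hypothetical helper: mean over the time and feature axes.

    Mirrors reduce_mean(x, axis=[1, 2], keepdims=True) for a nested list x of
    shape [B][T][F]; the result has shape [B][1][1].
    """
    result = []
    for seq in x:
        # Flatten all time steps and features of one batch entry.
        values = [v for frame in seq for v in frame]
        result.append([[sum(values) / len(values)]])
    return result


# Toy input with B=1, T=2, F=2.
x = [[[1.0, 3.0],
      [5.0, 7.0]]]
print(group_norm_mean_g1(x))  # [[[4.0]]]
```

For this input, reduce_mean(x, axis=-1) would instead give one mean per time step (2.0 and 6.0). The two operations therefore agree only in special cases such as T=1.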

Or do I misunderstand something? I ask because the same incorrect statement appears in the original group normalization paper.

The figure from the paper (also here) is misleading as well:

In this figure, it looks like layer normalization normalizes over H/W as well. But this is not the case, at least not commonly, and not with the default options.
So this figure is wrong about layer normalization (it would normalize only over C, not over H/W).
But the figure is correct for group normalization as you have implemented it (it normalizes over all axes except N and G).
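The next sketch shows what group normalization reduces over for a general G. It uses the hypothetical helper `group_norm_mean` with plain Python lists, and it returns a simplified [B][G] result instead of the keepdims shape [B,1,G,1]. The feature axis is split into G groups, and each mean is taken over the time axis and the features within one group:

```python
def group_norm_mean(x, groups):
    """Hypothetical helper: per-group mean for a nested list x of shape [B][T][F].

    Mirrors reshaping x to [B, T, G, F // G] and reducing over the T and F // G
    axes, so the result has shape [B][G]: one mean per (batch entry, group).
    """
    num_features = len(x[0][0])
    if num_features % groups != 0:
        raise ValueError("number of features must be divisible by number of groups")
    size = num_features // groups
    result = []
    for seq in x:
        means = []
        for g in range(groups):
            # Collect group g's features from every time step.
            values = [v for frame in seq for v in frame[g * size:(g + 1) * size]]
            means.append(sum(values) / len(values))
        result.append(means)
    return result


# Toy input with B=1, T=2, F=2.
x = [[[1.0, 3.0],
      [5.0, 7.0]]]
print(group_norm_mean(x, groups=1))  # [[4.0]]
print(group_norm_mean(x, groups=2))  # [[3.0, 5.0]]
```

The time axis is always reduced, whatever G is. So for T > 1, no choice of G reproduces the per-time-step means of reduce_mean(x, axis=-1).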

I also asked this question here.
