On Chapter 13, pages 426-427: Potential Bug in Weight Initialization #219
liereynaldo started this conversation in General
On Chapter 13, pages 426-427, here is a snippet of the code:
import torch
import torch.nn as nn

class NoisyLinear(nn.Module):
    def __init__(self, input_size, output_size, noise_stddev=0.1):
        super().__init__()
        w = torch.Tensor(input_size, output_size)
        self.w = nn.Parameter(w)  # nn.Parameter is a Tensor that's a module parameter.
        nn.init.xavier_uniform_(self.w)
        b = torch.Tensor(output_size).fill_(0)
        self.b = nn.Parameter(b)
        self.noise_stddev = noise_stddev
I think w is not initialized correctly: according to the PyTorch documentation, the dimensions of w for Xavier initialization should be (output_size, input_size), not (input_size, output_size).
This is the explanation from the PyTorch documentation:
Be aware that fan_in and fan_out are calculated assuming that the weight matrix is used in a transposed manner, (i.e., x @ w.T in Linear layers, where w.shape = [fan_out, fan_in]). This is important for correct initialization. If you plan to use x @ w, where w.shape = [fan_in, fan_out], pass in a transposed weight matrix, i.e. nn.init.xavier_uniform_(w.T, ...).
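Following that note, here is a minimal sketch of how the constructor could be adjusted, assuming the layer's forward pass computes x @ self.w (the forward method is not shown in the snippet above). Passing the transposed view to xavier_uniform_ lets it compute fan_in and fan_out from the shape [fan_out, fan_in], as the documentation recommends:

import torch
import torch.nn as nn

class NoisyLinear(nn.Module):
    def __init__(self, input_size, output_size, noise_stddev=0.1):
        super().__init__()
        w = torch.empty(input_size, output_size)
        # Assumption: the forward pass multiplies as x @ w, so w has shape
        # [fan_in, fan_out]. Initializing the transposed view makes
        # xavier_uniform_ see [fan_out, fan_in], as suggested in the
        # documentation quoted above.
        nn.init.xavier_uniform_(w.T)
        self.w = nn.Parameter(w)
        b = torch.zeros(output_size)
        self.b = nn.Parameter(b)
        self.noise_stddev = noise_stddev

An alternative would be to allocate w as (output_size, input_size) and compute x @ self.w.T in the forward pass, which matches how nn.Linear stores its weight.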