On Chapter 13, pages 426-427: Potential Bug in Weight Initialization #219
liereynaldo started this conversation in General
On Chapter 13, pages 426-427, here is a snippet of the code:
import torch
import torch.nn as nn

class NoisyLinear(nn.Module):
    def __init__(self, input_size, output_size, noise_stddev=0.1):
        super().__init__()
        w = torch.Tensor(input_size, output_size)
        self.w = nn.Parameter(w)  # nn.Parameter is a Tensor that's a module parameter.
        nn.init.xavier_uniform_(self.w)
        b = torch.Tensor(output_size).fill_(0)
        self.b = nn.Parameter(b)
        self.noise_stddev = noise_stddev
I think w is not initialized correctly: according to the PyTorch documentation, the dimensions of w for Xavier initialization should be (output_size, input_size), not (input_size, output_size).
This is the explanation from the PyTorch documentation:
Be aware that fan_in and fan_out are calculated assuming that the weight matrix is used in a transposed manner, (i.e., x @ w.T in Linear layers, where w.shape = [fan_out, fan_in]). This is important for correct initialization. If you plan to use x @ w, where w.shape = [fan_in, fan_out], pass in a transposed weight matrix, i.e. nn.init.xavier_uniform_(w.T, ...).
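Following that note, here is a minimal sketch of how the constructor could be adjusted, assuming the layer's forward pass computes x @ self.w (the forward method is not shown in the snippet above). Passing the transposed view to xavier_uniform_ lets it compute fan_in and fan_out from the shape [fan_out, fan_in], as the documentation recommends:

import torch
import torch.nn as nn

class NoisyLinear(nn.Module):
    def __init__(self, input_size, output_size, noise_stddev=0.1):
        super().__init__()
        w = torch.empty(input_size, output_size)
        # Assumption: the forward pass multiplies as x @ w, so w has shape
        # [fan_in, fan_out]. Initializing the transposed view makes
        # xavier_uniform_ see [fan_out, fan_in], as suggested in the
        # documentation quoted above.
        nn.init.xavier_uniform_(w.T)
        self.w = nn.Parameter(w)
        b = torch.zeros(output_size)
        self.b = nn.Parameter(b)
        self.noise_stddev = noise_stddev

An alternative would be to allocate w as (output_size, input_size) and compute x @ self.w.T in the forward pass, which matches how nn.Linear stores its weight.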