Cost Matrix Computation in Weight Matching #4
Comments
Hi @frallebini! The writeup in the paper is for the special case of an MLP with no bias terms -- the version in the code is just more general. The connection here is that there's a sum over all weight arrays that interact with that permutation.
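For readers landing here from the code, the loop in question looks roughly like the sketch below. This is my own condensed paraphrase, not the verbatim repo code: the real `weight_matching.py` pulls the `(weight name, axis)` pairs from `perm_to_axes` and applies the other permutations via `get_permuted_param`, which I fold into an input here, and names like `Dense_1/kernel` are just for illustration.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def cost_matrix_for_perm(axes, params_a, params_b_other_perms_applied, n):
    """Accumulate the LAP cost matrix for one permutation of size n.

    `axes` lists every (weight name, axis) pair the permutation touches;
    model B's weights are assumed to already carry all *other* permutations
    (the role played by get_permuted_param in the repo).
    """
    A = np.zeros((n, n))
    for wk, axis in axes:
        w_a = np.moveaxis(params_a[wk], axis, 0).reshape((n, -1))
        w_b = np.moveaxis(params_b_other_perms_applied[wk], axis, 0).reshape((n, -1))
        A += w_a @ w_b.T  # a plain matrix product, not a Frobenius inner product
    return A

# Toy usage: one hidden permutation of size 4, touching the output axis of the
# first kernel and the input axis of the second (flax-style [in, out] kernels).
rng = np.random.default_rng(0)
params_a = {"Dense_1/kernel": rng.normal(size=(3, 4)),
            "Dense_2/kernel": rng.normal(size=(4, 5))}
params_b = {"Dense_1/kernel": rng.normal(size=(3, 4)),
            "Dense_2/kernel": rng.normal(size=(4, 5))}
A = cost_matrix_for_perm([("Dense_1/kernel", 1), ("Dense_2/kernel", 0)],
                         params_a, params_b, n=4)
row_ind, col_ind = linear_sum_assignment(A, maximize=True)  # the updated permutation
```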
Thanks @samuela, I understand that the code is a generalization of the MLP-with-no-bias case, but still: …
Ack, you're right! I messed up: it's not actually a Frobenius inner product, just a regular matrix product. The moveaxis-reshape combo is necessary to flatten dimensions that we don't care about in the case of non-2d weight arrays.
Yup, that's exactly what …
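To make the moveaxis-reshape point concrete, here is a small illustration of my own (plain NumPy, not code from the repo): for an ordinary 2-D kernel the combo is either a transpose or a no-op, depending on which axis carries the permutation.

```python
import numpy as np

w = np.arange(12).reshape(3, 4)   # a toy [in=3, out=4] Dense kernel

# Permutation acts on axis 1 (the output axis): the combo is just a transpose.
n = w.shape[1]
print(np.array_equal(np.moveaxis(w, 1, 0).reshape((n, -1)), w.T))  # True

# Permutation acts on axis 0 (the input axis): the combo is a no-op.
n = w.shape[0]
print(np.array_equal(np.moveaxis(w, 0, 0).reshape((n, -1)), w))    # True
```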
Ok, but let us consider the MLP-with-no-bias case. The way the paper models weight matching as an LAP is …; in other words, it computes …. What the code does, instead—if I understood correctly—is computing ….

In other words, I don't think (1) and (2) are the same thing though.
Hmm, I think the error here is in the first line of (2): … The shapes here don't line up, since … I think tracing out the code for the MLP-without-bias-terms case is a good idea. In that case we run through the …
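Tracing it out numerically may help. Below is my own sanity check, not code from the repo: assuming flax-style [in, out] kernels and writing the neighbouring permutations as explicit matrices, the cost matrix the loop accumulates for one hidden permutation agrees with the paper's expression W_l^A P_{l-1} (W_l^B)^T + (W_{l+1}^A)^T P_{l+1} W_{l+1}^B.

```python
import numpy as np

rng = np.random.default_rng(0)
d0, d1, d2 = 3, 4, 5                 # layer widths; we solve for P_1 (size d1)

# Paper-convention weights W_l with shape [out, in].
W1_a, W1_b = rng.normal(size=(d1, d0)), rng.normal(size=(d1, d0))
W2_a, W2_b = rng.normal(size=(d2, d1)), rng.normal(size=(d2, d1))

# Fixed neighbouring permutations: identity at the input, a random one above.
P0 = np.eye(d0)
P2 = np.eye(d2)[rng.permutation(d2)]

# The paper's cost matrix for P_1.
A_paper = W1_a @ P0 @ W1_b.T + W2_a.T @ P2 @ W2_b

# What the code's loop computes, with flax-style kernels (kernel_l = W_l^T).
# P_1 acts on axis 1 of kernel_1 and on axis 0 of kernel_2; the *other*
# permutations are already applied to model B's kernels.
def flat(w, axis, n):
    return np.moveaxis(w, axis, 0).reshape((n, -1))

k1_a, k2_a = W1_a.T, W2_a.T
k1_b_other = P0 @ W1_b.T             # kernel_1^B with P_0 applied to its input axis
k2_b_other = W2_b.T @ P2.T           # kernel_2^B with P_2 applied to its output axis

A_code = (flat(k1_a, 1, d1) @ flat(k1_b_other, 1, d1).T
          + flat(k2_a, 0, d1) @ flat(k2_b_other, 0, d1).T)

print(np.allclose(A_paper, A_code))  # True
```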
Ok, the role of … On the other hand, the …

Right?
That's correct! In addition, it's necessary when dealing with higher-rank weight arrays as well, e.g. in a convolutional layer where the weights have shape …
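As a sketch of that point (assuming a flax-style [H, W, C_in, C_out] conv kernel layout; the exact layout in the repo may differ): for a 4-D weight, moveaxis plus reshape brings the permuted channel axis to the front and flattens everything else, so each row of the resulting matrix holds all the weights belonging to one output channel.

```python
import numpy as np

# Hypothetical conv kernel with a flax-style [H, W, C_in, C_out] layout.
kernel = np.random.default_rng(0).normal(size=(3, 3, 8, 16))

axis = 3                      # the axis carrying this layer's output permutation
n = kernel.shape[axis]        # 16 output channels
flat = np.moveaxis(kernel, axis, 0).reshape((n, -1))
print(flat.shape)             # (16, 72): one row per output channel, 3*3*8 weights each
```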
Hi, I read the code and I really did not understand the following snippet. Since it relates to the weight matching algorithm, I am posting here:

…

According to the above line, if W_\ell has shape [m, n] (m is the output feature dim, n is the input feature dim) in a Dense layer, then the shape of the permutation matrix P_\ell will be [n, n]. But when I read the paper, I think it should be [m, m]. Sorry for the silly question, but might you explain? @samuela @frallebini Thank you!
Hi @LeCongThuong, …

Therefore, …
Thank you so much for replying @samuela! I tried to understand …

From that, axes[0][1] will be 1, thus the shape of P_l will be [n, n]. Thank you again for replying to my question.
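One thing that may resolve the m-vs-n confusion (my reading, assuming the repo's MLP is a flax model): flax stores a Dense kernel as [in_features, out_features], i.e. the transpose of the paper's W_\ell with shape [m, n] = [out, in]. Axis 1 of the stored kernel is therefore the output dimension, so the permutation read off that axis has size m, in line with the paper. A quick check:

```python
import jax
import jax.numpy as jnp
import flax.linen as nn

n_in, m_out = 3, 5
layer = nn.Dense(features=m_out)
variables = layer.init(jax.random.PRNGKey(0), jnp.ones((1, n_in)))
print(variables["params"]["kernel"].shape)   # (3, 5) == (n_in, m_out)
```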
Hi, I read the paper and I am having a really hard time reconciling the formula

…

with the actual computation of the cost matrix for the LAP in weight_matching.py, namely

…

Are you following a different mathematical derivation or am I missing something?
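For context, this is roughly the formulation being asked about, as I read the paper (my paraphrase of the weight-matching objective and its per-layer coordinate-descent step; notation may differ slightly from the paper):

$$\arg\max_{\pi}\ \sum_{\ell}\ \big\langle W_\ell^{(A)},\ P_\ell\, W_\ell^{(B)}\, P_{\ell-1}^{\top}\big\rangle_F, \qquad P_0 = P_L = I$$

$$P_\ell \leftarrow \arg\max_{P}\ \big\langle P,\ W_\ell^{(A)} P_{\ell-1} \big(W_\ell^{(B)}\big)^{\top} + \big(W_{\ell+1}^{(A)}\big)^{\top} P_{\ell+1}\, W_{\ell+1}^{(B)}\big\rangle_F$$

The second expression is a linear assignment problem whose cost matrix is the bracketed sum, which is what the discussion above traces back to the code's loop.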