Question about reinitialize #2

Open
philokey opened this issue Mar 25, 2019 · 6 comments

@philokey

philokey commented Mar 25, 2019

Hi, I'm a little confused about the reinitialization. You set W.data[mask[name]] to zero during pruning, but in the reinitialization you don't restore the corresponding weights of the dropped filters. The paper says "we reinitialize the filters to be orthogonal to its value before being dropped", so the current implementation doesn't seem quite right. Can you explain why you implemented it this way?
Thank you very much.
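
To state the question precisely, here is a minimal sketch (the shapes and names are hypothetical, not the repo's actual code):

```python
import torch

# Hypothetical shapes and names, just to illustrate the question.
W = torch.randn(8, 3, 3, 3)            # conv weight: (out_channels, in, kH, kW)
mask = torch.zeros(8, dtype=torch.bool)
mask[:2] = True                        # suppose filters 0 and 1 are dropped

dropped_values = W.data[mask].clone()  # the values "before being dropped"

# Pruning step, as in the repo: dropped filters are zeroed in place.
W.data[mask] = 0

# The question: at reinitialization, the null space is computed from the
# current (zeroed) W rather than from a W whose dropped rows were first
# restored to dropped_values, so the new filters end up orthogonal to zeros,
# not to the filters' values before being dropped.
```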

@zeng-hello-world

zeng-hello-world commented Mar 25, 2019

I think the pruned filters should be restored to the values they had before being pruned, while the non-pruned filters keep their current values; then null_space = self.qr_find_null(W2d.cpu().detach().numpy()) can be used to find a null space orthogonal to both.
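
A minimal sketch of this proposal (toy sizes; the variable names here are for illustration, with qr_null standing in for the repo's null-space helper):

```python
import numpy as np
from utils import qr_null   # the repo's QR-based null-space helper

# Toy sizes for illustration.
num_filters, filter_dim = 6, 12
W2d = np.random.randn(num_filters, filter_dim)   # current filters, flattened to 2-D
prev_values = np.random.randn(2, filter_dim)     # pruned filters' pre-pruning values
pruned_idx = [0, 1]

# Restore the pruned rows to their values from before pruning ...
W2d[pruned_idx] = prev_values

# ... so the null space is orthogonal to both the restored filters
# and the kept (non-pruned) filters.
null_space = qr_null(W2d)   # columns span the subspace orthogonal to every row
```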

@philokey
Author

By the way, regarding https://github.com/siahuat0727/RePr/blob/master/main.py#L190, it seems that this operation isn't mentioned in the paper.

@siahuat0727
Owner

@philokey
That's my mistake, thanks a lot!
For the channel initialization part, I have tried initializing it randomly and with all zeros, and the former looks better. I have also asked the author about this part and will update if there is any result.
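
For reference, a minimal sketch of the two options compared above (the tensor names and the scale are assumptions, not the repo's actual code):

```python
import torch

# Hypothetical next-layer weight whose input channels are fed by the
# reinitialized filters of the previous layer.
W_next = torch.randn(16, 8, 3, 3)   # (out_channels, in_channels, kH, kW)
idx = [0, 1]                        # input channels to (re)initialize

# Random initialization (the variant that looked better):
W_next.data[:, idx] = torch.randn_like(W_next.data[:, idx]) * 0.01

# Zero initialization (the alternative that was tried):
# W_next.data[:, idx] = 0
```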

@zeng-hello-world

zeng-hello-world commented Mar 25, 2019

Another question: what if all the filters of a layer are pruned? (This can happen sometimes.)
Then everything forwarded through this layer would come out as zeros...
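
A tiny reproduction of the concern, with toy sizes:

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(3, 8, 3, bias=False)
conv.weight.data.zero_()       # every filter of this layer pruned
x = torch.randn(1, 3, 16, 16)
print(conv(x).abs().max())     # tensor(0.) -- the layer only emits zeros
```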

@tiandunx

I have 2 questions.
1st, about the architecture of your vanilla network: if it's trained from scratch in the standard way, the network should already be overfitting, and the test curve would look quite different from what was reported in the paper.
2nd, about the reinitialization step: in your code you wrote null_space = qr_null(W2d if drop_filters[name] is None else np.vstack((drop_filters[name], W2d))), where W2d is the original full set of filters, some of which were pruned (if any) while the rest were used as the filters of the sub-network. But the np.vstack seems wrong, since W2d already includes the whole matrix.
3rd, thank you for your contribution.

@siahuat0727
Owner

siahuat0727 commented May 1, 2019

@tiandunx
Hi, thanks for checking.

  1. I'm not sure I follow. I also wonder what the difference is between my experiment settings and the paper's.

  2. If I understand correctly, it only affects the efficiency of the code: all-zero rows add no constraints, so they shouldn't change the resulting null space:

```
$ cat test.py
import numpy as np
from utils import qr_null

whole = np.random.randn(3, 4)
whole[0] = 0
print(whole)
print(qr_null(whole))
print()

sub = whole[1:]
print(sub)
print(qr_null(sub))
print()

print(np.array_equal(qr_null(whole), qr_null(sub)))

$ python test.py
[[ 0.          0.          0.          0.        ]
 [-1.93893849  0.17246339  0.40822182  0.45453628]
 [ 0.59895742 -1.22694375  0.70782981 -0.37624858]]
[[ 0.22369862  0.21604992]
 [ 0.56086785 -0.17220338]
 [ 0.79668953  0.02931069]
 [ 0.02592232  0.96062964]]

[[-1.93893849  0.17246339  0.40822182  0.45453628]
 [ 0.59895742 -1.22694375  0.70782981 -0.37624858]]
[[ 0.22369862  0.21604992]
 [ 0.56086785 -0.17220338]
 [ 0.79668953  0.02931069]
 [ 0.02592232  0.96062964]]

True
```
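
For anyone reproducing this, qr_null above is imported from the repo's utils.py; a sketch of a QR-based null-space routine along those lines (this reconstruction is an assumption, not a copy of the repo's code) would be:

```python
import numpy as np
from scipy.linalg import qr

def qr_null(A, tol=None):
    """Orthonormal basis of the null space of A, via QR of A.T."""
    Q, R, _ = qr(A.T, mode='full', pivoting=True)
    tol = np.finfo(R.dtype).eps if tol is None else tol
    # Rank = number of diagonal entries of R with magnitude above tol.
    rank = min(A.shape) - np.abs(np.diag(R))[::-1].searchsorted(tol)
    return Q[:, rank:].conj()   # remaining columns of Q span null(A)
```

With this, A @ qr_null(A) is numerically zero, which is what makes the reinitialized filters orthogonal to the rows of the stacked matrix.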
