State dict creation bug for e.g. resnet18 #4344
Comments
Just to add: I am talking about the fix that is mentioned in this comment: #3237 (comment), i.e. changing the line that builds the state dict from the received parameters.
Hi @wittenator, yes: with the original line you end up with ('num_batches_tracked', tensor([])), which isn't correct, but with the fix you get ('num_batches_tracked', tensor(0)). All this being said, this part of the code isn't part of "Flower" strictly speaking: depending on your model (or even your ML framework of choice) you would implement this functionality one way or another. Should we flag this issue as resolved? How about #3237? Note: the recommended way of running Flower projects is via `flwr run`.
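For anyone following along, here is a minimal sketch (not from the original thread; the standalone BatchNorm module and variable names are just for illustration) of why an empty tensor in that slot breaks load_state_dict while a 0-dim scalar loads fine:

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm2d(3)
print(bn.state_dict()["num_batches_tracked"])  # tensor(0): a 0-dim int64 buffer

# A 0-dim scalar replacement loads without issues.
good = {k: v.clone() for k, v in bn.state_dict().items()}
good["num_batches_tracked"] = torch.tensor(0)
bn.load_state_dict(good)

# An empty 1-D tensor does not: in recent PyTorch versions load_state_dict
# tries to index element 0 of 1-D inputs destined for 0-dim targets
# (a backward-compatibility path), which raises an out-of-bounds error.
bad = {k: v.clone() for k, v in bn.state_dict().items()}
bad["num_batches_tracked"] = torch.tensor([])
try:
    bn.load_state_dict(bad)
except Exception as err:
    print(type(err).__name__, err)
```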
Ah, that's very interesting! I wasn't aware of this intricate difference between the two.
People have encountered it a few times indeed. Let me loop in @danieljanes and @yan-gao-GY: should we change all instances of:

```python
def set_parameters(model, parameters):
    params_dict = zip(model.state_dict().keys(), parameters)
    state_dict = OrderedDict({k: torch.tensor(v) for k, v in params_dict})  # replace with torch.from_numpy()
    model.load_state_dict(state_dict, strict=True)
```
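For reference, the replacement under discussion would presumably look something like the following sketch (not a confirmed patch; it assumes parameters is a list of NumPy ndarrays, as in the tutorial):

```python
from collections import OrderedDict

import torch


def set_parameters(model, parameters):
    params_dict = zip(model.state_dict().keys(), parameters)
    # torch.from_numpy preserves the dtype and (possibly 0-dim) shape of each
    # ndarray, so buffers such as num_batches_tracked stay scalar tensors.
    state_dict = OrderedDict({k: torch.from_numpy(v) for k, v in params_dict})
    model.load_state_dict(state_dict, strict=True)
```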
@wittenator Thanks for raising this issue. @jafermarq It makes sense to me to replace it with torch.from_numpy().
@yan-gao-GY are there any consequences related to performance, or anything non-obvious, when changing torch.tensor() to torch.from_numpy()?
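One well-known difference between the two (general PyTorch/NumPy semantics, not something stated in this thread): torch.tensor() always copies the data, whereas torch.from_numpy() shares memory with the source array, so it avoids a copy but also reflects later in-place changes to that array. A small sketch:

```python
import numpy as np
import torch

arr = np.zeros(3, dtype=np.float32)

copied = torch.tensor(arr)      # copies the data into a new tensor
shared = torch.from_numpy(arr)  # zero-copy: shares memory with arr

arr[0] = 1.0
print(copied[0].item())  # 0.0 -- unaffected by the in-place change
print(shared[0].item())  # 1.0 -- reflects the in-place change
```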
Describe the bug
Running the baselines with other models, e.g. torchvision.models.resnet18, for fedprox/fednova/etc. fails with an out-of-bounds exception. This is the same problem that many people faced in e.g. #3237 when following the introductory Flower tutorial. Applying the state-dict fix across the whole code base seems to solve the problem, but I don't really see why it works. Since the same problem appears in the very first introductory tutorial, I would be really interested in discussing whether applying this change across the code base is possible and what exactly the reason/problem is that this change fixes.
Steps/Code to Reproduce
Try following the tutorial at https://flower.ai/docs/framework/tutorial-series-get-started-with-flower-pytorch.html with resnet18 instead of the custom model.
Example code snippet from the condensed Flower tutorial:
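(The snippet itself did not survive in this copy of the issue; the following is a rough reconstruction of the tutorial-style helpers with resnet18 substituted in. Function names and the usage line are assumptions, not the exact code from the report.)

```python
from collections import OrderedDict

import torch
from torchvision.models import resnet18


def get_parameters(model):
    # Export the model's state dict as a list of NumPy ndarrays (tutorial style).
    return [val.cpu().numpy() for _, val in model.state_dict().items()]


def set_parameters(model, parameters):
    # Load a list of NumPy ndarrays back into the model (tutorial style).
    params_dict = zip(model.state_dict().keys(), parameters)
    state_dict = OrderedDict({k: torch.tensor(v) for k, v in params_dict})
    model.load_state_dict(state_dict, strict=True)


# Used as the get/set hooks of the tutorial's Flower client, but with resnet18
# (which contains BatchNorm buffers such as num_batches_tracked) instead of the
# tutorial's custom CNN. The report states that the federated run then fails
# with an out-of-bounds exception when parameters are loaded back.
model = resnet18()
```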
Expected Results
There should be no error when run with another model architecture.
Actual Results