Baseline FedNova does not work with batchnorms #4345

Open
wittenator opened this issue Oct 21, 2024 · 0 comments
Labels
part: baselines, state: stale, type: bug

Describe the bug

I am running the Flower baselines with different models. FedNova works with the standard config, but once I enable batch normalization in the VGG model, the following error is thrown:

File "/home/korjakow/.cache/pypoetry/virtualenvs/fednova-0w39Djqe-py3.10/lib/python3.10/site-packages/flwr/simulation/ray_transport/ray_client_proxy.py", line 196, in fit
    return maybe_call_fit(
  File "/home/korjakow/.cache/pypoetry/virtualenvs/fednova-0w39Djqe-py3.10/lib/python3.10/site-packages/flwr/client/client.py", line 217, in maybe_call_fit
    return client.fit(fit_ins)
  File "/home/korjakow/.cache/pypoetry/virtualenvs/fednova-0w39Djqe-py3.10/lib/python3.10/site-packages/flwr/client/app.py", line 333, in _fit
    results = self.numpy_client.fit(parameters, ins.config)  # type: ignore
  File "/home/korjakow/projects/flower-1/baselines/fednova/fednova/client.py", line 71, in fit
    self.set_parameters(parameters)
  File "/home/korjakow/projects/flower-1/baselines/fednova/fednova/client.py", line 65, in set_parameters
    self.optimizer.set_model_params(parameters)
  File "/home/korjakow/projects/flower-1/baselines/fednova/fednova/models.py", line 360, in set_model_params
    p.data.copy_(param_tensor)
RuntimeError: The size of tensor a (3) must match the size of tensor b (64) at non-singleton dimension 3

I assume that the baseline was never run with the batch norm option enabled and that the current code does not handle the running statistics of the batch norm layers correctly.
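One way this kind of shape error can arise is sketched below. It is an illustration only, under the assumption that one side of the exchange builds its tensor list from `state_dict()` while the other iterates over the trainable parameters; the issue does not confirm that this is what the baseline does. The small `nn.Sequential` model and the `first_mismatch` helper are hypothetical and not taken from the baseline code.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the first two VGG blocks with batch_norm=True;
# this is not the baseline's actual model definition.
model = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1),
    nn.BatchNorm2d(64),
    nn.ReLU(inplace=True),
    nn.Conv2d(64, 64, kernel_size=3, padding=1),
    nn.BatchNorm2d(64),
)

# Trainable tensors only: conv and BN weights/biases.
trainable = list(model.parameters())
# Full state: additionally holds running_mean, running_var and
# num_batches_tracked for every BatchNorm layer.
full_state = list(model.state_dict().values())


def first_mismatch(params, tensors):
    """Return (index, param_shape, tensor_shape) of the first shape mismatch, or None."""
    for i, (p, t) in enumerate(zip(params, tensors)):
        if p.shape != t.shape:
            return i, tuple(p.shape), tuple(t.shape)
    return None


# Assumption: the two lists are paired position by position, as a
# copy loop over zip(params, tensors) would do.
print(len(trainable), len(full_state))
print(first_mismatch(trainable, full_state))
```

In this toy model the two lists have different lengths, and the first mismatching pair is the second convolution weight of shape (64, 64, 3, 3) against a batch norm `running_mean` of shape (64,). Those are the same sizes reported in the traceback (3 versus 64 at dimension 3), which is consistent with the hypothesis but does not prove it. If that is the cause, a plausible remedy would be to build both lists from the same source, either trainable parameters only or the full state on both sides.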

Steps/Code to Reproduce

  1. Apply the fix from issue #4344 ("State dict creation bug for e.g. resnet18"); otherwise the code fails before reaching this point.
  2. Enable batch normalization by setting batch_norm=True in
    def make_layers(network_cfg, batch_norm=False):
  3. Run the baseline: python -m fednova.main

Expected Results

The training should progress.

Actual Results

Training aborts with the RuntimeError shown above.

@wittenator wittenator added the type: bug Something isn't working label Oct 21, 2024
@WilliamLindskog WilliamLindskog added state: stale If issue/PR hasn't been updated within 3 weeks. part: baselines Add or update baseline labels Dec 11, 2024