Improve GRPC broadcast implementation #65

Open
tremblerz opened this issue Sep 1, 2024 · 0 comments
Labels: communication, enhancement (New feature or request), help wanted (Extra attention is needed)

Comments

tremblerz (Contributor) commented Sep 1, 2024

  • Context - This observation comes from running a 120-node simulation using gRPC for traditional_fl. The problem should be less severe when each node interacts with only ~10-20 nodes in any given round.

  • Issue - Right now the broadcast function is implemented by looping over a send function, which is a unicast function. Broadcast is therefore executed serially, so the time it takes grows with the number of recipients.

  • Solution - Making the send function multi-threaded would help, but I believe a better approach is to have nodes pull the model updates instead of having the super-node push them to each node. Even with a pull approach, multithreading would be needed so that nodes that request early wait until the freshest copy of the model weights is available. Furthermore, the server may not respond to a request if too many nodes are already in the request queue, so we will have to implement retry logic. Retry logic is already implemented for the register function in https://github.com/aidecentralized/sonar/blob/main/src/utils/communication/grpc/main.py
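To make the options above concrete, here is a minimal sketch of the three pieces discussed: a thread-pool fan-out over a unicast send, a server-side store that makes early pullers wait for the requested round, and a client-side retry with exponential backoff. This is not the project's code. The names `broadcast_parallel`, `ModelStore`, and `pull_with_retry`, and all parameters, are hypothetical. Plain callables stand in for the gRPC stub calls, and a real client would likely catch `grpc.RpcError` rather than a bare `Exception`.

```python
import random
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def broadcast_parallel(send, payload, node_ids, max_workers=16):
    """Fan out a unicast `send(node_id, payload)` over a thread pool.

    `send` is a stand-in for the existing unicast function (assumption).
    Returns a dict mapping node_id to the raised exception, or None on success.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {nid: pool.submit(send, nid, payload) for nid in node_ids}
    # Leaving the `with` block waits for every submitted send to finish.
    return {nid: fut.exception() for nid, fut in futures.items()}


class ModelStore:
    """Hypothetical server-side holder: early pullers block until the round is ready."""

    def __init__(self):
        self._cond = threading.Condition()
        self._round = -1
        self._weights = None

    def publish(self, round_id, weights):
        # Store the new weights and wake every waiting puller.
        with self._cond:
            self._round, self._weights = round_id, weights
            self._cond.notify_all()

    def get(self, round_id, timeout=None):
        """Block until weights for `round_id` (or a newer round) are published."""
        with self._cond:
            ready = self._cond.wait_for(lambda: self._round >= round_id, timeout)
            if not ready:
                raise TimeoutError(f"round {round_id} not published yet")
            return self._round, self._weights


def pull_with_retry(fetch, retries=5, base_delay=0.1, max_delay=2.0, sleep=time.sleep):
    """Call `fetch()` until it succeeds, backing off exponentially with jitter."""
    for attempt in range(retries):
        try:
            return fetch()
        except Exception:  # a gRPC client would narrow this (assumption)
            if attempt == retries - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay * random.uniform(0.5, 1.0))
```

In this sketch a node would call something like `pull_with_retry(lambda: store.get(round_id, timeout=5.0))`, with the lambda replaced by the actual RPC. The jitter spreads out retries so that 120 nodes that were rejected at the same moment do not all come back at once.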
