Improve GRPC broadcast implementation #65

tremblerz · 2024-09-01T03:06:07Z

Context - This observation comes from running a simulation of 120 nodes using gRPC for traditional_fl. The problem should be less severe when each node interacts with only ~10-20 nodes in a given round.
Issue - Right now the broadcast function is implemented by looping over the send function, which is unicast. This makes broadcast effectively serial, so the time it takes grows linearly with the number of recipients.
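To illustrate the difference, here is a minimal sketch that contrasts a serial loop with a thread-pool fan-out. The `send` function here is a hypothetical stand-in that sleeps to mimic a blocking unicast call; it is not the project's actual gRPC send, and the names `broadcast_serial` and `broadcast_threaded` are made up for this example.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def send(node_id, payload, delay=0.01):
    """Hypothetical stand-in for a blocking unicast send (not the real gRPC call)."""
    time.sleep(delay)  # mimic network latency
    return node_id


def broadcast_serial(node_ids, payload):
    # One blocking send after another: total time is roughly len(node_ids) * delay.
    return [send(n, payload) for n in node_ids]


def broadcast_threaded(node_ids, payload, max_workers=16):
    # Overlap the blocking sends; pool.map returns results in input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda n: send(n, payload), node_ids))
```

With 120 recipients, the serial version pays the per-send latency 120 times, while the threaded version pays it roughly `120 / max_workers` times, assuming the sends are I/O-bound.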
Solution - While this can be improved by making the send function multi-threaded, I believe a better approach is to have nodes pull the model updates instead of the super-node pushing them to each node. Even with the pull approach, multithreading would be needed so that nodes that request early wait until the freshest copy of the model weights is available. Furthermore, the server may not respond to a request if too many nodes are already in the request queue, so we will have to implement retry logic. Retry logic is already implemented for the register function in https://github.com/aidecentralized/sonar/blob/main/src/utils/communication/grpc/main.py
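A rough sketch of the pull approach, under several assumptions: `ModelStore` is a hypothetical in-memory stand-in for the super-node's endpoint, `pull_with_retry` is a made-up helper, and the backoff parameters are arbitrary. It does not reuse the existing retry logic from the register function; it only shows the shape of the idea (early requesters wait for the requested round, overloaded requests are rejected and retried with backoff).

```python
import random
import threading
import time


class ModelStore:
    """Hypothetical in-memory stand-in for the super-node's model endpoint."""

    def __init__(self, max_inflight=2):
        self._cond = threading.Condition()
        self._round = 0
        self._weights = None
        self._slots = threading.Semaphore(max_inflight)

    def publish(self, round_id, weights):
        # Called by the super-node once aggregation for a round is done.
        with self._cond:
            self._round, self._weights = round_id, weights
            self._cond.notify_all()

    def get(self, min_round, timeout=0.05):
        # Reject immediately when too many requests are already in flight.
        if not self._slots.acquire(blocking=False):
            raise ConnectionError("server busy")
        try:
            with self._cond:
                # Early callers block until weights for the requested round exist.
                ready = self._cond.wait_for(lambda: self._round >= min_round, timeout)
                if not ready:
                    raise TimeoutError("weights not ready")
                return self._round, self._weights
        finally:
            self._slots.release()


def pull_with_retry(store, min_round, retries=20, base_delay=0.01, max_delay=0.2):
    # Retry with capped exponential backoff plus jitter (assumed policy).
    for attempt in range(retries):
        try:
            return store.get(min_round)
        except (ConnectionError, TimeoutError):
            delay = min(max_delay, base_delay * 2 ** attempt)
            time.sleep(delay * random.uniform(0.5, 1.0))
    raise RuntimeError(f"gave up after {retries} attempts")
```

Each node would call `pull_with_retry(store, min_round=current_round)` at the start of a round. The jitter is there so that rejected nodes do not all retry at the same instant.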


tremblerz added the enhancement, help wanted, and communication labels Sep 1, 2024

rishi-s8 mentioned this issue Oct 18, 2024

List of refactoring and code improvement opportunities #114

Open
