New users are not able to join once training started #718
Labels: bug, decentralized, federated
Currently, participants need to join within a few seconds of each other; otherwise their contributions are dropped (by the server in the federated setting and by other peers in the decentralized setting).
Federated
Decentralized
Once `minReadyPeer` peers have joined a task, collaborative training starts, sometimes so quickly that other peers have no time to join and contribute before the first round has finished. For example, if `minReadyPeer = 3`, the first 3 peers start aggregating their contributions as soon as the 3rd one joins the network. Even if a 4th peer joins right after the 3rd, its contribution may be dropped because the first 3 peers have already moved on to a new round. This should be fixed by the previous checkbox, which enables outdated peers to catch up to the latest round, but the first round may still finish before every peer that wanted to join could contribute.

A potential fix is to implement a waiting stage in which peers join a task and communicate their readiness to start training. Concretely, peers click on "Join task". If training is already under way, they catch up on the current round. Otherwise, they enter a waiting room where they can see how many peers are currently waiting. Once there are more than `minReadyPeers` peers in the waiting room, they can press a "Ready" button to signal that they would like training to start now. Once all peers have pressed "Ready", training starts, and future peers can join mid-training without going through the waiting room.
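The waiting-room flow proposed above could be sketched roughly as follows. This is only an illustration of the described state transitions, not code from the project: the `WaitingRoom` class, its method names, and the `JoinResult` type are all hypothetical. It also assumes the threshold means "at least `minReadyPeers` peers", in line with the `minReadyPeer = 3` example; the "more than" wording could equally be read as a strict inequality.

```typescript
// Hypothetical sketch of the proposed waiting room; none of these names
// come from the actual codebase.
type JoinResult = "waiting" | "catch-up";

class WaitingRoom {
  // peer id -> whether that peer has pressed "Ready"
  private readonly ready = new Map<string, boolean>();
  private readonly minReadyPeers: number;
  private started = false;

  constructor(minReadyPeers: number) {
    this.minReadyPeers = minReadyPeers;
  }

  // A peer clicks "Join task": once training is running it skips the
  // waiting room and catches up on the current round instead.
  join(peerId: string): JoinResult {
    if (this.started) return "catch-up";
    this.ready.set(peerId, false);
    return "waiting";
  }

  // Number of peers currently shown in the waiting room.
  get waitingCount(): number {
    return this.ready.size;
  }

  // The "Ready" button is enabled only once enough peers are waiting.
  // Assumption: the threshold is inclusive (>=), see the lead-in.
  canPressReady(): boolean {
    return !this.started && this.ready.size >= this.minReadyPeers;
  }

  // A waiting peer presses "Ready". Returns true once training has started,
  // which happens when every peer in the waiting room has pressed "Ready".
  markReady(peerId: string): boolean {
    if (!this.canPressReady() || !this.ready.has(peerId)) return this.started;
    this.ready.set(peerId, true);
    this.started = Array.from(this.ready.values()).every((r) => r);
    return this.started;
  }

  get hasStarted(): boolean {
    return this.started;
  }
}

// Usage: three peers wait and confirm, then a fourth joins mid-training.
const demoRoom = new WaitingRoom(3);
for (const id of ["peer-1", "peer-2", "peer-3"]) demoRoom.join(id);
for (const id of ["peer-1", "peer-2", "peer-3"]) demoRoom.markReady(id);
console.log(demoRoom.hasStarted, demoRoom.join("peer-4"));
```

In this sketch, a peer that enters the waiting room after others have pressed "Ready" but before training starts must also press "Ready", so nobody who is already waiting misses the first round. Whether that is the desired behavior, or whether a timeout should apply, is left open by the proposal.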