We recently implemented an improved large-scale gossip protocol based on memberlist. memberlist did not meet our expectations once the cluster reached 1000 nodes. After investigating, we found that the broadcast mechanism takes data from the system broadcast queue first (function getBroadcasts in broadcast.go). When the number of nodes is too large, the UDP packet has no space left for data from the user-defined broadcast queue, so we fail to achieve system consistency.
No, I ran experiments with memberlist on a cluster of thousands of servers. The real cause of the problem I mentioned is that the size of the UDP packet for a single broadcast is limited. memberlist's broadcast logic first fills the packet with system data (node status information) and only then fetches data from the user-defined broadcast queue to fill whatever space remains. This logic is fine in a small cluster, but as the cluster grows, the UDP packets fill up with node status information and user data is no longer sent, which results in inconsistency at the user level.
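To make the ordering described above concrete, here is a minimal sketch in Go. It is not memberlist's actual code: `takeFitting`, `fillPacket` and `makeQueue` are hypothetical helpers, the per-message overhead and packet limit are assumed values chosen for illustration, and the model simply stops at the first message that does not fit.

```go
package main

// takeFitting returns how many leading messages from queue fit into budget
// bytes, charging overhead bytes per message, and the budget left afterwards.
// Simplification: it stops at the first message that does not fit.
func takeFitting(queue [][]byte, overhead, budget int) (taken, remaining int) {
	remaining = budget
	for _, msg := range queue {
		cost := len(msg) + overhead
		if cost > remaining {
			break
		}
		remaining -= cost
		taken++
	}
	return taken, remaining
}

// fillPacket models the ordering from the comment above: system (node status)
// messages are packed first, and user messages only get the leftover space.
func fillPacket(system, user [][]byte, overhead, limit int) (sysSent, userSent int) {
	sysSent, left := takeFitting(system, overhead, limit)
	userSent, _ = takeFitting(user, overhead, left)
	return sysSent, userSent
}

// makeQueue builds n dummy messages of size bytes each (hypothetical helper).
func makeQueue(n, size int) [][]byte {
	q := make([][]byte, n)
	for i := range q {
		q[i] = make([]byte, size)
	}
	return q
}
```

With an assumed limit of 1400 bytes, 2 bytes of overhead and 48-byte messages, `fillPacket(makeQueue(4, 48), makeQueue(10, 48), 2, 1400)` packs all 4 system and all 10 user messages, while `fillPacket(makeQueue(40, 48), makeQueue(10, 48), 2, 1400)` packs 28 system messages and no user messages at all. That is the starvation I am describing.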
Hi, I'm not the author of memberlist, but I have also tested memberlist on a ~4500-node cluster, and I think you are probably using memberlist the wrong way. It's not a good idea to use its "user messages" at large scale. You should mostly use it for host discovery, not as a network transport...
However, memberlist seems to have other problems: #311, #312