Previously, whenever a client called Start to rejoin a cluster (and
fix any potential split-brain issues), a new state message was
generated and broadcast to the cluster.
This resulted in a significant amount of gossip traffic as the cluster
grew: in a cluster of 800 nodes where every node rejoins every 15
seconds, around 640,000 state change messages would be sent every 15
seconds (800 state changes, each sent to 800 nodes).
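The traffic estimate above grows quadratically with cluster size. The
minimal Go sketch below reproduces the arithmetic, assuming (as the
example does) that every node rejoins once per interval and that each
rejoin produces one state change delivered to every node. The function
name `broadcastsPerRound` is hypothetical and not part of the codebase.

```go
package main

import "fmt"

// broadcastsPerRound estimates the state change messages sent per rejoin
// interval under the old behaviour: each of `nodes` rejoining nodes
// generates one state change, and each change is delivered to `nodes`
// nodes. This is an illustrative model, not code from the project.
func broadcastsPerRound(nodes int) int {
	return nodes * nodes
}

func main() {
	for _, n := range []int{100, 400, 800} {
		fmt.Printf("%d nodes -> %d messages per interval\n", n, broadcastsPerRound(n))
	}
	// 100 nodes -> 10000 messages per interval
	// 400 nodes -> 160000 messages per interval
	// 800 nodes -> 640000 messages per interval
}
```

Doubling the cluster size quadruples the traffic, which is why this
cost only became noticeable in large clusters.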
This logic only appeared necessary because of a bug in how invalid
state messages about our own node were corrected: we should broadcast
a new message only if we receive a message about our own node that is
newer than our local copy.
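The correction rule above can be sketched in Go as follows. This is a
simplified illustration, not the project's implementation: the
`NodeState` and `Local` types, the `Time` logical clock, and the
`handleStateMessage` method are all hypothetical. The idea is that a
message about our own node is only acted on when it is newer than our
local copy, in which case we re-assert our real state with a newer time
so that it supersedes the stale message everywhere.

```go
package main

import "fmt"

// NodeState is a hypothetical, simplified gossip message about one node.
// Time is a logical clock: a higher value means a newer message.
type NodeState struct {
	Node  string
	State string
	Time  uint64
}

// Local is a hypothetical view of our own node and its current state.
type Local struct {
	Name  string
	State NodeState
}

// handleStateMessage returns the corrected message to broadcast, or nil if
// nothing needs to be sent.
func (l *Local) handleStateMessage(msg NodeState) *NodeState {
	if msg.Node != l.Name {
		// Messages about other nodes are outside the scope of this sketch.
		return nil
	}
	if msg.Time <= l.State.Time {
		// Our local copy is at least as new, so no broadcast is needed.
		return nil
	}
	// Someone holds a newer (invalid) message about us, e.g. from a previous
	// run of this node. Keep our real state but move its time past the
	// stale message so the correction wins.
	l.State.Time = msg.Time + 1
	corrected := l.State
	return &corrected
}

func main() {
	l := &Local{Name: "node-a", State: NodeState{Node: "node-a", State: "participant", Time: 5}}

	// An older message about us: nothing is broadcast.
	fmt.Println(l.handleStateMessage(NodeState{Node: "node-a", State: "participant", Time: 3}) == nil) // true

	// A newer, invalid message about us: broadcast a correction.
	c := l.handleStateMessage(NodeState{Node: "node-a", State: "terminating", Time: 9})
	fmt.Println(c.State, c.Time) // participant 10
}
```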
Fixing this bug removed the need for this logic, which will help reduce
the amount of gossip generated when rejoining.
With this change, rejoining a node will only require one push/pull
request per node being joined, and will result in no new broadcasts if
all nodes are already up-to-date.
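As a rough model of the new rejoin cost, the Go sketch below counts one
push/pull exchange per node being joined and a broadcast only when a
peer reports a version of our node newer than our local copy. The
`rejoinCost` function and the representation of each peer's view as a
single logical timestamp are hypothetical simplifications for
illustration only.

```go
package main

import "fmt"

// rejoinCost models a rejoin under the new behaviour. localTime is the
// logical time of our own state; peerViews holds each joined peer's view
// of our node, expressed as a logical time.
func rejoinCost(localTime uint64, peerViews []uint64) (pushPulls, broadcasts int) {
	for _, remote := range peerViews {
		pushPulls++ // one push/pull request per node being joined
		if remote > localTime {
			// A peer has a newer (invalid) message about us: correct it.
			broadcasts++
			localTime = remote + 1
		}
	}
	return pushPulls, broadcasts
}

func main() {
	// Every peer is already up-to-date: no broadcasts at all.
	fmt.Println(rejoinCost(5, []uint64{5, 5, 5})) // 3 0

	// One peer holds a newer message about us: exactly one correction.
	fmt.Println(rejoinCost(5, []uint64{5, 9, 5})) // 3 1
}
```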
To help observe broadcast volume, a new
cluster_node_gossip_broadcasts_total metric has been added.
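The commit does not show how the metric is implemented. The Go sketch
below is a stdlib-only stand-in that assumes a simple monotonically
increasing counter, incremented once per broadcast and rendered in the
Prometheus text exposition format; the `broadcastMetrics` type and its
methods are hypothetical. A real implementation would more likely use a
Prometheus client library.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// broadcastMetrics is a hypothetical holder for the broadcast counter.
type broadcastMetrics struct {
	broadcasts uint64 // accessed atomically
}

// recordBroadcast increments the counter; safe for concurrent use.
func (m *broadcastMetrics) recordBroadcast() {
	atomic.AddUint64(&m.broadcasts, 1)
}

// expose renders the counter in the Prometheus text exposition format.
func (m *broadcastMetrics) expose() string {
	return fmt.Sprintf(
		"# TYPE cluster_node_gossip_broadcasts_total counter\ncluster_node_gossip_broadcasts_total %d\n",
		atomic.LoadUint64(&m.broadcasts),
	)
}

func main() {
	m := &broadcastMetrics{}
	for i := 0; i < 3; i++ {
		m.recordBroadcast()
	}
	fmt.Print(m.expose())
	// # TYPE cluster_node_gossip_broadcasts_total counter
	// cluster_node_gossip_broadcasts_total 3
}
```

Because the metric is a counter, its per-second broadcast rate can be
observed with a PromQL query such as
`rate(cluster_node_gossip_broadcasts_total[5m])`.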