When does quorum queue membership reconciliation kick in? #11634

shadizar128 · 2024-07-08T13:33:15Z

Describe the bug

It seems that the functionality described in "Reconcile (repair or expand) quorum queue membership periodically" #8218 does not work: quorum queues do not add replicas on other nodes when one member is offline.

Reproduction steps

1. Set up a cluster with 4 nodes
2. Enable and configure the feature on all nodes

> cat /etc/rabbitmq/rabbitmq.config
[
    { kernel,
        [
            { inet_dist_listen_min, 6150 },
            { inet_dist_listen_max, 6150 }
        ]
    },
    { rabbit,
        [
            { tcp_listeners, [ { '127.0.0.1', 5672 }, { '::1', 5672 } ] },
            { cluster_nodes, { [ 'rabbit@c8e80d9a297fb155f530ec54b6689be1','rabbit@1580f619f8cb44a59c0341655a565709','rabbit@8fe935da8fbc79e3c824c39ed88cb471','rabbit@63e223e2c64520bff5bcad815c2ecc19' ], disc } },
            { cluster_partition_handling, autoheal },
            { quorum_membership_reconciliation_enabled, true },
            { quorum_membership_reconciliation_auto_remove, true },
            { quorum_membership_reconciliation_interval, 1 },
            { quorum_membership_reconciliation_trigger_interval, 1 },
            { quorum_membership_reconciliation_target_group_size, 3 }
        ]
    }
].

* Note that I tried with and without the interval settings
3. Set the log level to debug
4. Restart all nodes

> service rabbitmq-server restart

5. Create a quorum queue with the following arguments:

x-queue-type: quorum
x-quorum-target-group-size: 3
durable: true

6. Inspect the queue members and shut down one of them (not the leader):

> service rabbitmq-server stop

* In my case I shut down rabbit@8fe935da8fbc79e3c824c39ed88cb471
7. Inspect the logs and notice that the reconciliation mechanism was triggered by the node going down

2024-07-05 14:16:08.261957+00:00 [error] <0.331.0> ** Node rabbit@8fe935da8fbc79e3c824c39ed88cb471 not responding **
2024-07-05 14:16:08.261957+00:00 [error] <0.331.0> ** Removing (timedout) connection **
2024-07-05 14:16:08.261957+00:00 [error] <0.331.0>
2024-07-05 14:16:08.262552+00:00 [info] <0.3365.0> rabbit on node rabbit@8fe935da8fbc79e3c824c39ed88cb471 down
2024-07-05 14:16:08.303766+00:00 [info] <0.3555.0> Mirrored queue 'REDACTED' in vhost '/': Secondary replica of queue <[email protected]> detected replica  <[email protected]> to be down
2024-07-05 14:16:08.320682+00:00 [debug] <0.3636.0> Quorum Queue membership reconciliation triggered: {node_down,
2024-07-05 14:16:08.320682+00:00 [debug] <0.3636.0>                                                    rabbit@8fe935da8fbc79e3c824c39ed88cb471}
2024-07-05 14:16:08.326297+00:00 [info] <0.3591.0> Mirrored queue 'REDACTED' in vhost '/': Secondary replica of queue <[email protected]> detected replica  <[email protected]> to be down
2024-07-05 14:16:08.328581+00:00 [info] <0.3522.0> Mirrored queue 'REDACTED' in vhost '/': Secondary replica of queue <[email protected]> detected replica  <[email protected]> to be down
2024-07-05 14:16:08.328582+00:00 [info] <0.3579.0> Mirrored queue 'REDACTED' in vhost '/': Secondary replica of queue <[email protected]> detected replica  <[email protected]> to be down
2024-07-05 14:16:08.328625+00:00 [info] <0.3559.0> Mirrored queue 'REDACTED' in vhost '/': Secondary replica of queue <[email protected]> detected replica  <[email protected]> to be down
2024-07-05 14:16:08.328880+00:00 [info] <0.3596.0> Mirrored queue 'REDACTED' in vhost '/': Secondary replica of queue <[email protected]> detected replica  <[email protected]> to be down
2024-07-05 14:16:08.328922+00:00 [info] <0.3503.0> Mirrored queue 'REDACTED' in vhost '/': Secondary replica of queue <[email protected]> detected replica  <[email protected]> to be down
2024-07-05 14:16:08.328912+00:00 [info] <0.3613.0> Mirrored queue 'REDACTED' in vhost '/': Secondary replica of queue <[email protected]> detected replica  <[email protected]> to be down
2024-07-05 14:16:08.402118+00:00 [info] <0.3530.0> Mirrored queue 'REDACTED' in vhost '/': Secondary replica of queue <[email protected]> detected replica  <[email protected]> to be down
2024-07-05 14:16:08.405769+00:00 [info] <0.3575.0> Mirrored queue 'REDACTED' in vhost '/': Secondary replica of queue <[email protected]> detected replica  <[email protected]> to be down
2024-07-05 14:16:08.409485+00:00 [info] <0.3542.0> Mirrored queue 'REDACTED' in vhost '/': Secondary replica of queue <[email protected]> detected replica  <[email protected]> to be down
2024-07-05 14:16:08.409629+00:00 [info] <0.3518.0> Mirrored queue 'REDACTED' in vhost '/': Secondary replica of queue <[email protected]> detected replica  <[email protected]> to be down
2024-07-05 14:16:08.411404+00:00 [info] <0.3621.0> Mirrored queue 'REDACTED' in vhost '/': Secondary replica of queue <[email protected]> detected replica  <[email protected]> to be down
2024-07-05 14:16:08.411546+00:00 [info] <0.3551.0> Mirrored queue 'REDACTED' in vhost '/': Secondary replica of queue <[email protected]> detected replica  <[email protected]> to be down
2024-07-05 14:16:11.347492+00:00 [info] <0.3365.0> node rabbit@8fe935da8fbc79e3c824c39ed88cb471 down: net_tick_timeout

8. Wait 60 seconds and look at the queue again: 2 out of 3 members are online and no new member has been added

Expected behavior

RabbitMQ should add a replica of the queue on the 4th node.

Additional context

I tried several other things:

configuring the group size via queue arguments vs. policy (regular and operator)
waiting for 1 hour (since #8218 mentions a longer wait interval)
stopping the victim node abruptly (killing the host VM) instead of gracefully, in case a graceful shutdown was treated as maintenance or something similar
getting queue properties with rabbitmqctl, since the UI seems a bit outdated when it comes to quorum queues

The only thing that actually worked was to remove (forget) the offline member and then restart all the other 3 nodes.

Answered by michaelklishin

lukebakken · 2024-07-08T13:41:42Z · Maintainer

What version of RabbitMQ are you using?
Do you observe this behavior with an ODD number of nodes? We do NOT recommend using an even number of nodes.

4 replies

shadizar128 Jul 8, 2024
Author

I was about to update the issue when it was changed to a discussion.
RabbitMQ 3.13.3, Erlang 26.2.5

Initially I tried with 3 nodes and set the target group size to 2; the same thing happened. In fact, with 3 nodes the target group size seemed to be ignored: even when I set it to 2 (in the settings and in arguments/policy), the queue was always replicated to all 3 nodes. That is why I switched to 4 nodes.

shadizar128 Jul 8, 2024
Author

We are currently using mirrored queues, but since we upgraded to 3.13 we have seen multiple warnings that future upgrades may be impossible unless we switch to quorum queues. Our solution is on-premises: it is installed by each customer in their own infrastructure, so we cannot oversee all of them and change quorum queue members manually whenever something goes wrong.

lukebakken Jul 8, 2024
Maintainer

> when it was changed to a discussion

Yes, Team RabbitMQ starts with discussions FIRST until it is clearly demonstrated that an issue falls under our community support guidelines and is clearly a bug.

It would be helpful for you to re-try your scenario with five nodes and report back.

cc @michaelklishin @kjnilsson

shadizar128 Jul 8, 2024
Author

Ok, will try with 5 nodes

kjnilsson · 2024-07-08T13:59:00Z · Maintainer

Unless you remove the down RabbitMQ node (with rabbitmqctl forget_cluster_node), reconciliation doesn't do anything.

1 reply

shadizar128 Jul 8, 2024
Author

Even with this option ?

{ quorum_membership_reconciliation_auto_remove, true },

michaelklishin · 2024-07-08T14:01:06Z · Maintainer

@shadizar128 this feature was designed to replace replicas that were hosted on permanently removed nodes.

RabbitMQ cannot know whether a node is down for just a few minutes for an upgrade or is never coming back. You must explicitly remove nodes (or QQ replicas), and then the reconciliation mechanism will notice that new ones should be added. Otherwise you'd end up with a replica on every node very quickly, after a single rolling upgrade.
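To make the rolling-upgrade argument concrete, here is a toy Python simulation; it is not RabbitMQ code. The function `rolling_upgrade`, the node names, the 5-node cluster and the one-node-at-a-time restart order are all hypothetical. It compares an imagined policy that adds a replacement replica whenever a replica-hosting node stops against the explicit approach described above, where a stopped node simply keeps its configured replica.

```python
from typing import List, Set


def rolling_upgrade(nodes: List[str], members: Set[str],
                    replace_on_stop: bool) -> Set[str]:
    """Restart each node once, in order, and return the final replica set.

    replace_on_stop=True models a hypothetical policy that adds a new replica
    on some other online node whenever a node hosting a replica stops.
    replace_on_stop=False models the explicit approach: a stopped node keeps
    its configured replica, so nothing is added. Toy model only.
    """
    members = set(members)  # copy so the caller's set is not mutated
    for stopped in nodes:  # one node down at a time, as in a rolling upgrade
        if replace_on_stop and stopped in members:
            # Pick the first online node that does not host a replica yet.
            spare = [n for n in nodes if n != stopped and n not in members]
            if spare:
                members.add(spare[0])
        # The node comes back after its upgrade; its replica was never removed.
    return members


nodes = ["rabbit@n1", "rabbit@n2", "rabbit@n3", "rabbit@n4", "rabbit@n5"]
initial = {"rabbit@n1", "rabbit@n2", "rabbit@n3"}
print(len(rolling_upgrade(nodes, initial, replace_on_stop=True)))   # 5
print(len(rolling_upgrade(nodes, initial, replace_on_stop=False)))  # 3
```

Under the hypothetical replace-on-stop policy, a single rolling upgrade leaves a replica on every node of the 5-node cluster, while the explicit approach keeps the original 3.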

3 replies

shadizar128 Jul 8, 2024
Author

Hmm, is this the behavior for mirrored queues as well?

michaelklishin Jul 8, 2024
Maintainer

Mirrored classic queues were removed from RabbitMQ. Do not use them regardless of what the answer is.

shadizar128 Jul 8, 2024
Author

We cannot "not use them" as long as quorum queues do not offer the same level of self-managed HA. We tried using QQ a year ago, but this gap was a deal breaker.

michaelklishin · 2024-07-08T14:50:25Z · Maintainer

@shadizar128 if you ship RabbitMQ as part of a product, how exactly you manage QQ and stream replicas is up to you. It is your problem to solve, not ours, especially if you are not a regular contributor or a paying customer.

QQ reconciliation was contributed by a very large-scale user that removes nodes it considers unavailable (I don't know what conditions and criteria they use), so for them this mechanism kicks in. They never expected it to kick in during a rolling upgrade because, right now, this is how they upgrade clusters. But that will change, and perhaps more options will be introduced.

In any case, replica management in QQ and streams is explicit by design. Adding new replicas when a cluster member stops is a terrible idea, as I said above. It would mean that you'd (potentially) have a new replica added every time a node is stopped for an upgrade.

1 reply

shadizar128 Jul 8, 2024
Author

Okay, thank you for your answers.

michaelklishin · 2024-07-08T14:56:01Z · Maintainer

Finally, these settings

{ quorum_membership_reconciliation_enabled, true },
{ quorum_membership_reconciliation_auto_remove, true },
{ quorum_membership_reconciliation_interval, 1 },
{ quorum_membership_reconciliation_trigger_interval, 1 },
{ quorum_membership_reconciliation_target_group_size, 3 }

use a crazy low interval. Running this operation, which is expensive even in 3.13.4 where it was significantly optimized, every second is a complete waste of your cluster's resources.

Just as considering a client unavailable too early does not work well in practice, a reconciliation interval of less than 15, maybe even 30 seconds is guaranteed to have side effects (resource burn, and possibly untested, hard-to-reason-about scenarios such as overlapping reconciliation operations) and won't yield any practical benefit.
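As a rough illustration of the cost argument, the sketch below counts how many membership evaluations a cluster would perform per hour. Two assumptions are labeled here: that each quorum queue is evaluated once per interval (a simplification), and that the interval is expressed in milliseconds, which is how I read the documented defaults (3600000 for the interval, 10000 for the trigger interval). The function `evaluations_per_hour` and the queue count of 500 are hypothetical.

```python
def evaluations_per_hour(interval_ms: int, queue_count: int) -> float:
    """Evaluations per hour if every queue is checked once per interval.

    Both the once-per-interval model and the millisecond unit are assumptions.
    """
    ms_per_hour = 60 * 60 * 1000
    return queue_count * ms_per_hour / interval_ms


# 1 is the value used in the config above; 3_600_000 is the documented default.
for interval_ms in (1, 1_000, 30_000, 3_600_000):
    print(interval_ms, evaluations_per_hour(interval_ms, queue_count=500))
```

With the default interval each queue is looked at once an hour; a value of 1 multiplies that work by millions.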

9 replies

mkuratczyk Apr 29, 2025
Maintainer

@iv-anas Assuming you're on a supported RabbitMQ version, create an issue with enough detail for us to help you (starting with versions and the configuration file).

kjnilsson Apr 29, 2025
Maintainer

> check the number of online members,

The documentation isn't quite correct. It does not check the number of online members; it checks the number of configured members. If the number of configured members is smaller than the target, it will grow the membership.

In your case, shutting down a node isn't enough. You also need to issue the forget_cluster_node command for the quorum queue configuration to be updated.
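The behaviour described above can be sketched as a simplified Python model; it is not RabbitMQ's implementation, and the function `reconcile`, its parameters and the node names are hypothetical. It counts configured members rather than online ones. The optional removal step follows the documented description of quorum_membership_reconciliation_auto_remove (drop members on nodes that are no longer part of the cluster); per the reply above, forgetting the node also updates the queue's configuration, so in practice that member may already be gone before this step.

```python
from typing import List, Set


def reconcile(members: Set[str], cluster_nodes: List[str],
              running_nodes: Set[str], target: int,
              auto_remove: bool) -> Set[str]:
    """Return the replica set after one simplified reconciliation pass.

    members       -- nodes configured as replicas of the queue (online or not)
    cluster_nodes -- nodes in the cluster: a merely stopped node is still
                     listed, a forgotten node is not
    running_nodes -- nodes that can host a new replica right now
    """
    members = set(members)
    if auto_remove:
        # Drop replicas on nodes that were removed from the cluster.
        members &= set(cluster_nodes)
    # Grow only while the *configured* member count is below the target.
    for node in cluster_nodes:
        if len(members) >= target:
            break
        if node in running_nodes and node not in members:
            members.add(node)
    return members


nodes = ["rabbit@n1", "rabbit@n2", "rabbit@n3", "rabbit@n4"]
replicas = {"rabbit@n1", "rabbit@n2", "rabbit@n3"}

# rabbit@n3 is stopped but still a cluster member: 3 configured members, no growth.
print(sorted(reconcile(replicas, nodes, {"rabbit@n1", "rabbit@n2", "rabbit@n4"}, 3, True)))
# ['rabbit@n1', 'rabbit@n2', 'rabbit@n3']

# After forgetting rabbit@n3, only 2 configured members remain, so rabbit@n4 is added.
remaining = ["rabbit@n1", "rabbit@n2", "rabbit@n4"]
print(sorted(reconcile(replicas, remaining, set(remaining), 3, True)))
# ['rabbit@n1', 'rabbit@n2', 'rabbit@n4']
```

This mirrors what the original poster observed: stopping a node changes nothing, while forgetting it lets the membership grow back to the target.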

iv-anas Apr 29, 2025

Yes, then it works fine.

In https://www.rabbitmq.com/docs/quorum-queues#replica-reconciliation, the sentence "When activated, every quorum queue leader replica will periodically check its current membership group size (the number of replicas online), and compare it with the target value" is misleading.

kjnilsson Apr 29, 2025
Maintainer

Yes, as I pointed out, it is not correct. We'll update the docs accordingly.

michaelklishin Apr 29, 2025
Maintainer

rabbitmq/rabbitmq-website#2253

michaelklishin · 2025-04-30T03:21:11Z · Maintainer

Hopefully rabbitmq/rabbitmq-website#2253 settles it, so we can close this discussion.

0 replies
