Latency in publish increases after node restart #14542
Replies: 2 comments 7 replies
-
The most obvious explanation would be a change in locality between the connection and the queue (or the queue leader). By default we use the client-local strategy, so if the application declares a classic queue, it'll be hosted on the node the connection is on. If it declares a quorum queue, its leader will be on the node the connection is on. When you restart a node, the leader of a quorum queue hosted there will likely move to another node. A classic queue won't move, but the connection will, so either way there'll be an additional network hop between the queue and the connection. If you can consistently fix this by restarting the app, I guess your app declares a new queue on startup, so the queue is local to the connection again. If that's not the case, then I'm not sure why an app restart would change anything, except by a lucky coincidence (this time it happens to connect to the node where the queue/leader is).
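To make the locality argument concrete, here is a minimal sketch in plain Python. It does not talk to RabbitMQ at all: it only models which node a publisher is connected to and which node hosts the queue leader. The node names (rabbit-1, rabbit-2, rabbit-3) and the helpers `declare_client_local` and `extra_hop` are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Publisher:
    connected_to: str  # node the client connection is attached to (hypothetical name)


def declare_client_local(publisher: Publisher) -> str:
    """Model client-local placement: the queue (or its leader) lands on
    the node of the connection that declares it."""
    return publisher.connected_to


def extra_hop(publisher: Publisher, queue_leader: str) -> bool:
    """True when a publish has to cross nodes to reach the queue leader."""
    return publisher.connected_to != queue_leader


# Before maintenance: the app connects to rabbit-1 and declares its queue there.
app = Publisher(connected_to="rabbit-1")
leader = declare_client_local(app)
before = extra_hop(app, leader)

# rabbit-1 is rebooted: the app reconnects to rabbit-2, and (assumed here)
# the quorum queue elects a new leader on rabbit-3.
app.connected_to = "rabbit-2"
leader = "rabbit-3"
after_node_restart = extra_hop(app, leader)

# The app is restarted and declares a new queue, which is local again.
leader = declare_client_local(app)
after_app_restart = extra_hop(app, leader)

print(before, after_node_restart, after_app_restart)
```

In this toy model the extra hop appears only in the middle state, which matches the pattern described in the question: latency goes up after the node restart and recovers once the application restarts and redeclares.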
-
As already mentioned by @kura, a node restart likely changes the queue leader distribution across cluster nodes. Given a constant amount of resources available to a node, that can affect latency, and so can an extra network hop. We cannot suggest much else based on a three-sentence description without any workload or reproduction details or metrics, including the many advanced metrics exposed via Prometheus and Grafana.
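One rough way to see whether leader distribution has shifted is to count how many queues each node currently leads, before and after the maintenance window. The sketch below assumes queue records shaped as dictionaries with a "leader" key for quorum queues and a "node" key for classic queues. That shape is an assumption loosely modeled on management API output, and the queue and node names are made up.

```python
from collections import Counter


def leaders_per_node(queues):
    """Count how many queues each node hosts (classic) or leads (quorum).

    Each record is assumed to carry "leader" for quorum queues and
    "node" for classic queues; records with neither are skipped.
    """
    counts = Counter()
    for q in queues:
        host = q.get("leader") or q.get("node")
        if host:
            counts[host] += 1
    return dict(counts)


# Hypothetical snapshot taken after one node was rebooted.
sample = [
    {"name": "orders", "type": "quorum", "leader": "rabbit@node2"},
    {"name": "events", "type": "quorum", "leader": "rabbit@node2"},
    {"name": "audit", "type": "classic", "node": "rabbit@node3"},
]

distribution = leaders_per_node(sample)
print(distribution)
```

A node that shows up with no leaders after its restart, while the others carry more than before, would be consistent with the explanation above. This is only a starting point and not a substitute for the metrics mentioned in the reply.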
-
Community Support Policy
RabbitMQ version used
4.1.2
Erlang version used
27.3.x
Operating system (distribution) used
Ubuntu
How is RabbitMQ deployed?
Debian package
rabbitmq-diagnostics status output
See https://www.rabbitmq.com/docs/cli to learn how to use rabbitmq-diagnostics
Logs from node 1 (with sensitive values edited out)
See https://www.rabbitmq.com/docs/logging to learn how to collect logs
Logs from node 2 (if applicable, with sensitive values edited out)
See https://www.rabbitmq.com/docs/logging to learn how to collect logs
Logs from node 3 (if applicable, with sensitive values edited out)
See https://www.rabbitmq.com/docs/logging to learn how to collect logs
rabbitmq.conf
See https://www.rabbitmq.com/docs/configure#config-location to learn how to find rabbitmq.conf file location
Steps to deploy RabbitMQ cluster
We deploy via ansible playbook
Steps to reproduce the behavior in question
As outlined below: during OS updates, applications reconnect to another node in the cluster
advanced.config
See https://www.rabbitmq.com/docs/configure#config-location to learn how to find advanced.config file location
Application code
# PASTE CODE HERE, BETWEEN BACKTICKS
Kubernetes deployment file
What problem are you trying to solve?
We are running a RabbitMQ cluster (3 nodes) on EC2 instances. We do OS updates on these once per month.
The process is one node at a time: put it into maintenance, then do the updates (generally there is a reboot).
We have automatic recovery, so if an application is connected to node 1, it reconnects to another node.
What we are seeing is that publish latency increases and stays that way until we restart the applications.
Once they are restarted, the latency is good again.
Any idea what the issue is here?