-
3.12.4 is out of community support; you need to upgrade to 4.2. That said, the graph may just be a side effect of how metrics are calculated. Look at the throughput rate of your consumers and see if it matches what you see in the management UI. Upgrade to 4.2 and see if it still occurs.
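For example, a minimal consumer-side sketch that measures its own delivery rate might look like the following (this assumes the Python pika client; the queue name, prefetch value, and connection details are placeholders):

```python
import time
import pika

QUEUE = "work"  # hypothetical queue name, adjust for your setup

connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
channel = connection.channel()
channel.basic_qos(prefetch_count=100)

window_start = time.monotonic()
received = 0

def on_message(ch, method, properties, body):
    """Count deliveries and print a rate every ~10 seconds."""
    global window_start, received
    received += 1
    # ... process the message here ...
    ch.basic_ack(delivery_tag=method.delivery_tag)
    now = time.monotonic()
    if now - window_start >= 10:
        print(f"consumer-side rate: {received / (now - window_start):.1f} msg/s")
        window_start, received = now, 0

channel.basic_consume(queue=QUEUE, on_message_callback=on_message)
channel.start_consuming()
```

If the rate printed here tracks the delivery rate shown in the management UI, the "dips" in the graph are likely an artifact of how the UI samples metrics rather than an actual pause.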
-
@ponponon - what do you expect the RabbitMQ maintainers to do with what little information you provide, exactly? Do you expect them to rush to set up an environment, try to GUESS how you're using RabbitMQ, and report back to you, all for free? You're not even using a supported version of RabbitMQ. If you want free support for your issue, I suggest you provide enough information to reproduce what you report. First, reproduce your issue in your environment using the latest versions of RabbitMQ and Erlang. If you see the same behavior, provide a git repository with the complete source code to start producers and consumers that mimic your workload and reproduce what you observe.
-
@ponponon do you expect us to guess what your consumers do or do not do (such as not acknowledging deliveries in a timely manner, or not using a suitable prefetch value)? I'm afraid our small team cannot afford guessing; guessing is a very, very time-consuming approach to troubleshooting distributed infrastructure.
The Erlang runtime does not suffer from "stop the world" pauses caused by GC because there is no global GC: every Erlang process (a connection, a channel or session, a queue or stream replica) has an independent heap, and their garbage collections do not affect other processes. Yes, there is a shared reference-counted heap for larger binaries, but its GC is not "stop the world" for the entire system. As any heavy PerfTest user would confirm, when a stop-the-world Java GC happens in a consumer or producer process, you can usually tell by a drop in publishing or delivery/delivery acknowledgement metrics, even though RabbitMQ itself was not paused for GC.

One scenario where RabbitMQ is guaranteed to stop deliveries is when a consumer has been delivered as many messages as its channel's prefetch, which by definition means that RabbitMQ should not deliver any more until some outstanding deliveries are acknowledged.
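As an illustration, here is a minimal sketch (assuming the Python pika client; the queue name and the artificial delay are placeholders) of a consumer whose slow acknowledgements exhaust its prefetch and therefore stall deliveries:

```python
import time
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
channel = connection.channel()

# Per-consumer prefetch: the broker keeps at most 10 unacknowledged
# deliveries outstanding on this channel.
channel.basic_qos(prefetch_count=10)

def on_message(ch, method, properties, body):
    time.sleep(1)  # simulate slow processing; acks now lag behind deliveries
    # Until this ack is sent, the delivery counts against the prefetch limit.
    # Once 10 deliveries are outstanding, the broker stops delivering to this
    # consumer, which shows up as a "pause" in delivery rate graphs.
    ch.basic_ack(delivery_tag=method.delivery_tag)

channel.basic_consume(queue="work", on_message_callback=on_message)
channel.start_consuming()
```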
-
By using monitoring data, ideally with a full set of Grafana dashboards (it can be inter-node connection congestion if the messages are large), and by asking the node how it spends its CPU/scheduler time. If this node has 1 CPU core, then a surge of activity in any part of the system (e.g. on a particular connection) can inevitably take CPU scheduler time away from queues or channels (which serialize deliveries to be sent). With an installation this old (it has reached EOL without any exceptions), I cannot rule out that the periodic background GC settings that were relevant for some workloads years ago could be enabled. They force a minor GC run for every single process in the system.
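Scheduler/runtime thread activity can be inspected with `rabbitmq-diagnostics runtime_thread_stats`. As one small example of "using monitoring data", a sketch like the following can sample per-queue rates from the management HTTP API (the endpoint, credentials, and queue name are assumptions about the environment; adjust them as needed):

```python
import requests

MGMT = "http://localhost:15672"   # assumed management endpoint
AUTH = ("guest", "guest")         # assumed credentials
VHOST = "%2F"                     # URL-encoded default vhost "/"
QUEUE = "work"                    # hypothetical queue name

q = requests.get(f"{MGMT}/api/queues/{VHOST}/{QUEUE}", auth=AUTH).json()
stats = q.get("message_stats", {})
print("publish rate:     ", stats.get("publish_details", {}).get("rate"))
print("deliver+get rate: ", stats.get("deliver_get_details", {}).get("rate"))
print("ack rate:         ", stats.get("ack_details", {}).get("rate"))
print("unacked messages: ", q.get("messages_unacknowledged"))
```

If the publish rate stays flat while the deliver and ack rates dip together and unacknowledged messages climb, the consumers (prefetch, slow acks), not the broker, are the first place to look.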