Description
- 4/28/25: Title changed from "Bug: Index Workers lose connection to RMQ while queue still full"
- See this comment below for next steps...
================================================================
When doing a reindex-all, the indexing runs well until it starts to process the resource maps, which take a long time. At this point, the RMQ channels get closed in a timeframe determined by the consumer_timeout (default 30 mins).
We're ACK-ing messages immediately in an attempt to circumvent this issue, but we're still having problems, even with longer timeout settings. The problem is with messages sent to indexers that are still working on the previous job: the next message cannot be delivered, and so it times out.
For more details, see Metacat Issue 1932
The proposed fix is to use @jeanetteclark's solution from metadig engine: catch the resulting com.rabbitmq.client.AlreadyClosedException and re-open the connections.
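
As a rough illustration of the catch-and-re-open idea (not the metadig engine code, which is not reproduced here), the sketch below retries an operation after re-opening the connection whenever a "channel closed" exception is thrown. The class name ReconnectSketch, the Reconnector hook, and the maxAttempts parameter are all hypothetical. The exception type is passed in as a parameter so the sketch compiles without the RabbitMQ client on the classpath; in the index worker it would presumably be com.rabbitmq.client.AlreadyClosedException, which is an unchecked exception.

```java
import java.util.concurrent.Callable;

public class ReconnectSketch {

    /** Hypothetical hook that closes and re-creates the connection and channel. */
    public interface Reconnector {
        void reconnect() throws Exception;
    }

    /**
     * Runs the action; if it fails with the given "closed" exception type,
     * re-opens the connection and tries again, up to maxAttempts in total.
     * Any other exception is rethrown immediately.
     */
    public static <T> T callWithReconnect(Callable<T> action,
                                          Reconnector reconnector,
                                          Class<? extends RuntimeException> closedType,
                                          int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (RuntimeException e) {
                if (!closedType.isInstance(e)) {
                    throw e; // not a closed-channel failure: do not retry
                }
                last = e;
                if (attempt < maxAttempts) {
                    reconnector.reconnect(); // re-open before the next attempt
                }
            }
        }
        throw last; // every attempt hit a closed channel
    }
}
```

In a worker, the action would wrap whichever channel call fails (for example an acknowledgement), and the reconnector would rebuild the connection, channel, and consumer. Capping the number of attempts is a design choice in this sketch so that a broker outage surfaces as an error instead of looping forever.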