Flappy replica election in HA setup #12398
Unanswered
fradsj asked this question in Help and support
Replies: 0 comments
Hello.
My team and I have configured the Mimir distributor's HA tracker to deduplicate metrics from two instances of Prometheus.
Occasionally, both Prometheus instances report a failure when sending metrics to Mimir, with the following log:

Mimir distributor's config:
Prometheus config:
I have checked that the __replica__ and cluster labels are being sent to Mimir, and everything seems to be correct:

I have also checked that the error logs coincide with a change of the elected instance for one cluster on the HA tracker status page.

However, I can't see any Prometheus restarts. None of the instances seem to be experiencing networking issues either.
I'm struggling to understand exactly how one Prometheus instance is elected to distribute its metrics to Mimir.
I have looked at this part of the distributor's code, but I'm not sure I'm looking in the right place.
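For reference, the election behaviour described in the Mimir HA deduplication documentation can be sketched roughly as follows. This is a simplified model and not Mimir's actual code: the class and method names are hypothetical, the 30-second failover timeout is assumed to match the default, and the sketch refreshes the timestamp on every accepted sample, whereas the real tracker persists it to the KV store only periodically (governed by the update timeout), so the stored timestamp can lag behind the last sample.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Elected:
    replica: str
    received_at: float  # time of the last accepted sample from this replica


class HATrackerSketch:
    """Hypothetical, simplified model of per-cluster replica election."""

    def __init__(self, failover_timeout: float = 30.0):
        # Assumed default; check the value configured in your limits.
        self.failover_timeout = failover_timeout
        self.elected: Dict[str, Elected] = {}

    def accept(self, cluster: str, replica: str, now: float) -> bool:
        """Return True if samples from this replica should be ingested."""
        current = self.elected.get(cluster)
        if current is None:
            # First replica seen for the cluster becomes the elected one.
            self.elected[cluster] = Elected(replica, now)
            return True
        if current.replica == replica:
            # Elected replica: accept and refresh its timestamp.
            current.received_at = now
            return True
        if now - current.received_at > self.failover_timeout:
            # Elected replica has been silent for too long: fail over.
            self.elected[cluster] = Elected(replica, now)
            return True
        # Non-elected replica while the elected one is still healthy: drop.
        return False
```

Under this model, the elected replica changes only when the tracker has not recorded a sample from the current one for longer than the failover timeout, so a slow or delayed remote-write batch could be enough to trigger a switch without any Prometheus restart.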
My questions:
Thanks a lot for your help.