Flappy replica election in HA setup #12398

fradsj · 2025-08-14T12:29:02Z

Hello.

My team and I have configured the Mimir distributor's HA tracker to deduplicate metrics from two instances of Prometheus.

Occasionally, both Prometheus instances report a failure when sending metrics to Mimir, with the following log:

Mimir distributor's config:

        distributor:
          ha_tracker:
            enable_ha_tracker: true
            kvstore:
              etcd:
                endpoints:
                - etcd.mimir.svc.cluster.local:2379
                tls_ca_path: /etc/etcd/certs/ca.crt
                tls_cert_path: /etc/etcd/certs/tls.crt
                tls_enabled: true
                tls_key_path: /etc/etcd/certs/tls.key
              store: etcd

Prometheus config:

global:
  scrape_interval: 30s
  external_labels:
    __replica__: prometheus-kube-prometheus-stack-prometheus-0
    cluster: owkin-k-dev-core
    prometheus: monitoring/kube-prometheus-stack-prometheus
  evaluation_interval: 30s

I have checked that the __replica__ and cluster labels are being sent to Mimir, and everything seems to be correct:

I have also checked that the error logs coincide with a change of the elected instance for one cluster on the HA tracker status page.

However, I can't see any Prometheus restarts. None of the instances seem to be experiencing networking issues either.

I'm struggling to understand exactly how one Prometheus instance is elected to distribute its metrics to Mimir.

I have looked at this part of the distributor's code, but I'm not sure I'm looking in the right place.
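
To make the question concrete, here is how I currently understand the election, written as a minimal Python sketch. It is only my mental model, not the distributor's actual code: the class and method names are made up, the 30s value assumes what I believe is the default for ha_tracker_failover_timeout, and it ignores the KV store, per-tenant state, and the update timeout.

```python
from dataclasses import dataclass

# Assumed value, mirroring what I believe is the default for
# ha_tracker_failover_timeout; the real deployment may differ.
FAILOVER_TIMEOUT_S = 30.0


@dataclass
class Elected:
    replica: str        # value of the __replica__ label currently accepted
    received_at: float  # last time (seconds) a write from that replica was seen


class HATrackerSketch:
    """Toy model of per-cluster replica election; hypothetical, not Mimir code."""

    def __init__(self, failover_timeout_s: float = FAILOVER_TIMEOUT_S):
        self.failover_timeout_s = failover_timeout_s
        self.elected = {}  # cluster label value -> Elected

    def accept(self, cluster: str, replica: str, now: float) -> bool:
        """Return True if a write from (cluster, replica) would be accepted."""
        current = self.elected.get(cluster)
        if current is None:
            # First replica seen for this cluster becomes the elected one.
            self.elected[cluster] = Elected(replica, now)
            return True
        if current.replica == replica:
            # Write from the elected replica: accept and refresh its timestamp.
            current.received_at = now
            return True
        if now - current.received_at > self.failover_timeout_s:
            # Elected replica has been silent for too long: fail over.
            self.elected[cluster] = Elected(replica, now)
            return True
        # Write from the other replica while the elected one is fresh: rejected.
        return False
```

If this model is roughly right, the elected replica should only change when the current one goes longer than the failover timeout without a successful write, which is what I cannot explain given that I see no restarts or networking issues.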

My questions:

Do you have any diagrams summarising the Prometheus election process?
What could be causing the Prometheus instances to fail in the HA tracker status?

Thanks a lot for your help.

Replies: 0 comments
