mimir Monolithic - 2 VM cluster - err-mimir-bucket-index-too-old #3841
-
**Describe the bug**

The compactor only updates the bucket index on one of the two VMs in the cluster. This can be seen since bucket-index.json.gz is only updated on one of the two VMs. After one hour, queries towards the non-active bucket index return err-mimir-bucket-index-too-old. We want to be able to query the same data from both VMs, so that the load is shared.

**To Reproduce**

**Expected behavior**

You will not be able to query data older than 12h from the VM where the bucket index is NOT updated. The timestamp on ./bucket/bucket-index.json.gz will only be updated on one of the two VMs. If you change to: Both indexes are updated

**Environment**

vmware, RHEL 8.7

**Additional Context**
-
👋 Hi! I see you're using the `filesystem` backend storage: Mimir requires a shared storage among all Mimir replicas. The `filesystem` backend could be suitable only if it's a shared filesystem among replicas, not a local filesystem without sharing files between different replicas. We strongly recommend using an object storage, either provided by a cloud provider (e.g. AWS S3) or Minio when running on-premise.
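For reference, a minimal sketch of what an object-storage configuration for the blocks storage might look like, assuming a self-hosted Minio endpoint. The endpoint, bucket name, and credentials below are placeholders, not values from this thread:

```yaml
# Hypothetical Mimir blocks storage config using an S3-compatible backend (Minio).
# All values are placeholders; adjust to your environment.
blocks_storage:
  backend: s3
  s3:
    endpoint: minio.example.internal:9000
    bucket_name: mimir-blocks
    access_key_id: CHANGE_ME
    secret_access_key: CHANGE_ME
    insecure: true   # Minio over plain HTTP; remove this when using TLS
```

With a shared object store, every replica reads and writes the same bucket, so the compactor's bucket index is visible to all queriers.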
-
Thank you so much for your support on this issue. You are right. We have mounted one storage volume for each server, but I will now change this to a shared one. I have not seen this specified in the documentation, but perhaps it is just implied. Unfortunately, an object storage is not an option for us at the moment.
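If the shared volume ends up being NFS, mounting the same export on both VMs might look like the sketch below; the server name and paths are placeholders, not from this thread (and note that running Mimir's local state over NFS can be problematic):

```
# Hypothetical /etc/fstab entry giving both VMs the same filesystem root.
# nfs-server.example.internal and /export/mimir are placeholders.
nfs-server.example.internal:/export/mimir  /mimir  nfs  defaults,_netdev  0  0
```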
-
At the moment I am not looking for new assignments, but perhaps sometime in the future :) We mounted a shared NFS volume yesterday and retried, but the results were disappointing. We got a lot of log entries as shown below, and Mimir restarts. It almost seems like the two VM instances are fighting over Linux file handles. For us it seems like single instances are the way to go, as the cluster doesn't seem to work well with `backend: filesystem`. We just need to find a way to distribute the traffic on the Prometheus side.

```
hostname.something.com mimir[293552]: level=warn ts=2023-01-04T10:18:26.549748412Z caller=logging.go:86 traceID=xxxxxxxxxxxxxxxx msg="POST /api/v1/push (500) 1.238682ms Response: "failed pushing to ingester: rpc error: code = Unknown desc = user=prom-user: write to WAL: log samples: write /mimir/db/tsdb/promuser/wal/00000014: stale NFS file handle\n" ws: false; Content-Encoding: snappy; Content-Length: 5120; Content-Type: application/x-protobuf; User-Agent: Prometheus/2.32.1; X-Prometheus-Remote-Write-Version: 0.1.0; X-Scope-Orgid: promuser; "
```
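One way to distribute traffic on the Prometheus side is to point remote_write at a single load-balanced name in front of the Mimir VMs rather than at one VM directly. A hedged sketch, where the hostname and port are placeholders and the tenant header assumes multi-tenancy is enabled:

```yaml
# Hypothetical Prometheus remote_write config; mimir-lb.example.internal is a
# placeholder for a load balancer (or round-robin DNS name) in front of the VMs.
remote_write:
  - url: http://mimir-lb.example.internal:9009/api/v1/push
    headers:
      X-Scope-OrgID: promuser   # tenant ID, as seen in the log entries above
```

The same load-balanced name can then be used as the query datasource, so reads are also spread across instances.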