mimir Monolithic - 2 VM cluster - err-mimir-bucket-index-too-old #3841

elangela · 2023-01-02T17:54:37Z

elangela
Jan 2, 2023

Describe the bug

The compacter only update the bucket index in one of the two VMs in the cluster. This can been seen since bucket-index.json.gz is only updated in one of the two VMs. After one hour queries towards the non-active bucket index will return err-mimir-bucket-index-too-old. We want to be able to query the same data from both VMs, so that the load is shared.

To Reproduce

Setup up a cluster with two VMs utilizing memberlist
Configure compactor as below, or use default value(memberlist):
compactor:
sharding_ring:
kvstore:
store: memberlist

Expected behavior

You will not be able to query data older than 12h from the VM where the bucket list is NOT updated. Timestamp on ./bucket/bucket-index.json.gz will only be updated on one of the two VMs.

If you change to:
kvstore:
store: inmemory

Both indexes are updated

Environment

vmware, RHEL 8.7

Additional Context

#Disable the requirement that every request to Mimir has a
#X-Scope-OrgID header. `anonymous` will be substituted in instead.
multitenancy_enabled: false

server:
  http_listen_port: 9009
  ##info, warn, error
  log_level: error

#Configure the server to allow messages up to 100MB.
  grpc_server_max_recv_msg_size: 104857600
  grpc_server_max_send_msg_size: 104857600
  grpc_server_max_concurrent_streams: 1000

  http_tls_config:
    cert_file: /etc/ssl/mimir.crt
    key_file:  /etc/ssl/mimir.key

memberlist:
  join_members: [dbt1,dbt2]

distributor:
  ring:
    kvstore:
      store: memberlist
  pool:
    health_check_ingesters: true
  ha_tracker:
    enable_ha_tracker: false

ingester:
  ring:
#We want to start immediately and flush on shutdown.
    min_ready_duration: 0s
    final_sleep: 0s
    num_tokens: 512

    kvstore:
      store: memberlist
    replication_factor: 2
    instance_availability_zone: dbt1
    zone_awareness_enabled: true

blocks_storage:
  backend: filesystem
  tsdb:
    dir: /mimir/db/tsdb
  bucket_store:
    sync_dir: /mimir/db/tsdb-sync
    bucket_index:
      enabled: true
  filesystem:
    dir: /mimir/db/blocks

store_gateway:
  sharding_ring:
    zone_awareness_enabled: true
    instance_availability_zone: dbt1
    replication_factor: 2

compactor:
  sharding_ring:
    kvstore:
      store: memberlist
  data_dir: /mimir/db/compactor

usage_stats:
  enabled: false

common:
  storage:
    filesystem:
      dir: /mimir/db/common

limits:
  compactor_blocks_retention_period: 4d
  ingestion_rate: 300000
  ingestion_burst_size: 1800000
  max_global_series_per_user: 0
  out_of_order_time_window: 5m
  max_label_names_per_series: 40

Answered by pracucci

Jan 3, 2023

👋 Hi! I see you're using the filesystem backend storage:

blocks_storage:
  backend: filesystem

Mimir requires a shared storage among all Mimir replicas. The filesystem could be suitable only if it's a shared filesystem among replicas and not a local filesystem without sharing files between different replicas. We strongly recommend using an object storage, either provided by a cloud provider (e.g. AWS S3) or Minio when running on-premise.

View full answer

pracucci · 2023-01-03T12:54:50Z

pracucci
Jan 3, 2023
Maintainer

👋 Hi! I see you're using the filesystem backend storage:

blocks_storage:
  backend: filesystem

Mimir requires a shared storage among all Mimir replicas. The filesystem could be suitable only if it's a shared filesystem among replicas and not a local filesystem without sharing files between different replicas. We strongly recommend using an object storage, either provided by a cloud provider (e.g. AWS S3) or Minio when running on-premise.

1 reply

marvinnitz18 Feb 20, 2024

@pracucci i am using an object storage as the backend i get the same behavior i get err-mimir-bucket-index-too-old, what could be the cause of it not being updated properly by one of the instances of mimir ?

elangela · 2023-01-03T19:44:06Z

elangela
Jan 3, 2023
Author

Thank you soo much for your support on this issue. You are right. We have mounted one storage for each server, but I will now change this to a shared one. I have not see this specified in the documentation, but perhaps it is just implied, Unfortunately an object storage is not an option for us at the moment.

1 reply

pracucci Jan 4, 2023
Maintainer

Can you help us improving the documentation, please?

elangela · 2023-01-05T07:41:00Z

elangela
Jan 5, 2023
Author

At the moment I am not looking for new assignments, but perhaps sometime in the future :)

We mounted a shared nfs volume yesterday and retried, but the the results where disappointing. We got a lot of log entries as shown below, and mimir restarts. It almost seems like the two VM instances are arguing about linux file-handle. For us it seems like single instances is the way to go as the cluster dosent seem to work well with backend: filesystem. We just need find a way to distribute the traffic on the prometheus side.

hostname.something.com mimir[293552]: level=warn ts=2023-01-04T10:18:26.549748412Z caller=logging.go:86 traceID=xxxxxxxxxxxxxxxx msg="POST /api/v1/push (500) 1.238682ms Response: "failed pushing to ingester: rpc error: code = Unknown desc = user=prom-user: write to WAL: log samples: write /mimir/db/tsdb/promuser/wal/00000014: stale NFS file handle\n" ws: false; Content-Encoding: snappy; Content-Length: 5120; Content-Type: application/x-protobuf; User-Agent: Prometheus/2.32.1; X-Prometheus-Remote-Write-Version: 0.1.0; X-Scope-Orgid: promuser; "

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

mimir Monolithic - 2 VM cluster - err-mimir-bucket-index-too-old #3841

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Replies: 3 comments 2 replies

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

Select a reply

mimir Monolithic - 2 VM cluster - err-mimir-bucket-index-too-old #3841

elangela Jan 2, 2023

Describe the bug

To Reproduce

Expected behavior

Environment

Additional Context

Replies: 3 comments · 2 replies

pracucci Jan 3, 2023 Maintainer

marvinnitz18 Feb 20, 2024

elangela Jan 3, 2023 Author

pracucci Jan 4, 2023 Maintainer

elangela Jan 5, 2023 Author

elangela
Jan 2, 2023

Replies: 3 comments 2 replies

pracucci
Jan 3, 2023
Maintainer

elangela
Jan 3, 2023
Author

pracucci Jan 4, 2023
Maintainer

elangela
Jan 5, 2023
Author