vault_core_unsealed metric disappears after prometheus_retention_time elapses #24741

Open
cascadia-sati opened this issue Jan 9, 2024 · 2 comments

Comments

@cascadia-sati

Describe the bug
I'm using the Vault Helm chart to deploy Vault to our K8s cluster using a three-replica HA setup. While working on monitoring for the seal status, I noticed that the vault_core_unsealed metric mysteriously disappears after some time. This discussion hinted that it may be due to the prometheus_retention_time config field. That does indeed seem to be the case.

Increasing it does make the metric stick around longer, but only until the retention time elapses. When the seal status does change, the metric comes back, and alerting can kick in, but in the meantime our dashboards and alerts show "no data", which is not ideal.

To Reproduce
Install Vault and watch the vault_core_unsealed metric disappear once prometheus_retention_time has elapsed.

Expected behavior
The vault_core_unsealed metric should persist, even if it doesn't change. I'm assuming Vault needs to be modified to publish the metric periodically instead of only when it's changed.
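
To illustrate the difference between the two publishing strategies, here is a toy model in Python. It is only a sketch of the behaviour described above, not Vault's actual implementation: `ExpiringGaugeSink` and `simulate` are hypothetical names, and the only assumption is that a gauge is dropped once it has not been updated for the retention period.

```python
from dataclasses import dataclass, field


@dataclass
class ExpiringGaugeSink:
    """Toy sink (hypothetical): drops a gauge not updated within `retention` seconds."""
    retention: float
    _gauges: dict = field(default_factory=dict)  # name -> (value, last update time)

    def set_gauge(self, name: str, value: float, now: float) -> None:
        self._gauges[name] = (value, now)

    def scrape(self, now: float) -> dict:
        # Forget gauges older than the retention period, then report the rest.
        self._gauges = {
            name: (value, updated)
            for name, (value, updated) in self._gauges.items()
            if now - updated <= self.retention
        }
        return {name: value for name, (value, _) in self._gauges.items()}


def simulate(republish_every, retention=1800.0, scrape_interval=60.0, duration=3600.0):
    """Set the gauge once at t=0, optionally re-publish it, and record each scrape."""
    sink = ExpiringGaugeSink(retention)
    sink.set_gauge("vault_core_unsealed", 0.0, now=0.0)
    last_publish = 0.0
    visible = []
    t = 0.0
    while t <= duration:
        if republish_every is not None and t - last_publish >= republish_every:
            sink.set_gauge("vault_core_unsealed", 0.0, now=t)
            last_publish = t
        visible.append("vault_core_unsealed" in sink.scrape(now=t))
        t += scrape_interval
    return visible


# Published only when the value changes (here: once, at t=0).
on_change_only = simulate(republish_every=None)
# Re-published every 10 minutes even though the value never changes.
periodic = simulate(republish_every=600.0)

print("on change only, visible at the last scrape:", on_change_only[-1])
print("periodic re-publish, visible at the last scrape:", periodic[-1])
```

In this model, with the 30m retention from my config, the gauge that is set only once stops appearing in scrapes after 30 minutes, while the periodically re-published one stays visible for the whole hour.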

Environment:

  • Vault Server Version (retrieve with vault status): v1.14.0
  • Vault CLI Version (retrieve with vault version): v1.14.0
  • Server Operating System/Architecture: Vault running on Kubernetes EKS installed via v0.25.0 of the Vault Helm chart

Vault server configuration file(s):
(My Helm values file with the Vault config embedded)

global:
  serverTelemetry:
    prometheusOperator: true
injector:
  enabled: false
server:
  ha:
    enabled: true
    replicas: 3
    # Enable HA for integrated storage
    raft:
      enabled: true
      setNodeId: true
      config: |
        # Setting the cluster name here avoids duplicate metrics:
        # https://github.com/hashicorp/vault/issues/11988
        cluster_name = "pace-vault"

        ui = true

        listener "tcp" {
          tls_disable = 1
          address = "[::]:8200"
          cluster_address = "[::]:8201"

          # Enable unauthenticated metrics access for Prometheus Operator
          telemetry {
            unauthenticated_metrics_access = "true"
          }
        }

        telemetry {
          prometheus_retention_time = "30m"
          disable_hostname = true
        }

        storage "raft" {
          path = "/vault/data"
        }

        # For integrated raft storage and security
        # https://developer.hashicorp.com/vault/docs/configuration#disable_mlock
        disable_mlock = true

        service_registration "kubernetes" {}
  serverTelemetry:
    serviceMonitor:
      enabled: true
  dataStorage:
    enabled: true
    size: 5Gi
    storageClass: ebs-gp3
  affinity: |
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              app.kubernetes.io/name: {{ template "vault.name" . }}
              app.kubernetes.io/instance: "{{ .Release.Name }}"
              component: server
          topologyKey: topology.kubernetes.io/zone


@cascadia-sati
Author

Along with hashicorp/vault-helm#990, this has made monitoring the seal status very difficult in our HA Vault setup on K8s. I'm still somewhat new to DevOps, so I was surprised and somewhat disappointed to find this not well baked. Surely we're not the only ones who find it important to monitor the seal status. If there's some better way to do this, or I'm missing something, please point me in the right direction.
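
One possible workaround, sketched here under assumptions rather than as a recommended setup, is to read the seal state directly from Vault's `/v1/sys/seal-status` HTTP endpoint instead of relying on the gauge. The helper names `parse_seal_status` and `fetch_unsealed` are hypothetical, the base URL passed to `fetch_unsealed` (for example a pod address on port 8200 without TLS, as in the listener config above) is an assumption, and the sample body is hand-written and reduced to two fields.

```python
import json
import urllib.request


def parse_seal_status(body: str) -> int:
    """Map a seal-status JSON body to the 0/1 convention of vault_core_unsealed."""
    status = json.loads(body)
    return 0 if status["sealed"] else 1


def fetch_unsealed(base_url: str, timeout: float = 5.0) -> int:
    """Query one Vault node (base_url is an assumption, e.g. http://127.0.0.1:8200)."""
    url = f"{base_url}/v1/sys/seal-status"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_seal_status(resp.read().decode("utf-8"))


# Offline demonstration with hand-written sample bodies (not real server output).
sealed_sample = '{"sealed": true, "initialized": true}'
unsealed_sample = '{"sealed": false, "initialized": true}'
print(parse_seal_status(sealed_sample))
print(parse_seal_status(unsealed_sample))
```

A small poller built on `fetch_unsealed` could expose the result as its own gauge, so the value would still be reported while a node stays sealed longer than the retention time. In an HA setup each replica would have to be queried individually.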

@cascadia-sati
Author

cascadia-sati commented Jan 11, 2024

This seems to happen only when the metric goes to "0" for "sealed". It then disappears after the retention time. However, when the metric is "1" for "unsealed", it persists even after the retention time.

The image below is an example with a retention time of five minutes. See the metrics disappear after that time when Vault seals. When it's unsealed, the metric persists.

[Image: screenshot taken 2024-01-11 at 10:55:22]
