Issue with vault metrics via prometheus after pod restart #27872

Open
alek-devoluton opened this issue Jul 26, 2024 · 1 comment
Labels: bug, core/metric, waiting-for-response

Comments

alek-devoluton commented Jul 26, 2024

We have the latest Vault installed in our k8s cluster via Helm, and Prometheus is set up to scrape its metrics so we can view Vault status in Grafana. Usually it works fine. But every time we deploy an upgrade for Vault or Prometheus and the pods need to be restarted, the endpoint seems to disappear. In the Prometheus dashboard the target shows state down with the error:

///v1/sys/metrics?format=prometheus": unsupported protocol scheme with state down.

The issue gets kinda resolved after multiple restarts of the prometheus-server and vault pods. But that doesn't seem like a long-term solution.
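
For anyone reading along, here is a minimal sketch of what the URL in that error looks like once it is split into its parts. It only parses the string quoted above with Python's standard library. The helper name `describe_scrape_url` is made up for illustration, and reading an empty scheme and host as "the target was not fully resolved at scrape time" is an assumption, not a confirmed diagnosis.

```python
from urllib.parse import urlsplit

# URL portion of the error quoted above, copied as-is.
FAILING_URL = "///v1/sys/metrics?format=prometheus"


def describe_scrape_url(url: str) -> dict:
    """Split a scrape URL and report whether it has a scheme and a host."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "host": parts.netloc,
        "path": parts.path,
        "query": parts.query,
        # An HTTP client needs both a scheme and a host to send a request.
        "scrapable": bool(parts.scheme) and bool(parts.netloc),
    }


if __name__ == "__main__":
    for key, value in describe_scrape_url(FAILING_URL).items():
        print(f"{key}: {value!r}")
```

The path and query match the scrape config, but the scheme and host come back empty, which is consistent with an "unsupported protocol scheme" error from an HTTP client.
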
Vault configuration:

`server:
  image:
    tag: 1.16.2
  annotations:
    iam.amazonaws.com/role: arn:aws:iam::{{ .Values.aws.account }}:role/vault-{{ .Values.environmentName }}-oidc-role
  serviceAccount:
    annotations:
      eks.amazonaws.com/role-arn: arn:aws:iam::{{ .Values.aws.account }}:role/vault-{{ .Values.environmentName }}-oidc-role
  volumes:
    - name: node-cert
      secret:
        secretName: vault-node-cert
  volumeMounts:
    - mountPath: /etc/certs
      name: node-cert
      readOnly: true
  ha:
    enabled: true
    replicas: 3
    apiAddr: vault.{{ .Values.dnsSubdomain }}:8200
    config: |
      ui = true

      listener "tcp" {
        tls_cert_file      = "/etc/certs/tls.crt"
        tls_key_file       = "/etc/certs/tls.key"
        tls_client_ca_file = "/etc/certs/ca.crt"
        address            = "[::]:8200"
        cluster_address    = "[::]:8201"
        telemetry {
          unauthenticated_metrics_access = "false"
        }
      }

      seal "awskms" {
        region     = "{{ .Values.aws.region }}"
        kms_key_id = "{{ .Values.aws.kms_key_id }}"
      }

      storage "dynamodb" {
        ha_enabled = "true"
        region     = "{{ .Values.aws.region }}"
        table      = "{{ .Values.environmentName }}-apps-vault-data"
      }

      telemetry {
        disable_hostname          = true
        prometheus_retention_time = "12h"
      }

      service_registration "kubernetes" {
        namespace = "vault"
        pod_name  = "vault"
      }

  service:
    enabled: true
    port: 8200
    targetPort: 8200

  ingress:
    enabled: true
    annotations:
      cert-manager.io/cluster-issuer: letsencrypt-dns
      nginx.ingress.kubernetes.io/backend-protocol: https
      nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
      nginx.ingress.kubernetes.io/ssl-redirect: "true"
    ingressClassName: nginx-internal
    pathType: Prefix
    activeService: false
    hosts:
      - host: vault.{{ .Values.dnsSubdomain }}
        paths:
          - /
    tls:
      - secretName: vault-tls
        hosts:
          - vault.{{ .Values.dnsSubdomain }} `

Scrape config:

VAULT
  - job_name: {{ $.Values.clusterName }}-vault-exporter
    metrics_path: /v1/sys/metrics
    params:
      format: ['prometheus']
    scheme: https
    tls_config:
      insecure_skip_verify: true
    authorization:
      credentials: ref+vault://{{ $.Values.secretsPath }}/{{ $.Values.clusterName }}?address={{ $.Values.vaultEndpoint }}#vault-token
    static_configs:
    - targets: ['vault.vault:8200']
  {{- end }}
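
One way to take Prometheus out of the picture is to send the same request by hand from a pod in the cluster. The sketch below builds that request from the fields in the scrape config above (`scheme`, the static target, `metrics_path`, `params`, and a bearer token). The function names are hypothetical, the token has to be supplied by whoever runs it, and disabling certificate verification only mirrors `insecure_skip_verify: true`. This is a rough model of what the scrape does, not Prometheus's actual code.

```python
import ssl
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_metrics_request(target: str, token: str, scheme: str = "https") -> Request:
    """Build the request a scrape of Vault's metrics endpoint would roughly send."""
    if not scheme or not target:
        # An empty scheme or target gives a URL no HTTP client can use.
        raise ValueError("scheme and target must both be non-empty")
    query = urlencode({"format": "prometheus"})
    url = f"{scheme}://{target}/v1/sys/metrics?{query}"
    return Request(url, headers={"Authorization": f"Bearer {token}"})


def fetch_metrics(request: Request, timeout: float = 10.0) -> str:
    """Send the request without verifying the server certificate."""
    context = ssl.create_default_context()
    context.check_hostname = False       # mirrors insecure_skip_verify: true
    context.verify_mode = ssl.CERT_NONE  # do not use outside of debugging
    with urlopen(request, timeout=timeout, context=context) as response:
        return response.read().decode("utf-8")
```

Calling `fetch_metrics(build_metrics_request("vault.vault:8200", token))` right after a pod restart, with `token` set to the same token Prometheus uses, would show whether the endpoint itself answers while the Prometheus target is still reported as down.
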
  
We followed the official Vault monitoring documentation: https://developer.hashicorp.com/vault/tutorials/archive/monitor-telemetry-grafana-prometheus

Any suggestions, or something we are missing?

Thank you in advance!

@heatherezell added the bug and core/metric labels Jul 26, 2024
@biazmoreira
Contributor

Hi,

Does the scrape config work the first time you deploy without the protocol in the static_config?

static_configs:
    - targets: ['vault.vault:8200']

Have you tried adding the protocol before the target's value?


3 participants