Internal Telemetry metrics labels #11939

Open
avinovarov opened this issue Dec 17, 2024 · 0 comments
Labels
bug Something isn't working

Comments

@avinovarov

Describe the bug
When the OTel Collector is run as a standalone pod (not as a sidecar), k8s-related labels such as pod and namespace name are missing from the otelcol_* internal telemetry metrics.

Steps to reproduce
We have the OTel Collector deployed in two scenarios:
In the sidecar scenario, the OpenTelemetry Collector runs as a sidecar container in the app pod and exports metrics to the 'central' OTel Collector.
In the standalone scenario, the 'central' OpenTelemetry Collector runs as a standalone pod, receiving metrics from the app sidecar collectors in multiple namespaces.

What did you expect to see?
In the sidecar scenario the metrics do get labels like k8s_namespace_name and k8s_pod_name, which we need in order to select the metrics in Grafana dashboards (e.g. with a query like the sketch below).
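A hypothetical example of the kind of panel query that relies on those labels (the metric and label names are taken from the sidecar example further down; the query itself is only illustrative):

rate(otelcol_exporter_sent_metric_points_total{k8s_namespace_name="app-main", k8s_pod_name=~"<app_pod_name>.*"}[5m])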

What did you see instead?
Out of the box, in the standalone scenario no k8s-related labels are added.

In the sidecar scenario, the metric labels we get look like this, containing k8s_pod_name and k8s_namespace_name:

otelcol_exporter_sent_metric_points_total{
cluster="dev-workloads", 
environment="dev", 
exporter="otlp", 
instance="<id>", 
job="otelcol-contrib", 
k8s_namespace_name="app-main", 
k8s_node_name="aks-mainpool...", 
k8s_pod_name="<app_pod_name>", 
k8s_pod_uid="<uid>", 
service_instance_id="<id>", 
service_name="otelcol-contrib", 
service_version="0.115.0"
}

In the standalone scenario, the otelcol_* metric labels we get are as follows:

otelcol_exporter_sent_metric_points_total{
cluster="dev-workloads", 
environment="dev", 
exporter="otlp/central", 
instance="<id>", 
job="otelcol-central", 
service_instance_id="<id>", 
service_name="otelcol-central", 
service_version="0.115.0"
}

What version did you use?
We run OTel Collectors in our k8s clusters, installed via Helm charts.
Chart version: opentelemetry-operator:0.74.3
OTel image version override: 0.115.1 (tried on 0.114.0 too)

What config did you use?
In the sidecar scenario, we have the OpenTelemetry Collector running as a sidecar container in the app pod, exporting metrics to the 'central' OTel Collector. The simplified config looks like this:

  config:
    receivers:
      otlp/unix:
        protocols:
          grpc:
            transport: unix
            endpoint: "@otlp.sock"
    exporters:
      otlp:
        endpoint: workloads-collector.opentelemetry.svc.cluster.local:4317 # metrics received from app are being sent to 'central' collector
        tls:
          insecure: true
    processors:
      k8sattributes:
        passthrough: false
        extract:
          metadata:
            - k8s.deployment.name
            - k8s.pod.start_time
          labels:
            - tag_name: component
              key: component
              from: pod
            - tag_name: app
              key: app
              from: pod
            - tag_name: environment
              key: env
              from: pod
        pod_association:
          - sources:
            - from: resource_attribute
              name: k8s.pod.ip
          - sources:
            - from: resource_attribute
              name: k8s.pod.uid
          - sources:
            - from: connection
    service:
      telemetry:
        metrics:
          level: detailed
          readers:
            - periodic:
                interval: 10000
                exporter:
                  otlp:
                    protocol: grpc/protobuf
                    endpoint: workloads-collector.opentelemetry.svc.cluster.local:4317 # collector metrics are sent to 'central' collector too
                    #endpoint: unix:otlp.sock # specifying unix socket does not seem to work due to enforced string validation, so the sidecar collector isn't able to send metrics to itself
      pipelines:
        metrics:
          receivers: [otlp/unix]
          processors: [k8sattributes]
          exporters: [otlp]

In the standalone scenario, we have the 'central' OpenTelemetry Collector running as a standalone pod, receiving metrics from the app sidecars in multiple namespaces.
The simplified config looks like this:

  config:
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
          http:
            endpoint: 0.0.0.0:4318
    exporters:
      ...
    processors:
      attributes/env:
        actions:
          - action: insert
            key: environment
            value: "dev"
          - action: insert
            key: cluster
            value: "dev-cluster"
      resource/env:
        attributes:
          - action: insert
            key: environment
            value: "dev"
          - action: insert
            key: cluster
            value: "dev-cluster"
      batch: {} # referenced in the metrics pipeline below
    service:
      telemetry:
        metrics:
          level: detailed
          readers:
            - periodic:
                interval: 10000
                exporter:
                  otlp:
                    protocol: grpc/protobuf
                    endpoint: http://localhost:4317 # 'central' collector sends its own metrics to itself
      pipelines:
        metrics:
          receivers:
            - otlp
          processors:
            - attributes/env
            - resource/env
            - batch
          exporters:
            - ...
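
For reference, a possible direction (a sketch only, not part of our current setup): set the k8s attributes on the collector's own telemetry resource via service::telemetry::resource plus env substitution. The env var names K8S_NAMESPACE_NAME / K8S_POD_NAME are placeholders that would have to be wired up through the Kubernetes downward API, and the exact placement of the env block depends on the chart/CR in use.

  env:
    - name: K8S_NAMESPACE_NAME
      valueFrom:
        fieldRef:
          fieldPath: metadata.namespace
    - name: K8S_POD_NAME
      valueFrom:
        fieldRef:
          fieldPath: metadata.name
  config:
    service:
      telemetry:
        resource:
          # attached as resource attributes on the internal telemetry;
          # whether they end up as Prometheus labels depends on the downstream pipeline
          k8s.namespace.name: ${env:K8S_NAMESPACE_NAME}
          k8s.pod.name: ${env:K8S_POD_NAME}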

Environment
All workloads run on Azure Kubernetes Service (AKS) v1.30.

Additional context
An additional question on this topic: what is the suggested way of debugging (re)labeling in OTel Collectors? Something like the step-by-step relabeling details that the Prometheus UI provides, perhaps?
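
(For illustration only, not something we have verified: the closest thing we are aware of is attaching the core debug exporter to a pipeline, which prints each data point with its full attribute set at detailed verbosity, although that shows the result after the whole processor chain rather than step by step.)

  config:
    exporters:
      debug:
        verbosity: detailed
    service:
      pipelines:
        metrics/debug:
          receivers: [otlp]
          processors: [attributes/env, resource/env]
          exporters: [debug]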

avinovarov added the bug label on Dec 17, 2024