[Bug]: Prometheus rule KubeletTooManyPods incorrect statistics #997

jeffaryhe · 2024-12-13T07:47:21Z

What happened?

Please take a look at prometheus-operator/kube-prometheus#2558.

Please provide any helpful snippets.

No response

What parts of the codebase are affected?

Rules

I agree to the following terms:

I agree to follow this project's Code of Conduct.
I have filled out all the required information above to the best of my ability.
I have searched the issues of this repository and believe that this is not a duplicate.
I have confirmed this bug exists in the default branch of the repository, as of the latest commit at the time of submission.

skl · 2024-12-16T17:34:13Z

Hi @jeffaryhe, thanks for the report. I had a look at the issue you raised:

Prometheus rule KubeletTooManyPods incorrect statistics prometheus-operator/kube-prometheus#2558

I also had a look at the KubeletTooManyPods alert rule:

kubernetes-mixin/alerts/kubelet.libsonnet

Lines 54 to 63 in bad0615

    
    expr: |||
      count by(%(clusterLabel)s, node) (
        (kube_pod_status_phase{%(kubeStateMetricsSelector)s,phase="Running"} == 1) * on(instance,pod,namespace,%(clusterLabel)s) group_left(node) topk by(instance,pod,namespace,%(clusterLabel)s) (1, kube_pod_info{%(kubeStateMetricsSelector)s})
      )
      /
      max by(%(clusterLabel)s, node) (
        kube_node_status_capacity{%(kubeStateMetricsSelector)s,resource="pods"} != 1
      ) > 0.95
    ||| % $._config,
    'for': '15m',

From what I can see, the alert fires when the number of running pods on a node stays above 95% of that node's pod capacity for 15 minutes.
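As a rough illustration of that condition, here is a minimal Python sketch of the arithmetic. It is not the rule's implementation: the sample data and the function name `nodes_over_threshold` are made up for the example, and the `for: 15m` hold period is not modeled.

```python
# Hypothetical sample data: (node, pod) pairs for pods in the Running phase.
running_pods = [("minikube", "pod-1"), ("minikube", "pod-2"), ("minikube", "pod-3")]
# Hypothetical pod capacity per node, standing in for
# kube_node_status_capacity{resource="pods"}.
capacity = {"minikube": 3}

THRESHOLD = 0.95


def nodes_over_threshold(running_pods, capacity, threshold=THRESHOLD):
    """Return {node: ratio} for nodes whose running-pod ratio exceeds threshold."""
    counts = {}
    for node, _pod in running_pods:
        counts[node] = counts.get(node, 0) + 1

    result = {}
    for node, count in counts.items():
        cap = capacity.get(node)
        # The rule filters capacity with `!= 1`, so those nodes drop out,
        # as do nodes with no capacity series at all.
        if cap is None or cap == 1:
            continue
        # PromQL division by zero yields +Inf rather than an error.
        ratio = count / cap if cap else float("inf")
        if ratio > threshold:
            result[node] = ratio
    return result


print(nodes_over_threshold(running_pods, capacity))
```

With three running pods and a capacity of three, the ratio is 1.0, which matches the "100% of its Pod capacity" case in the rule's description.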

Can you help me understand which statistics you see as incorrect? For example, do you think part of the alert rule could be improved?

jeffaryhe · 2024-12-23T08:29:48Z

If a node supports 60 pods, then 60 pods can be deployed on it, regardless of how many containers each pod has.

skl · 2024-12-30T14:40:20Z

Containers are not considered as part of the alert, only pods (kube_pod_status_phase and kube_pod_info metrics are pod-level, and kube_node_status_capacity{resource="pods"} provides the pod capacity per node).

I still do not understand the issue. Could you reproduce your problem by adding a new unit test and showing me how it fails? Here is the existing unit test for this rule:

kubernetes-mixin/tests.yaml

Lines 406 to 435 in 03c13f9

    
    - interval: 1m
      input_series:
      - series: 'kube_node_status_capacity{resource="pods",instance="172.17.0.5:8443",cluster="kubernetes",node="minikube",job="kube-state-metrics",namespace="kube-system"}'
        values: '3+0x15'
      - series: 'kube_pod_info{endpoint="https-main",instance="172.17.0.5:8443",job="kube-state-metrics",cluster="kubernetes",namespace="kube-system",node="minikube",pod="pod-1",service="kube-state-metrics"}'
        values: '1+0x15'
      - series: 'kube_pod_status_phase{endpoint="https-main",instance="172.17.0.5:8443",job="kube-state-metrics",cluster="kubernetes",namespace="kube-system",phase="Running",pod="pod-1",service="kube-state-metrics"}'
        values: '1+0x15'
      - series: 'kube_pod_info{endpoint="https-main",instance="172.17.0.5:8443",job="kube-state-metrics",cluster="kubernetes",namespace="kube-system",node="minikube",pod="pod-2",service="kube-state-metrics"}'
        values: '1+0x15'
      - series: 'kube_pod_status_phase{endpoint="https-main",instance="172.17.0.5:8443",job="kube-state-metrics",cluster="kubernetes",namespace="kube-system",phase="Running",pod="pod-2",service="kube-state-metrics"}'
        values: '1+0x15'
      - series: 'kube_pod_info{endpoint="https-main",instance="172.17.0.5:8443",job="kube-state-metrics",cluster="kubernetes",namespace="kube-system",node="minikube",pod="pod-3",service="kube-state-metrics"}'
        values: '1+0x15'
      - series: 'kube_pod_status_phase{endpoint="https-main",instance="172.17.0.5:8443",job="kube-state-metrics",cluster="kubernetes",namespace="kube-system",phase="Running",pod="pod-3",service="kube-state-metrics"}'
        values: '1+0x15'
      alert_rule_test:
      - eval_time: 10m
        alertname: KubeletTooManyPods
      - eval_time: 15m
        alertname: KubeletTooManyPods
        exp_alerts:
        - exp_labels:
            cluster: kubernetes
            node: minikube
            severity: info
          exp_annotations:
            summary: "Kubelet is running at capacity."
            description: "Kubelet 'minikube' is running at 100% of its Pod capacity."
            runbook_url: https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubelettoomanypods

aleskiontherun · 2025-01-07T10:51:26Z

In my case there are 2 instances of kube-state-metrics, so every metric is duplicated with different instance labels and the query returns doubled numbers. For example, for a node with 19 running pods and a capacity of 35, the alert fires with the value 1.0857142857142856. How would I de-duplicate it?
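The doubling can be reproduced with a small Python model of the rule's pod-count join. This is only a sketch: the instance addresses, pod names, namespace, and node name are hypothetical, and the function `count_running_per_node` merely mimics matching on the instance label before counting by node.

```python
# Hypothetical setup: two kube-state-metrics instances each report all 19 pods.
INSTANCES = ["ksm-0:8080", "ksm-1:8080"]
PODS = [f"pod-{i}" for i in range(1, 20)]
CAPACITY = 35

# One Running-phase series and one info series per (instance, pod).
phase_series = [
    {"instance": inst, "namespace": "default", "pod": pod}
    for inst in INSTANCES
    for pod in PODS
]
info_series = [dict(s, node="node-a") for s in phase_series]


def count_running_per_node(phase_series, info_series):
    """Match on (instance, namespace, pod), then count the results by node."""
    node_of = {
        (s["instance"], s["namespace"], s["pod"]): s["node"] for s in info_series
    }
    counts = {}
    for s in phase_series:
        key = (s["instance"], s["namespace"], s["pod"])
        if key in node_of:
            node = node_of[key]
            counts[node] = counts.get(node, 0) + 1
    return counts


counts = count_running_per_node(phase_series, info_series)
ratio = counts["node-a"] / CAPACITY
print(counts, round(ratio, 4))
```

Because the instance label is part of the match key, each pod yields one result per instance, so 19 pods are counted as 38 and 38 / 35 is roughly 1.0857.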

skl · 2025-01-07T11:15:16Z

@aleskiontherun thanks for the detail. Because your kube-state-metrics (KSM) series are duplicated across the instance label, the first part of the rule double-counts your pods:

kubernetes-mixin/alerts/kubelet.libsonnet

Line 56 in 35aebca

    
    (kube_pod_status_phase{%(kubeStateMetricsSelector)s,phase="Running"} == 1) * on(instance,pod,namespace,%(clusterLabel)s) group_left(node) topk by(instance,pod,namespace,%(clusterLabel)s) (1, kube_pod_info{%(kubeStateMetricsSelector)s})

The capacity, by contrast, is already de-duplicated by the max aggregation, which is why you get >100%:

kubernetes-mixin/alerts/kubelet.libsonnet

Lines 59 to 61 in 35aebca

    
    max by(%(clusterLabel)s, node) (
      kube_node_status_capacity{%(kubeStateMetricsSelector)s,resource="pods"} != 1
    ) > 0.95

So the pod count part of the rule needs to be rewritten, say something like:

count by (%(clusterLabel)s, node) (
  (kube_pod_status_phase{%(kubeStateMetricsSelector)s, phase="Running"} == 1)
  * on (%(clusterLabel)s, namespace, pod) group_left (node)
  group by (%(clusterLabel)s, namespace, pod, node) (
    kube_pod_info{%(kubeStateMetricsSelector)s}
  )
)
/
max by (%(clusterLabel)s, node) (
  kube_node_status_capacity{%(kubeStateMetricsSelector)s, resource="pods"} != 1
) > 0.95
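The de-duplication idea behind that rewrite can be modeled in Python as counting distinct pod identities instead of raw series. This sketch shows the intent, not the exact PromQL semantics: it assumes a single cluster and also collapses the Running-phase series across instances, which matters because a `group_left` result keeps the labels of the left-hand side. All sample names and the function `running_pods_per_node` are hypothetical.

```python
# Hypothetical setup: two kube-state-metrics instances each report all 19 pods.
INSTANCES = ["ksm-0:8080", "ksm-1:8080"]
PODS = [f"pod-{i}" for i in range(1, 20)]
CAPACITY = 35

phase_series = [
    {"instance": inst, "namespace": "default", "pod": pod, "value": 1}
    for inst in INSTANCES
    for pod in PODS
]
info_series = [
    {"instance": inst, "namespace": "default", "pod": pod, "node": "node-a"}
    for inst in INSTANCES
    for pod in PODS
]


def running_pods_per_node(phase_series, info_series):
    """Count distinct running pods per node, ignoring the reporting instance."""
    # Similar in spirit to `group by (namespace, pod, node)` over kube_pod_info.
    pod_nodes = {(s["namespace"], s["pod"], s["node"]) for s in info_series}
    # Collapse the Running-phase series across instances as well.
    running = {(s["namespace"], s["pod"]) for s in phase_series if s["value"] == 1}

    counts = {}
    for namespace, pod, node in pod_nodes:
        if (namespace, pod) in running:
            counts[node] = counts.get(node, 0) + 1
    return counts


dedup_counts = running_pods_per_node(phase_series, info_series)
dedup_ratio = dedup_counts["node-a"] / CAPACITY
print(dedup_counts, round(dedup_ratio, 4))
```

With the instance label dropped from both sides, the 19 pods are counted once and the ratio of 19 / 35 stays well below the 0.95 threshold.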

I can get a PR together for this unless you'd like to?

aleskiontherun · 2025-01-07T11:38:25Z

@skl I'd need to spend some time to fully understand what's going on in the query and how the mixin works (I'm simply installing it with kube-prometheus-stack), so I'd really appreciate it if you could do it. One uneducated guess: wouldn't it make sense to just divide the result by the number of instances?
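A quick way to reason about the divide-by-instances idea is to compare it with counting distinct pods. The Python sketch below uses made-up data and hypothetical helper names; it assumes a scenario where one instance reports only some of the pods (for example shortly after a restart), which is an illustration rather than something observed in this issue.

```python
def make_series(instance, pod_numbers):
    """Build hypothetical running-pod series for one kube-state-metrics instance."""
    return [
        {"instance": instance, "namespace": "default", "pod": f"pod-{i}"}
        for i in pod_numbers
    ]


def divided_count(series):
    """Raw series count divided by the number of reporting instances."""
    instances = {s["instance"] for s in series}
    return len(series) / len(instances)


def distinct_count(series):
    """Number of distinct (namespace, pod) identities, whatever the instance."""
    return len({(s["namespace"], s["pod"]) for s in series})


# Both instances report all 19 pods: the two approaches agree.
full = make_series("ksm-0:8080", range(1, 20)) + make_series("ksm-1:8080", range(1, 20))
# The second instance reports only 10 of the 19 pods: they diverge.
partial = make_series("ksm-0:8080", range(1, 20)) + make_series("ksm-1:8080", range(1, 11))

print(divided_count(full), distinct_count(full))
print(divided_count(partial), distinct_count(partial))
```

Dividing only gives the right answer while every instance exposes exactly the same set of series, whereas counting distinct pods does not depend on that.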

In my own queries I'm using the kubelet_active_pods metric, which is not part of KSM, but it simplifies the query quite a bit.

skl · 2025-01-07T12:02:58Z

Sure no problem, I'll get a PR up shortly 👍

skl · 2025-01-07T14:08:16Z

Done in #1011 @aleskiontherun 😄

skl self-assigned this Dec 16, 2024

skl added the question Further information is requested label Dec 17, 2024

skl mentioned this issue Jan 7, 2025

feat: de-dupe KubeletTooManyPods, add cluster to descriptions #1011

Merged

skl closed this as completed in #1011 Jan 13, 2025
