[Bug]: Prometheus rule KubeletTooManyPods incorrect statistics #997
Comments
Hi @jeffaryhe, thanks for the report. I had a look at the issue you raised, and also at the KubeletTooManyPods alert rule: kubernetes-mixin/alerts/kubelet.libsonnet, lines 54 to 63 in bad0615.
From what I can see, the alert fires if the number of running pods on a node exceeds 95% of the pod limit for that node. Can you help me understand which statistics you see as incorrect? For example, do you think part of the alert rule could be improved? |
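For readers following along, the threshold check described above can be sketched in a few lines of Python. This is only an illustration of the ">95% of the pod limit" logic as stated in this comment: the function name `too_many_pods`, its parameters, and the sample numbers are assumptions, and the real rule is a PromQL expression in the mixin, not Python.

```python
def too_many_pods(running_pods: int, pod_capacity: int, threshold: float = 0.95) -> bool:
    """Return True when running pods exceed `threshold` of the node's pod capacity.

    Hypothetical helper: the 0.95 default mirrors the ">95%" description
    in the thread; it is not read from the mixin's configuration.
    """
    if pod_capacity <= 0:
        raise ValueError("pod_capacity must be positive")
    return running_pods / pod_capacity > threshold


# A node that supports 60 pods: 58 running pods is about 96.7% of capacity.
print(too_many_pods(58, 60))
# 50 running pods is about 83.3% of capacity.
print(too_many_pods(50, 60))
```

Only pods enter the ratio here; the number of containers inside each pod plays no part.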
If a node supports 60 pods, 60 pods can be deployed, not counting the number of containers in the pod. |
Containers are not considered as part of the alert, only pods. I still do not understand the issue; perhaps you could reproduce your problem by creating a new unit test and showing me how it fails? Here is the existing unit test for this rule: lines 406 to 435 in 03c13f9 |
In my case there are 2 instances of |
@aleskiontherun thanks for the detail, as you have duplicate KSM on the kubernetes-mixin/alerts/kubelet.libsonnet Line 56 in 35aebca
Whereas the capacity is already de-duplicated, which is why you get >100%: kubernetes-mixin/alerts/kubelet.libsonnet Lines 59 to 61 in 35aebca
So the pod count part of the rule needs to be rewritten, say something like:
I can get a PR together for this unless you'd like to? |
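To make the double-counting effect concrete, here is a minimal Python sketch. It is an assumption-laden illustration, not the rewritten rule proposed in this comment: the sample series, the label names (`instance`, `node`, `namespace`, `pod`), and both helper functions are hypothetical. It only shows why counting every series inflates the pod count when two KSM instances report the same pods, and how collapsing to unique pods per node avoids that.

```python
from collections import defaultdict


def count_pods_naive(samples):
    """Count every series per node, so duplicates from extra reporters inflate the total."""
    counts = defaultdict(int)
    for s in samples:
        counts[s["node"]] += 1
    return dict(counts)


def count_pods_deduplicated(samples):
    """Count unique (namespace, pod) pairs per node, ignoring which instance reported them."""
    seen = defaultdict(set)
    for s in samples:
        seen[s["node"]].add((s["namespace"], s["pod"]))
    return {node: len(pods) for node, pods in seen.items()}


# Hypothetical data: three pods on node-a, each reported by two KSM instances.
samples = [
    {"instance": ksm, "node": "node-a", "namespace": "default", "pod": pod}
    for ksm in ("ksm-0", "ksm-1")
    for pod in ("web-0", "web-1", "web-2")
]

capacity = 3  # assumed pod capacity for node-a, already de-duplicated
naive = count_pods_naive(samples)["node-a"]
dedup = count_pods_deduplicated(samples)["node-a"]
print(naive / capacity, dedup / capacity)
```

With the duplicated series the naive ratio comes out at twice the de-duplicated one, which matches the ">100%" symptom described above.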
@skl I'd need to spend some time to fully understand what's going on in the query and how the mixin works (I'm simply installing it with kube-prometheus-stack), so I'd really appreciate if you could do it. One uneducated guess: wouldn't it make sense to just divide the result by the number of instances? In my own queries I'm using |
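As a rough way to compare the two ideas, the Python sketch below contrasts dividing the series count by the number of reporting instances with counting unique pods. Everything in it is hypothetical (the label names, the sample data, and both helpers), and the scenario where one instance is missing a series is an assumption used for illustration, not something reported in this thread.

```python
def pods_by_division(samples, node):
    """Divide the raw series count by the number of distinct reporting instances."""
    series = [s for s in samples if s["node"] == node]
    instances = {s["instance"] for s in series}
    return len(series) / len(instances)


def pods_by_dedup(samples, node):
    """Count unique (namespace, pod) pairs, regardless of which instance reported them."""
    return len({(s["namespace"], s["pod"]) for s in samples if s["node"] == node})


def make_samples(pods_per_instance):
    """Build hypothetical series from a mapping of instance name to the pods it reports."""
    return [
        {"instance": inst, "node": "node-a", "namespace": "default", "pod": pod}
        for inst, pods in pods_per_instance.items()
        for pod in pods
    ]


# Both instances report all three pods: the two approaches agree.
full = make_samples({"ksm-0": ["a", "b", "c"], "ksm-1": ["a", "b", "c"]})
print(pods_by_division(full, "node-a"), pods_by_dedup(full, "node-a"))

# Assumed edge case: ksm-1 is missing one pod, so division undercounts.
partial = make_samples({"ksm-0": ["a", "b", "c"], "ksm-1": ["a", "b"]})
print(pods_by_division(partial, "node-a"), pods_by_dedup(partial, "node-a"))
```

Under these assumptions, division matches de-duplication only while every instance reports exactly the same set of pods.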
Sure no problem, I'll get a PR up shortly 👍 |
Done in #1011 @aleskiontherun 😄 |
What happened?
Please take a look at prometheus-operator/kube-prometheus#2558.
Please provide any helpful snippets.
No response
What parts of the codebase are affected?
Rules