bugfix: refactor alerts to accommodate single-node clusters #1010

Open

wants to merge 4 commits into base: master
63 changes: 47 additions & 16 deletions alerts/resource_alerts.libsonnet
@@ -36,18 +36,34 @@ local utils = import '../lib/utils.libsonnet';
} +
if $._config.showMultiCluster then {
expr: |||
sum(namespace_cpu:kube_pod_container_resource_requests:sum{%(ignoringOverprovisionedWorkloadSelector)s}) by (%(clusterLabel)s) - (sum(kube_node_status_allocatable{%(kubeStateMetricsSelector)s,resource="cpu"}) by (%(clusterLabel)s) - max(kube_node_status_allocatable{%(kubeStateMetricsSelector)s,resource="cpu"}) by (%(clusterLabel)s)) > 0
(sum(namespace_cpu:kube_pod_container_resource_requests:sum{%(ignoringOverprovisionedWorkloadSelector)s}) by (%(clusterLabel)s) -
sum(kube_node_status_allocatable{%(kubeStateMetricsSelector)s,resource="cpu"}) by (%(clusterLabel)s) > 0
and
(sum(kube_node_status_allocatable{%(kubeStateMetricsSelector)s,resource="cpu"}) by (%(clusterLabel)s) - max(kube_node_status_allocatable{%(kubeStateMetricsSelector)s,resource="cpu"}) by (%(clusterLabel)s)) > 0
count by (cluster) (max by (cluster, node) (kube_node_role{role="control-plane"})) < 3)
or
(sum(namespace_cpu:kube_pod_container_resource_requests:sum{%(ignoringOverprovisionedWorkloadSelector)s}) by (%(clusterLabel)s) -
(sum(kube_node_status_allocatable{%(kubeStateMetricsSelector)s,resource="cpu"}) by (%(clusterLabel)s) -
max(kube_node_status_allocatable{%(kubeStateMetricsSelector)s,resource="cpu"}) by (%(clusterLabel)s)) > 0
and
(sum(kube_node_status_allocatable{%(kubeStateMetricsSelector)s,resource="cpu"}) by (%(clusterLabel)s) -
max(kube_node_status_allocatable{%(kubeStateMetricsSelector)s,resource="cpu"}) by (%(clusterLabel)s)) > 0)
||| % $._config,
annotations+: {
description: 'Cluster {{ $labels.%(clusterLabel)s }} has overcommitted CPU resource requests for Pods by {{ $value }} CPU shares and cannot tolerate node failure.' % $._config,
Contributor

I still feel that reusing the same alert for single-node clusters creates confusion. For instance, the alert description doesn't fit right in this case:

  • By definition, a single-node cluster can't tolerate any node failure.
  • I should be able to use all of the allocatable CPU without interfering with the system components (etcd, kube-api, kubelet, ...), so I'm not sure why we multiply by 0.85; e.g., the reserved CPU value is already tuned to the right usage (see the sketch below).

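For illustration, here is roughly what the two behaviours above look like as rendered PromQL. This is only a sketch: it assumes the default `job="kube-state-metrics"` selector, leaves out the overprovisioned-workload selector, and the 0.85 form is a reconstruction of the earlier revision being referenced, not a quote of it.

```promql
# Earlier revision (reconstructed): keep 15% headroom, i.e. alert once requests
# exceed 85% of the single node's allocatable CPU.
sum(namespace_cpu:kube_pod_container_resource_requests:sum)
  - 0.85 * sum(kube_node_status_allocatable{job="kube-state-metrics",resource="cpu"}) > 0

# Without the constant: alert only once requests exceed the full allocatable budget.
# Node "allocatable" already excludes kube-/system-reserved CPU and eviction thresholds,
# which is the point about the reserved CPU value being tuned separately.
sum(namespace_cpu:kube_pod_container_resource_requests:sum)
  - sum(kube_node_status_allocatable{job="kube-state-metrics",resource="cpu"}) > 0
```
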
Collaborator Author

I thought about reserving the final 15% as leeway for background processes that belong to userland but aren't necessarily initiated by the user, and so fall into a "grey" area.

But I agree, this introduces an opinionated, constant threshold that doesn't make sense if we look at the metric and alert definitions. Like you said, "allocatable" should mean usable up to 100%, and "overcommitment" should quite literally mean exceeding that limit. I'll drop the threshold.

Talking with Balut, I'm inclined to believe that users will expect the alert to adapt to SNO, i.e., to fire when the requests simply exceed the allocatable resources (sketched below). Introducing new SNO-only alerts could also be ambiguous: a 1+1 multi-node system would stop firing AlertX and start firing AlertYSNO when it's reduced to SNO, even though the latter is arguably a derivative of the former (which is what some users may expect).

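To make that concrete, this is roughly how the SNO-adapted branch added in this PR reads once the Jsonnet template is rendered for the non-multi-cluster case; again a sketch only, assuming the default `job="kube-state-metrics"` selector and eliding the overprovisioned-workload selector.

```promql
# Fire when requests exceed the cluster's total allocatable CPU and the cluster has
# fewer than 3 control-plane nodes (single-node / compact clusters); this mirrors the
# first branch of the refactored expression in this PR.
(
  sum(namespace_cpu:kube_pod_container_resource_requests:sum)
    - sum(kube_node_status_allocatable{job="kube-state-metrics",resource="cpu"}) > 0
and
  count(max by (node) (kube_node_role{role="control-plane"})) < 3
)
```

For clusters with three or more control-plane nodes, the second branch keeps the existing N-1 semantics, comparing requests against total allocatable minus the largest node.
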
Collaborator Author

(we could of course explore the possibility of adding SNO-exclusive alerts, but I just wanted to put these points out there; I'm mostly on the fence)

Collaborator Author

To add on, I think we also need to revise how we handle SNO downstream: at the moment we recognize SNO by the SingleReplica infrastructure topology, yet SNO can technically have more than one node (SNO+1 configurations, illustrated below). That would be in line with having a different set of alerts for SNO, while ensuring upstream expects the same. I'll see if I can find any SNO-dedicated teams or people who can shed some additional light here.

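For reference, a hedged illustration of that distinction, using the metric names already present in this PR (the cutoffs are only examples): an SNO+1 layout still has a single control-plane node even though the total node count is 2, so keying off the control-plane count behaves differently from keying off the topology or the node count.

```promql
# Two separate illustrative queries, not a single rule.
# Control-plane node count: stays at 1 for both SNO and SNO+1.
count(max by (node) (kube_node_role{role="control-plane"})) == 1

# Total node count: 1 for SNO, 2 for SNO+1.
count(kube_node_info{job="kube-state-metrics"})
```
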
Collaborator

@rexagod how would you like to proceed with this PR? I've held back from merging as it seems there's still open discussion.

Collaborator Author

Apologies for the delay here; we are talking this through internally and I'll update as soon as there's a resolution.

Collaborator

FYI, the tests have moved into the tests/ directory since:

},
} else {
expr: |||
sum(namespace_cpu:kube_pod_container_resource_requests:sum{%(ignoringOverprovisionedWorkloadSelector)s}) - (sum(kube_node_status_allocatable{resource="cpu", %(kubeStateMetricsSelector)s}) - max(kube_node_status_allocatable{resource="cpu", %(kubeStateMetricsSelector)s})) > 0
(sum(namespace_cpu:kube_pod_container_resource_requests:sum{%(ignoringOverprovisionedWorkloadSelector)s}) -
sum(kube_node_status_allocatable{resource="cpu", %(kubeStateMetricsSelector)s}) > 0
and
count(max by (node) (kube_node_role{role="control-plane"})) < 3)
or
(sum(namespace_cpu:kube_pod_container_resource_requests:sum{%(ignoringOverprovisionedWorkloadSelector)s}) -
(sum(kube_node_status_allocatable{resource="cpu", %(kubeStateMetricsSelector)s}) -
max(kube_node_status_allocatable{resource="cpu", %(kubeStateMetricsSelector)s})) > 0
and
(sum(kube_node_status_allocatable{resource="cpu", %(kubeStateMetricsSelector)s}) - max(kube_node_status_allocatable{resource="cpu", %(kubeStateMetricsSelector)s})) > 0
(sum(kube_node_status_allocatable{resource="cpu", %(kubeStateMetricsSelector)s}) -
max(kube_node_status_allocatable{resource="cpu", %(kubeStateMetricsSelector)s})) > 0)
||| % $._config,
annotations+: {
description: 'Cluster has overcommitted CPU resource requests for Pods by {{ $value }} CPU shares and cannot tolerate node failure.' % $._config,
@@ -65,24 +81,39 @@ local utils = import '../lib/utils.libsonnet';
} +
if $._config.showMultiCluster then {
expr: |||
sum(namespace_memory:kube_pod_container_resource_requests:sum{%(ignoringOverprovisionedWorkloadSelector)s}) by (%(clusterLabel)s) - (sum(kube_node_status_allocatable{resource="memory", %(kubeStateMetricsSelector)s}) by (%(clusterLabel)s) - max(kube_node_status_allocatable{resource="memory", %(kubeStateMetricsSelector)s}) by (%(clusterLabel)s)) > 0
(sum(namespace_memory:kube_pod_container_resource_requests:sum{%(ignoringOverprovisionedWorkloadSelector)s}) by (%(clusterLabel)s) -
sum(kube_node_status_allocatable{resource="memory", %(kubeStateMetricsSelector)s}) by (%(clusterLabel)s) > 0
and
(sum(kube_node_status_allocatable{resource="memory", %(kubeStateMetricsSelector)s}) by (%(clusterLabel)s) - max(kube_node_status_allocatable{resource="memory", %(kubeStateMetricsSelector)s}) by (%(clusterLabel)s)) > 0
count by (cluster) (max by (cluster, node) (kube_node_role{role="control-plane"})) < 3)
or
(sum(namespace_memory:kube_pod_container_resource_requests:sum{%(ignoringOverprovisionedWorkloadSelector)s}) by (%(clusterLabel)s) -
(sum(kube_node_status_allocatable{resource="memory", %(kubeStateMetricsSelector)s}) by (%(clusterLabel)s) -
max(kube_node_status_allocatable{resource="memory", %(kubeStateMetricsSelector)s}) by (%(clusterLabel)s)) > 0
and
(sum(kube_node_status_allocatable{resource="memory", %(kubeStateMetricsSelector)s}) by (%(clusterLabel)s) -
max(kube_node_status_allocatable{resource="memory", %(kubeStateMetricsSelector)s}) by (%(clusterLabel)s)) > 0)
||| % $._config,
annotations+: {
description: 'Cluster {{ $labels.%(clusterLabel)s }} has overcommitted memory resource requests for Pods by {{ $value | humanize }} bytes and cannot tolerate node failure.' % $._config,
},
} else
{
expr: |||
sum(namespace_memory:kube_pod_container_resource_requests:sum{%(ignoringOverprovisionedWorkloadSelector)s}) - (sum(kube_node_status_allocatable{resource="memory", %(kubeStateMetricsSelector)s}) - max(kube_node_status_allocatable{resource="memory", %(kubeStateMetricsSelector)s})) > 0
and
(sum(kube_node_status_allocatable{resource="memory", %(kubeStateMetricsSelector)s}) - max(kube_node_status_allocatable{resource="memory", %(kubeStateMetricsSelector)s})) > 0
||| % $._config,
annotations+: {
description: 'Cluster has overcommitted memory resource requests for Pods by {{ $value | humanize }} bytes and cannot tolerate node failure.',
},
} else {
expr: |||
(sum(namespace_memory:kube_pod_container_resource_requests:sum{%(ignoringOverprovisionedWorkloadSelector)s}) -
sum(kube_node_status_allocatable{resource="memory", %(kubeStateMetricsSelector)s}) > 0
and
count(max by (node) (kube_node_role{role="control-plane"})) < 3)
or
(sum(namespace_memory:kube_pod_container_resource_requests:sum{%(ignoringOverprovisionedWorkloadSelector)s}) -
(sum(kube_node_status_allocatable{resource="memory", %(kubeStateMetricsSelector)s}) -
max(kube_node_status_allocatable{resource="memory", %(kubeStateMetricsSelector)s})) > 0
and
(sum(kube_node_status_allocatable{resource="memory", %(kubeStateMetricsSelector)s}) -
max(kube_node_status_allocatable{resource="memory", %(kubeStateMetricsSelector)s})) > 0)
||| % $._config,
annotations+: {
description: 'Cluster has overcommitted memory resource requests for Pods by {{ $value | humanize }} bytes and cannot tolerate node failure.',
},
},
{
alert: 'KubeCPUQuotaOvercommit',
labels: {
104 changes: 104 additions & 0 deletions tests/tests.yaml
@@ -1323,3 +1323,107 @@ tests:
description: 'Cluster has overcommitted memory resource requests for Namespaces.'
runbook_url: "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubememoryquotaovercommit"
summary: "Cluster has overcommitted memory resource requests."

- name: KubeCPUOvercommit alert (single-node)
  interval: 1m
input_series:
- series: 'namespace_cpu:kube_pod_container_resource_requests:sum{cluster="kubernetes", namespace="default"}'
values: '1x10'
- series: 'namespace_cpu:kube_pod_container_resource_requests:sum{cluster="kubernetes", namespace="kube-system"}'
values: '1x10'
- series: 'kube_node_status_allocatable{cluster="kubernetes", node="n1", resource="cpu", job="kube-state-metrics"}'
values: '1.9x10' # This value was seen on a 2x vCPU node
- series: 'kube_node_info{cluster="kubernetes", node="n1", job="kube-state-metrics"}'
values: '1x10'
alert_rule_test:
- eval_time: 9m
alertname: KubeCPUOvercommit
- eval_time: 10m
alertname: KubeCPUOvercommit
exp_alerts:
- exp_labels:
severity: warning
exp_annotations:
description: Cluster has overcommitted CPU resource requests for Pods by 0.385 CPU shares and cannot tolerate node failure.
runbook_url: https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubecpuovercommit
summary: Cluster has overcommitted CPU resource requests.

- name: KubeCPUOvercommit alert (multi-node)
  interval: 1m
input_series:
- series: 'namespace_cpu:kube_pod_container_resource_requests:sum{cluster="kubernetes", namespace="default"}'
values: '2x10'
- series: 'namespace_cpu:kube_pod_container_resource_requests:sum{cluster="kubernetes", namespace="kube-system"}'
values: '2x10'
- series: 'kube_node_status_allocatable{cluster="kubernetes", node="n1", resource="cpu", job="kube-state-metrics"}'
values: '1.9x10' # This value was seen on a 2x vCPU node
- series: 'kube_node_status_allocatable{cluster="kubernetes", node="n2", resource="cpu", job="kube-state-metrics"}'
values: '1.9x10'
- series: 'kube_node_info{cluster="kubernetes", node="n1", job="kube-state-metrics"}'
values: '1x10'
- series: 'kube_node_info{cluster="kubernetes", node="n2", job="kube-state-metrics"}'
values: '1x10'
alert_rule_test:
- eval_time: 9m
alertname: KubeCPUOvercommit
- eval_time: 10m
alertname: KubeCPUOvercommit
exp_alerts:
- exp_labels:
severity: warning
exp_annotations:
description: Cluster has overcommitted CPU resource requests for Pods by 2.1 CPU shares and cannot tolerate node failure.
runbook_url: https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubecpuovercommit
summary: Cluster has overcommitted CPU resource requests.

- name: KubeMemoryOvercommit alert (single-node)
  interval: 1m
input_series:
- series: 'namespace_memory:kube_pod_container_resource_requests:sum{cluster="kubernetes", namespace="default"}'
values: '1000000000x10' # 1 GB
- series: 'namespace_memory:kube_pod_container_resource_requests:sum{cluster="kubernetes", namespace="kube-system"}'
values: '1000000000x10'
- series: 'kube_node_status_allocatable{cluster="kubernetes", node="n1", resource="memory", job="kube-state-metrics"}'
values: '1000000000x10'
- series: 'kube_node_info{cluster="kubernetes", node="n1", job="kube-state-metrics"}'
values: '1x10'
alert_rule_test:
- eval_time: 9m
alertname: KubeMemoryOvercommit
- eval_time: 10m
alertname: KubeMemoryOvercommit
exp_alerts:
- exp_labels:
severity: warning
exp_annotations:
description: Cluster has overcommitted memory resource requests for Pods by 1.15G bytes and cannot tolerate node failure.
runbook_url: https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubememoryovercommit
summary: Cluster has overcommitted memory resource requests.

- name: KubeMemoryOvercommit alert (multi-node)
  interval: 1m
input_series:
- series: 'namespace_memory:kube_pod_container_resource_requests:sum{cluster="kubernetes", namespace="default"}'
values: '2000000000x10' # 2 GB
- series: 'namespace_memory:kube_pod_container_resource_requests:sum{cluster="kubernetes", namespace="kube-system"}'
values: '2000000000x10'
- series: 'kube_node_status_allocatable{cluster="kubernetes", node="n1", resource="memory", job="kube-state-metrics"}'
values: '1000000000x10'
- series: 'kube_node_status_allocatable{cluster="kubernetes", node="n2", resource="memory", job="kube-state-metrics"}'
values: '1000000000x10'
- series: 'kube_node_info{cluster="kubernetes", node="n1", job="kube-state-metrics"}'
values: '1x10'
- series: 'kube_node_info{cluster="kubernetes", node="n2", job="kube-state-metrics"}'
values: '1x10'
alert_rule_test:
- eval_time: 9m
alertname: KubeMemoryOvercommit
- eval_time: 10m
alertname: KubeMemoryOvercommit
exp_alerts:
- exp_labels:
severity: warning
exp_annotations:
description: Cluster has overcommitted memory resource requests for Pods by 3G bytes and cannot tolerate node failure.
runbook_url: https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubememoryovercommit
summary: Cluster has overcommitted memory resource requests.