Commit

mixin: Ignore cache delete errors for cache error alerts (#10287)
Delete operations are expected to fail when the key doesn't exist, which
can happen when keys are deleted as part of cache invalidation.

Signed-off-by: Nick Pillitteri <[email protected]>
56quarters authored Dec 19, 2024
1 parent dc1410c commit 2ecc15d
Showing 5 changed files with 12 additions and 10 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -21,6 +21,7 @@
 ### Mixin

 * [BUGFIX] Dashboards: fix how we switch between classic and native histograms. #10018
+* [BUGFIX] Alerts: Ignore cache errors performing `delete` operations since these are expected to fail when keys don't exist. #10287

 ### Jsonnet

@@ -119,11 +119,11 @@ spec:
 expr: |
 (
 sum by(cluster, namespace, name, operation) (
-rate(thanos_cache_operation_failures_total{operation!="add"}[1m])
+rate(thanos_cache_operation_failures_total{operation!~"add|delete"}[1m])
 )
 /
 sum by(cluster, namespace, name, operation) (
-rate(thanos_cache_operations_total{operation!="add"}[1m])
+rate(thanos_cache_operations_total{operation!~"add|delete"}[1m])
 )
 ) * 100 > 5
 for: 5m
4 changes: 2 additions & 2 deletions operations/mimir-mixin-compiled-baremetal/alerts.yaml
@@ -107,11 +107,11 @@ groups:
 expr: |
 (
 sum by(cluster, namespace, name, operation) (
-rate(thanos_cache_operation_failures_total{operation!="add"}[1m])
+rate(thanos_cache_operation_failures_total{operation!~"add|delete"}[1m])
 )
 /
 sum by(cluster, namespace, name, operation) (
-rate(thanos_cache_operations_total{operation!="add"}[1m])
+rate(thanos_cache_operations_total{operation!~"add|delete"}[1m])
 )
 ) * 100 > 5
 for: 5m
4 changes: 2 additions & 2 deletions operations/mimir-mixin-compiled/alerts.yaml
@@ -107,11 +107,11 @@ groups:
 expr: |
 (
 sum by(cluster, namespace, name, operation) (
-rate(thanos_cache_operation_failures_total{operation!="add"}[1m])
+rate(thanos_cache_operation_failures_total{operation!~"add|delete"}[1m])
 )
 /
 sum by(cluster, namespace, name, operation) (
-rate(thanos_cache_operations_total{operation!="add"}[1m])
+rate(thanos_cache_operations_total{operation!~"add|delete"}[1m])
 )
 ) * 100 > 5
 for: 5m
9 changes: 5 additions & 4 deletions operations/mimir-mixin/alerts/alerts.libsonnet
@@ -202,16 +202,17 @@ local utils = import 'mixin-utils/utils.libsonnet';
 },
 {
 alert: $.alertName('CacheRequestErrors'),
-// Specifically exclude "add" operations which are used for cache invalidation and "locking" since
-// they are expected to sometimes fail in normal operation (such as when a "lock" already exists).
+// Specifically exclude "add" and "delete" operations which are used for cache invalidation and "locking"
+// since they are expected to sometimes fail in normal operation (such as when a "lock" already exists or
+// key being invalidated does not exist).
 expr: |||
 (
 sum by(%(group_by)s, name, operation) (
-rate(thanos_cache_operation_failures_total{operation!="add"}[%(range_interval)s])
+rate(thanos_cache_operation_failures_total{operation!~"add|delete"}[%(range_interval)s])
 )
 /
 sum by(%(group_by)s, name, operation) (
-rate(thanos_cache_operations_total{operation!="add"}[%(range_interval)s])
+rate(thanos_cache_operations_total{operation!~"add|delete"}[%(range_interval)s])
 )
 ) * 100 > 5
 ||| % {
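
The change swaps the equality matcher `operation!="add"` for the regex matcher `operation!~"add|delete"`. PromQL regex matchers are fully anchored, so the new selector drops exactly the `add` and `delete` series and nothing else. The Python sketch below imitates that behavior outside Prometheus, as a way to reason about which operations can still trigger the alert. The function names and the sample rates are hypothetical, it handles a single label rather than the full `sum by(...)` grouping, and it skips zero totals instead of reproducing PromQL's NaN/Inf handling.

```python
import re


def negative_regex_match(value: str, pattern: str) -> bool:
    """Imitate a PromQL `!~` matcher: True when the fully anchored
    regex does NOT match the label value."""
    return re.fullmatch(pattern, value) is None


def error_percentages(failures, totals, exclude_pattern="add|delete"):
    """Return the per-operation failure percentage for operations that
    survive the `!~` matcher.

    `failures` and `totals` map an operation name to a per-second rate.
    An operation missing from either side is dropped, which is how the
    binary `/` behaves when one side has no matching series.
    """
    result = {}
    for operation, total in totals.items():
        if not negative_regex_match(operation, exclude_pattern):
            continue  # excluded by the matcher (e.g. "add", "delete")
        if operation not in failures or total == 0:
            continue  # simplification: no matching series, or no traffic
        result[operation] = failures[operation] / total * 100
    return result


def firing_operations(failures, totals, threshold=5.0):
    """Operations whose failure percentage exceeds the alert threshold."""
    percentages = error_percentages(failures, totals)
    return sorted(op for op, pct in percentages.items() if pct > threshold)


if __name__ == "__main__":
    # Hypothetical rates, chosen only for illustration.
    sample_failures = {"add": 4.0, "delete": 9.0, "getmulti": 2.0, "set": 0.0}
    sample_totals = {"add": 10.0, "delete": 10.0, "getmulti": 8.0, "set": 20.0}
    print(firing_operations(sample_failures, sample_totals))
```

With these sample rates, `delete` fails 90% of the time yet is not reported, which is the intent of the commit: only operations outside `add|delete` are evaluated against the 5% threshold.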
