Commit 9e9075f

feat: notification grouping & Inhibition (#381)
* docs: notification inhibition * docs: notification grouping * chore: typo fixes * chore: vale error fixes
1 parent c7cea7d commit 9e9075f

File tree

13 files changed

+119
-43
lines changed

Lines changed: 24 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,24 @@
1+
---
2+
title: Grouping
3+
sidebar_custom_props:
4+
icon: group
5+
---
6+
7+
Mission Control may generate multiple related notifications within a short time window. Instead of sending each alert separately,
8+
you can use notification grouping to merge multiple events into a single message.
9+
10+
_Example_: When multiple Helm releases fail to upgrade because of a common unavailable dependency,
11+
you can use notification grouping to merge the notifications for all the affected Helm releases into a single message.
12+
13+
The `groupBy` parameter lets you define how to group notifications.
14+
You can group by:
15+
16+
- `type` (type of the config)
17+
- `description`
18+
- `status_reason`
19+
- `label` in the format `label:app`
20+
- `tag` in the format `tag:namespace`
21+
22+
```yaml title="" file=<rootDir>/modules/mission-control/fixtures/notifications/config-health.yaml {11-12}
23+
24+
```
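
As a rough illustration of the attributes listed above, a notification that groups by config type, status reason, one label, and one tag might look like the sketch below. The resource name, the config type in the filter, the recipient under `to`, and the `waitFor` value are placeholders, not values taken from the `config-health.yaml` fixture. `waitFor` is included because the field reference still describes `groupBy` as applicable only when `waitFor` is provided.

```yaml
apiVersion: mission-control.flanksource.com/v1
kind: Notification
metadata:
  name: helm-release-grouped # hypothetical name
  namespace: default
spec:
  events:
    - config.unhealthy
  filter: config.type == "Kubernetes::HelmRelease" # assumed config type
  to:
    email: alerts@example.com # placeholder recipient
  waitFor: 5m # placeholder duration
  groupBy:
    - type # type of the config
    - status_reason
    - label:app # group by the value of the "app" label
    - tag:namespace # group by the value of the "namespace" tag
```

With a configuration along these lines, Helm releases that fail for the same reason and share the same `app` label and namespace would be expected to land in one group and produce a single message.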
Lines changed: 20 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,20 @@
1+
---
2+
title: Inhibition
3+
sidebar_custom_props:
4+
icon: block
5+
---
6+
7+
import Inhibition from '../../../reference/notifications/_inhibition.mdx';
8+
9+
Multiple related notifications may be generated within a short time window. Instead of sending each alert separately,
10+
you can use notification inhibition to inhibit notifications based on the resource hierarchy.
11+
12+
_Example_: When a Kubernetes Pod becomes unhealthy, its ReplicaSet and Deployment also become unhealthy.
13+
If you have a notification set up to alert on `config.unhealthy`, you'll receive 3 different notifications for the same cause.
14+
15+
```yaml title="deployment-with-inhibition.yaml" file=<rootDir>/modules/mission-control/fixtures/notifications/deployment-with-inhibition.yaml
16+
```
17+
18+
<Inhibition />
19+
20+
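
The contents of the fixture are not shown in this diff. As a hedged sketch built only from the fields in the inhibition reference (`from`, `to`, `direction`, `depth`), an inhibition for the Pod example above might be written as follows. The name, recipient, filter, and the choice of `incoming` with a depth of 2 are assumptions for illustration, not the contents of `deployment-with-inhibition.yaml`.

```yaml
apiVersion: mission-control.flanksource.com/v1
kind: Notification
metadata:
  name: pod-hierarchy-inhibition # hypothetical name
  namespace: default
spec:
  events:
    - config.unhealthy
  # assumed filter covering the three resource types from the example
  filter: >-
    config.type == "Kubernetes::Pod" ||
    config.type == "Kubernetes::ReplicaSet" ||
    config.type == "Kubernetes::Deployment"
  to:
    email: alerts@example.com # placeholder recipient
  inhibitions:
    - from: Kubernetes::Pod # starting resource type
      direction: incoming # look for parent resources of the Pod
      to:
        - Kubernetes::ReplicaSet
        - Kubernetes::Deployment
      depth: 2 # assumed: Pod -> ReplicaSet -> Deployment
```

The intent is that the three related `config.unhealthy` events collapse into fewer notifications. Which of them is kept should be confirmed against the actual fixture.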

mission-control/docs/guide/notifications/concepts/repeat-interval.mdx

Lines changed: 19 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -4,7 +4,7 @@ sidebar_custom_props:
44
icon: dedupe
55
---
66

7-
The repeat interval determines the duration between subsequent notifications after an initial successful delivery.
7+
The repeat interval determines the duration between subsequent related notifications after an initial successful delivery.
88

99
```yaml title="deployment-failed.yaml"
1010
apiVersion: mission-control.flanksource.com/v1
@@ -14,24 +14,33 @@ metadata:
1414
namespace: default
1515
spec:
1616
events:
17-
- config.healthy
1817
- config.unhealthy
19-
- config.warning
20-
- config.unknown
2118
filter: config.type == "Kubernetes::Deployment"
2219
to:
2320
2421
repeatInterval: 2h
22+
groupBy:
23+
- type
24+
groupByInterval: 12h
2525
```
2626
27-
## Grouping Per Resource per Source Event
27+
With the above notification in place, if a Kubernetes Deployment's health fluctuates between `healthy` and `unhealthy` multiple times within a 2-hour window, the system only sends one notification in that period.
2828

29-
The `repeatInterval` applies per unique resource per unique event source. This prevents duplicate notifications when the same resource triggers the same event type multiple times within the interval window. It still allows notifications for different event types on the same resource within the window.
29+
### Repeat groups
3030

31-
### Example:
31+
Repeat interval works in tandem with [notification grouping](./grouping.md).
32+
If multiple notifications fall in the same group, only one notification will be sent for the group within the repeat interval.
3233

33-
With the above notification in place, if a Kubernetes Helm release's health fluctuates between `healthy` and `unhealthy` multiple times within a 2-hour window, the system limits notifications to just two: one for the `config.healthy` event and one for the `config.unhealthy` event.
34+
#### Example:
3435

35-
However, if the Helm release health shifts to `warning` during this same period, it triggers an additional notification. This occurs because the warning status is considered a separate event source.
36+
Deployment A becomes unhealthy due to a missing storage class, triggering a notification.
37+
Soon after, Deployment B also turns unhealthy for the same reason. Since it’s grouped with A, no additional notification is sent during the repeat interval.
38+
After the 2-hour interval passes, if Deployment C also becomes unhealthy for the same issue, a new notification is sent for C, and it also includes A and B.
3639

37-
The notification throttling mechanism operates independently for each distinct resource. As a result, other Helm releases are not affected by this limitation.
40+
41+
42+
| Time | Deployment | Status | Action Taken |
43+
|--------|------------|------------|----------------------------------------------------|
44+
| 10:00 | A | Unhealthy | Notification sent _(first in group)_ |
45+
| 10:15 | B | Unhealthy | Suppressed due to repeat interval _(grouped with A for 12h)_ |
46+
| 12:10 | C | Unhealthy | Notification sent _(repeat interval expired)_ Includes A, B, and C in the message _(group is still active since groupByInterval is 12h)_ |

mission-control/docs/guide/notifications/concepts/wait-for.mdx

Lines changed: 0 additions & 26 deletions
Original file line numberDiff line numberDiff line change
@@ -51,30 +51,4 @@ spec:
5151
waitFor: 5m
5252
//highlight-next-line
5353
waitForEvalPeriod: 30s
54-
```
55-
:::
56-
57-
### Grouping Notifications
58-
59-
Multiple related notifications may be generated within a short time window. Instead of sending each alert separately,
60-
you can use notification grouping to consolidate multiple events into a single message.
61-
62-
_Example_: When a Kubernetes deployment becomes unhealthy, its replicaset and associated pods will also become unhealthy.
63-
If you have a notification set up to alert on `config.unhealthy`, you'll receive 3 different notifications at the very least for the same cause.
64-
65-
The `groupBy` parameter allows you to define how notifications should be grouped.
66-
Grouping can be done via
67-
- `type` (type of the config)
68-
- `description`
69-
- `status_reason`
70-
- `labels` in the format `labels:app`
71-
- `tags` in the format `tag:namespace`
72-
73-
:::info
74-
Grouping only works with waitFor.
75-
Hence, a waitFor duration is required
76-
:::
77-
78-
79-
```yaml title="" file=<rootDir>/modules/mission-control/fixtures/notifications/config-health.yaml {11-12}
8054
```
Lines changed: 33 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,33 @@
1+
<Fields
2+
rows={[
3+
{
4+
field: 'depth',
5+
scheme: 'int',
6+
description: 'Defines how many levels of child or parent resources to traverse.'
7+
},
8+
{
9+
field: 'direction',
10+
scheme: '`incoming`|`outgoing`|`both`',
11+
required: true,
12+
description: 'Specifies the traversal direction in relation to the `from` resource. Can be `outgoing` (looks for child resources), `incoming` (looks for parent resources), or `both` (considers both).'
13+
},
14+
{
15+
field: 'from',
16+
scheme: '`string`',
17+
required: true,
18+
description: 'Specifies the starting resource type (for example, "Kubernetes::Deployment").'
19+
},
20+
{
21+
field: 'soft',
22+
scheme: 'bool',
23+
description: 'When true, relates using soft relationships. Example: Deployment to Pod is a hard relationship, but Node to Pod is a soft relationship.'
24+
},
25+
{
26+
field: 'to',
27+
scheme: '`[]string`',
28+
required: true,
29+
description: 'Specifies the target resource types to look for, starting from the `from` resource (for example, "Kubernetes::ReplicaSet").'
30+
}
31+
]}
32+
/>
33+
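
For orientation, a single `inhibitions` entry that sets every field in the table could look like the fragment below. It follows the Node to Pod soft-relationship example from the `soft` description. The specific values, including treating Pods as `outgoing` (child) resources of a Node, are illustrative assumptions.

```yaml
inhibitions:
  - from: Kubernetes::Node # starting resource type
    direction: outgoing # assumed: Pods are children of the Node
    to:
      - Kubernetes::Pod
    depth: 1 # traverse a single level
    soft: true # Node to Pod is described as a soft relationship
```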

mission-control/docs/reference/notifications/_notification.mdx

Lines changed: 16 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,3 +1,5 @@
1+
import Inhibition from '../../reference/notifications/_inhibition.mdx';
2+
13
<Fields withTemplates="true" rows={[
24
{
35
field: "events",
@@ -60,6 +62,11 @@
6062
description: "Group notifications that are in waiting stage based on labels, tags and attributes. Only applicable when `waitFor` is provided. See [Grouping attributes](../../guide/notifications/concepts/wait-for#grouping-notifications)",
6163
scheme: "[]string"
6264
},
65+
{
66+
field: "groupByInterval",
67+
description: "The maximum duration for which notifications will be grouped together before creating a new group. _(Default: 24h)_",
68+
scheme: "duration"
69+
},
6370
{
6471
field: "title",
6572
description: "Channel dependent e.g. subject for email",
@@ -95,8 +102,17 @@
95102
description: "specify the `<namespace>/<name>` of the playbook that should be triggered when this notification fires.",
96103
scheme: "string",
97104
},
105+
{
106+
field: "inhibitions",
107+
description: "Inhibitions are used to inhibit notifications based on the resource hierarchy.",
108+
scheme: "[`[]Inhibition`](#inhibition)",
109+
}
98110
]}/>
99111

100112
:::info Single Recipient
101113
Only one recipient can be specified
102114
:::
115+
116+
### Inhibition
117+
118+
<Inhibition />

modules/duty

Submodule duty updated 67 files
