feat: notification grouping & Inhibition (#381)

adityathebe · web-flow · commit 9e9075f01728 · 2025-05-15T09:51:19.000+03:00
* docs: notification inhibition

* docs: notification grouping

* chore: typo fixes

* chore: vale error fixes
diff --git a/mission-control-chart b/mission-control-chart
@@ -1 +1 @@
-Subproject commit ba28f078b57102bf6259041e041ed4fce36060eb
+Subproject commit 5f2fde6cebc65f39d701ad4311d30c7c0d660b9a
diff --git a/mission-control/docs/guide/notifications/concepts/grouping.md b/mission-control/docs/guide/notifications/concepts/grouping.md
@@ -0,0 +1,24 @@
+---
+title: Grouping
+sidebar_custom_props:
+  icon: group
+---
+
+Mission Control may generate multiple related notifications within a short time window. Instead of sending each alert,
+you can use notification grouping to merge multiple events into a single message.
+
+_Example_: When multiple Helm releases fail to upgrade because a shared dependency is unavailable,
+you can use notification grouping to merge the notifications for all the affected Helm releases into a single message.
+
+The `groupBy` parameter lets you define how to group notifications.
+You can group by:
+
+- `type` (type of the config)
+- `description`
+- `status_reason`
+- `label` in the format `label:app`
+- `tag` in the format `tag:namespace`
+
+```yaml title="" file=<rootDir>/modules/mission-control/fixtures/notifications/config-health.yaml {11-12}
+
+```
diff --git a/mission-control/docs/guide/notifications/concepts/inhibition.mdx b/mission-control/docs/guide/notifications/concepts/inhibition.mdx
@@ -0,0 +1,20 @@
+---
+title: Inhibition
+sidebar_custom_props:
+  icon: block
+---
+
+import Inhibition from '../../../reference/notifications/_inhibition.mdx';
+
+Multiple related notifications may be generated within a short time window. Instead of sending each alert separately,
+you can use notification inhibition to inhibit notifications based on the resource hierarchy.
+
+_Example_: When a Kubernetes Pod becomes unhealthy, its ReplicaSet and Deployment also become unhealthy.
+If you have a notification set up to alert on `config.unhealthy`, you'll receive 3 different notifications for the same cause.
+
+```yaml title="deployment-with-inhibition.yaml" file=<rootDir>/modules/mission-control/fixtures/notifications/deployment-with-inhibition.yaml
+```
+
+<Inhibition />
+
+
diff --git a/mission-control/docs/guide/notifications/concepts/repeat-interval.mdx b/mission-control/docs/guide/notifications/concepts/repeat-interval.mdx
@@ -4,7 +4,7 @@ sidebar_custom_props:
   icon: dedupe
 ---
 
-The repeat interval determines the duration between subsequent notifications after an initial successful delivery.
+The repeat interval determines the duration between subsequent related notifications after an initial successful delivery. 
 
 ```yaml title="deployment-failed.yaml"
 apiVersion: mission-control.flanksource.com/v1
@@ -14,24 +14,33 @@ metadata:
   namespace: default
 spec:
   events:
-    - config.healthy
     - config.unhealthy
-    - config.warning
-    - config.unknown
   filter: config.type == "Kubernetes::Deployment"
   to:
     email: alerts@acme.com
   repeatInterval: 2h
+  groupBy:
+    - type
+  groupByInterval: 12h
 ```
 
-## Grouping Per Resource per Source Event
+With the above notification in place, if a Kubernetes Deployment's health fluctuates between `healthy` and `unhealthy` multiple times within a 2-hour window, the system only sends one notification in that period.
 
-The `repeatInterval` applies per unique resource per unique event source. This prevents duplicate notifications when the same resource triggers the same event type multiple times within the interval window. It still allows notifications for different event types on the same resource within the window.
+### Repeat groups
 
-### Example:
+Repeat interval works in tandem with [notification grouping](./grouping.md).
+If multiple notifications fall in the same group, only one notification will be sent for the group within the repeat interval.
 
-With the above notification in place, if a Kubernetes Helm release's health fluctuates between `healthy` and `unhealthy` multiple times within a 2-hour window, the system limits notifications to just two: one for the `config.healthy` event and one for the `config.unhealthy` event.
+#### Example:
 
-However, if the Helm release health shifts to `warning` during this same period, it triggers an additional notification. This occurs because the warning status is considered a separate event source.
+Deployment A becomes unhealthy due to a missing storage class, triggering a notification.
+Soon after, Deployment B also turns unhealthy for the same reason. Since it’s grouped with A, no additional notification is sent during the repeat interval.
+After the 2-hour interval passes, if Deployment C also becomes unhealthy for the same reason, a new notification is sent for C, and it also includes A and B.
 
-The notification throttling mechanism operates independently for each distinct resource. As a result, other Helm releases are not affected by this limitation.
+
+
+| Time   | Deployment | Status     | Action Taken                                       |
+|--------|------------|------------|----------------------------------------------------|
+| 10:00  | A          | Unhealthy  | Notification sent _(first in group)_                 |
+| 10:15  | B          | Unhealthy  | Suppressed due to repeat interval _(grouped with A for 12h)_ |
+| 12:10  | C          | Unhealthy  | Notification sent _(repeat interval expired)_. The message includes A, B, and C _(the group is still active because `groupByInterval` is 12h)_ |
diff --git a/mission-control/docs/guide/notifications/concepts/wait-for.mdx b/mission-control/docs/guide/notifications/concepts/wait-for.mdx
@@ -51,30 +51,4 @@ spec:
   waitFor: 5m
   //highlight-next-line
   waitForEvalPeriod: 30s
-```
-:::
-
-### Grouping Notifications
-
-Multiple related notifications may be generated within a short time window. Instead of sending each alert separately, 
-you can use notification grouping to consolidate multiple events into a single message.
-
-_Example_: When a Kubernetes deployment becomes unhealthy, its replicaset and associated pods will also become unhealthy. 
-If you have a notification set up to alert on `config.unhealthy`, you'll receive 3 different notifications at the very least for the same cause.
-
-The `groupBy` parameter allows you to define how notifications should be grouped. 
-Grouping can be done via 
-- `type` (type of the config)
-- `description`
-- `status_reason`
-- `labels` in the format `labels:app`
-- `tags` in the format `tag:namespace`
-
-:::info
-Grouping only works with waitFor. 
-Hence, a waitFor duration is required 
-:::
-
-
-```yaml title="" file=<rootDir>/modules/mission-control/fixtures/notifications/config-health.yaml {11-12}
 ```
diff --git a/mission-control/docs/reference/notifications/_inhibition.mdx b/mission-control/docs/reference/notifications/_inhibition.mdx
@@ -0,0 +1,33 @@
+<Fields 
+  rows={[
+    {
+      field: 'depth',
+      scheme: 'int',
+      description: 'Defines how many levels of child or parent resources to traverse.'
+    },
+    {
+      field: 'direction',
+      scheme: '`incoming`|`outgoing`|`both`',
+      required: true,
+      description: 'Specifies the traversal direction in relation to the "From" resource. Can be "outgoing" (looks for child resources), "incoming" (looks for parent resources), or "all" (considers both).'
+    },
+    {
+      field: 'from',
+      scheme: '`string`',
+      required: true,
+      description: 'Specifies the starting resource type (for example, "Kubernetes::Deployment").'
+    },
+    {
+      field: 'soft',
+      scheme: 'bool',
+      description: 'When true, traverses soft relationships as well. Example: Deployment to Pod is a hard relationship, but Node to Pod is a soft relationship.'
+    },
+    {
+      field: 'to',
+      scheme: '`[]string`',
+      required: true,
+      description: 'Specifies the target resource types to look for when traversing from the `from` resource.'
+    }
+  ]}
+/>
+
diff --git a/mission-control/docs/reference/notifications/_notification.mdx b/mission-control/docs/reference/notifications/_notification.mdx
@@ -1,3 +1,5 @@
+import Inhibition from './_inhibition.mdx';
+
 <Fields withTemplates="true" rows={[
   {
     field: "events",
@@ -60,6 +62,11 @@
     description: "Group notifications that are in waiting stage based on labels, tags and attributes. Only applicable when `waitFor` is provided. See [Grouping attributes](../../guide/notifications/concepts/wait-for#grouping-notifications)",
     scheme: "[]string"
   },
+  {
+    field: "groupByInterval",
+    description: "The maximum duration for which notifications will be grouped together before creating a new group. _(Default: 24h)_",
+    scheme: "duration"
+  },
   {
     field: "title",
     description: "Channel dependent e.g. subject for email",
@@ -95,8 +102,17 @@
     description: "specify the `<namespace>/<name>` of the playbook that should be triggered when this notification fires.",
     scheme: "string",
   },
+  {
+    field: "inhibitions",
+    description: "Suppresses notifications for related resources based on the resource hierarchy.",
+    scheme: "[`[]Inhibition`](#inhibition)",
+  }
 ]}/>
 
 :::info Single Recipient
 Only one recipient can be specified
 :::
+
+### Inhibition
+
+<Inhibition />
diff --git a/modules/canary-checker b/modules/canary-checker
@@ -1 +1 @@
-Subproject commit ab9f263f1a357fda68b32a1ecceba465fa11f037
+Subproject commit 8a07be11ce6cac1bba80983cd0d0fd886bf94015
diff --git a/modules/config-db b/modules/config-db
@@ -1 +1 @@
-Subproject commit 13663fa292091000c5b088a25322af9d7aa965f5
+Subproject commit e420722ac1eb0e5bea7d60e4ac535ef71bd85ae0
diff --git a/modules/duty b/modules/duty
@@ -1 +1 @@
-Subproject commit d8af8506a672319e8c3013a1fee6ce6604091b27
+Subproject commit 7456d2b41d75356da25ea4819281420efd7fe071
diff --git a/modules/mission-control b/modules/mission-control
@@ -1 +1 @@
-Subproject commit c4e984932a2d95ece00f79066881ce20ce942b79
+Subproject commit 1e72c906c4d8ebd998f737bb61370e95b90c54f5
diff --git a/modules/mission-control-chart b/modules/mission-control-chart
@@ -1 +1 @@
-Subproject commit 100b5816751795a138dc19a43e0e4a8986370c19
+Subproject commit 5f2fde6cebc65f39d701ad4311d30c7c0d660b9a
diff --git a/modules/mission-control-registry b/modules/mission-control-registry
@@ -1 +1 @@
-Subproject commit 81297714d0ee27fc6be69085c41f7694417caa08
+Subproject commit 5d6449397af9a8fa246b600b684f36ec01b91c32