diff --git a/mission-control/docs/notifications/concepts/silence-notification-form.png b/mission-control/docs/notifications/concepts/silence-notification-form.png
new file mode 100644
index 0000000..6d15c9f
Binary files /dev/null and b/mission-control/docs/notifications/concepts/silence-notification-form.png differ
diff --git a/mission-control/docs/notifications/concepts/silences.mdx b/mission-control/docs/notifications/concepts/silences.mdx
new file mode 100644
index 0000000..39452c0
--- /dev/null
+++ b/mission-control/docs/notifications/concepts/silences.mdx
@@ -0,0 +1,28 @@
+---
+title: Silences
+---
+
+Silences prevent notifications from being sent.
+A silence is attached to a particular resource (`catalog`, `health check`, or `component`) and is active for a specified duration.
+
+:::note
+Notifications that aren't sent because of a silence are still visible in the notification history for auditing purposes.
+:::
+
+## Use cases
+
+- Planned maintenance or deployments: For example, you can silence a namespace or a Helm release and automatically silence notifications from all of their children.
+- Non-critical resources: You can silence resources that routinely trigger alerts that are expected and harmless.
+- Known issues: If there's a known issue that can't be immediately resolved (e.g., due to dependencies or resource constraints), you might silence related alerts until a fix can be implemented.
+
+## Add Silence
+
+Silences can be added from the notification page. Alternatively, if you're using the default Slack notification templates, you get a silence button
+on each notification.
+
+![Silence Notification form](./silence-notification-form.png)
+
+## Recursive mode
+
+A silence that is applied recursively also covers all children of the silenced resource.
+For example, silencing a namespace silences all deployments, statefulsets, pods, etc. in that namespace.
\ No newline at end of file
diff --git a/mission-control/docs/notifications/concepts/wait-for.mdx b/mission-control/docs/notifications/concepts/wait-for.mdx
new file mode 100644
index 0000000..cc595c3
--- /dev/null
+++ b/mission-control/docs/notifications/concepts/wait-for.mdx
@@ -0,0 +1,32 @@
+---
+title: Wait For
+---
+
+Kubernetes clusters and similar dynamic systems may experience temporary discrepancies between the actual and intended state of resources.
+For example, a deployment could momentarily appear unhealthy during a scaling operation.
+If alerts are configured for `config.unhealthy` events, these transient state fluctuations might lead to an overwhelming number of unnecessary notifications.
+
+To address this issue, you can use the `waitFor` parameter.
+It lets you define a delay before notifications are sent for specific events.
+After an event occurs, the system waits for the specified period and then rechecks the status. A notification is sent only if the undesired state persists.
+
+:::info
+`waitFor` applies only to health-related events.
+:::
+
+This approach helps reduce unnecessary notifications caused by transient state changes, ensuring you're alerted only to persistent issues.
+
+
+```yaml title='notify-unhealthy-deployments.yaml' {8}
+apiVersion: mission-control.flanksource.com/v1
+kind: Notification
+metadata:
+  name: deployment-unhealthy-alerts
+spec:
+  events:
+    - config.unhealthy
+  waitFor: 2m
+  filter: config.type == 'Kubernetes::Deployment'
+  to:
+    email: alerts@acme.com
+```
\ No newline at end of file
diff --git a/mission-control/docs/reference/notifications/_notification.mdx b/mission-control/docs/reference/notifications/_notification.mdx
index 2783847..c407969 100644
--- a/mission-control/docs/reference/notifications/_notification.mdx
+++ b/mission-control/docs/reference/notifications/_notification.mdx
@@ -45,6 +45,11 @@
     description: "RepeatGroup allows notifications to be grouped by certain set of keys and only send one per group within the specified repeat interval. Valid group keys: `resource_id` & `source_event`.",
     scheme: "`[]string`"
   },
+  {
+    field: "waitFor",
+    description: "The duration to delay sending a health-based notification.\n\nAfter this period, the health status is reassessed to confirm it hasn't changed, helping prevent false alarms from transient issues.",
+    scheme: "duration"
+  },
   {
     field: "title",
     description: "Channel dependent e.g. subject for email",