- `cancel_if_*`: Labels used for alert inhibitions
- `all_pipelines: "true"`: Ensures the alert is sent to PagerDuty regardless of the installation's pipeline

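As a minimal sketch, the labels above could appear on an alerting rule like this, written in plain Prometheus rule-file syntax. The alert name, expression, `team` value, and the `cancel_if_outside_working_hours` label are hypothetical placeholders. The files in this repository are Helm templates, so their surrounding structure differs.

```yaml
groups:
  - name: example.rules
    rules:
      # Hypothetical alert: fires when the `example` job has been down for 10 minutes.
      - alert: ExampleServiceDown
        expr: up{job="example"} == 0
        for: 10m
        labels:
          severity: page                           # used by alertmanager for routing
          team: atlas                              # placeholder team, selects the Slack channel
          cancel_if_outside_working_hours: "true"  # hypothetical inhibition label
          all_pipelines: "true"                    # send regardless of the installation's pipeline
        annotations:
          description: "Target of job example is down."
```

Label values are quoted so that YAML keeps them as the strings `"true"` rather than booleans.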
#### `Absent` function

If you want to make sure a metric exists on one cluster, you can't just use the `absent` function anymore.
With `mimir`, metrics for all clusters are stored in a single database, which makes detecting the absence of a metric on one cluster much harder.

To achieve such a test, do as the [`MimirToGrafanaCloudExporterMissingData`](https://github.com/giantswarm/prometheus-rules/blob/d06a84e8369f4d0bafdf0d48f18120de15c8e18a/helm/prometheus-rules/templates/platform/atlas/alerting-rules/grafana-cloud.rules.yml#L33) alert does.
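The difficulty is that `absent(my_metric)` returns an empty result as long as *any* cluster still reports `my_metric`. One common per-cluster approach, sketched below, compares against a reference metric every cluster is expected to report: `unless` keeps only the `cluster_id` groups from the left-hand side that have no match on the right-hand side. `my_metric`, the `cluster_id` label, and the use of `up` as the reference metric are assumptions for illustration, not necessarily what the linked alert uses.

```yaml
# Hypothetical alert: fires for each cluster that reports `up`
# but has had no `my_metric` series for 30 minutes.
- alert: ExampleMetricMissing
  expr: |
    count by (cluster_id) (up)
    unless
    count by (cluster_id) (my_metric)
  for: 30m
  labels:
    severity: notify
    team: atlas
```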

### Alert routing

Alertmanager does the routing based on the labels mentioned above.
You can see the routing rules in alertmanager's config (opsctl open `alertmanager`, then go to `Status`), under the `route:` section.

**Alerts are routed as follows:**

* **Sent to PagerDuty:**
  * All `severity=page` alerts
* **Sent to GitHub Issues (via alertmanager-to-github):**
  * All `severity=ticket` alerts
* **Sent to Slack team-specific channels:**
  * `severity=page` or `severity=notify` alerts
  * The `team` label defines which channel to route to
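A minimal Alertmanager `route:` sketch of the routing described above could look like the following. Receiver names, the `atlas` team, the Slack channel, and every URL and key are placeholders. It also assumes alertmanager-to-github is reached through a webhook receiver. The real configuration is the one shown on alertmanager's `Status` page.

```yaml
route:
  receiver: default            # fallback for alerts matching no child route
  routes:
    # Page alerts go to PagerDuty; `continue` lets them also match the Slack route.
    - receiver: pagerduty
      matchers:
        - severity="page"
      continue: true
    # Ticket alerts are turned into GitHub issues by a webhook receiver.
    - receiver: github-issues
      matchers:
        - severity="ticket"
    # Page and notify alerts of one team go to that team's Slack channel.
    - receiver: slack-atlas
      matchers:
        - severity=~"page|notify"
        - team="atlas"

receivers:
  - name: default
  - name: pagerduty
    pagerduty_configs:
      - routing_key: REPLACE_ME
  - name: github-issues
    webhook_configs:
      - url: https://alertmanager-to-github.example/REPLACE_ME
  - name: slack-atlas
    slack_configs:
      - api_url: https://hooks.slack.com/services/REPLACE_ME
        channel: "#alert-atlas"
```

Without `continue: true`, the first matching route would stop the search, so `severity=page` alerts would reach PagerDuty but never Slack.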

### Inhibitions

Recording rules are located in `helm/prometheus-rules/templates/<area>/<team>/recording-rules`, in the directory of the area and team they belong to.

## Mixins management

#### kubernetes-mixins

To update `kubernetes-mixins` recording rules:

* Follow the instructions in [giantswarm-kubernetes-mixin](https://github.com/giantswarm/giantswarm-kubernetes-mixin)
* Run `./scripts/sync-kube-mixin.sh (?my-fancy-branch-or-tag)` to update the `helm/prometheus-rules/templates/shared/recording-rules/kubernetes-mixins.rules.yml` file
* Make sure to update the [Grafana dashboards](https://github.com/giantswarm/dashboards/tree/master/helm/dashboards/dashboards/mixin)

#### mimir-mixins

To update `mimir-mixins` recording rules:

* Run `./mimir/update.sh`
* Make sure to update the [Grafana dashboards](https://github.com/giantswarm/dashboards)

#### loki-mixins

To update `loki-mixins` recording rules:

* Run `./loki/update.sh`
* Make sure to update the [Grafana dashboards](https://github.com/giantswarm/dashboards)

#### tempo-mixins

You can run all tests by running `make test`.

There are 4 different types of tests implemented:

- [Prometheus rules unit tests](#prometheus-rules-unit-tests)

#### Test templating

To reduce the need for provider-specific test files, you can use `$provider` in your test file and the tooling will replace it with the provider name.

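As an illustration, a test file using `$provider` could look like the excerpt below. It assumes a hypothetical `ExampleTargetDown` rule with expression `up{job="example"} == 0`, `for: 10m`, and a single `severity: page` label; the rule file name is also a placeholder. Because the tooling substitutes `$provider` before the test runs, the same file can serve every provider.

```yaml
rule_files:
  - example.rules.yml
tests:
  - interval: 1m
    input_series:
      # Up from t=0 to t=10m, then down from t=11m onwards.
      - series: 'up{job="example", provider="$provider"}'
        values: "1x10 0x60"
    alert_rule_test:
      # Down since t=11m and `for: 10m`, so the alert is firing at t=30m.
      - eval_time: 30m
        alertname: ExampleTargetDown
        exp_alerts:
          - exp_labels:
              severity: page
              job: example
              provider: $provider
```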
#### Test exceptions

#### Test "no data" case

* It is useful to test what happens when a series does not exist.
* For instance, you can have no data for the first 60 iterations like this: `_x60`
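A sketch of such a test follows, again assuming a hypothetical `ExampleTargetDown` rule on `up{job="example"} == 0` with `for: 10m` and a single `severity: page` label. While the series does not exist, the expression returns nothing, so no alert fires. This is exactly the behavior worth checking, since a rule that only compares values stays silent on missing data.

```yaml
tests:
  - interval: 1m
    input_series:
      # `_x60`: no samples for the first 60 minutes, then 0 from t=60m to t=120m.
      - series: 'up{job="example"}'
        values: "_x60 0x60"
    alert_rule_test:
      # The series does not exist yet: `up == 0` has no result, so no alert.
      - eval_time: 30m
        alertname: ExampleTargetDown
        exp_alerts: []
      # The series has been 0 since t=60m, longer than `for: 10m`: the alert fires.
      - eval_time: 90m
        alertname: ExampleTargetDown
        exp_alerts:
          - exp_labels:
              severity: page
              job: example
```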

#### Useful links

This is what we call the inhibition dependency chain.

To check whether inhibition labels (mostly those prefixed with `cancel_if_`) are well defined and triggered by a corresponding label in the alerting rules, run the `make test-inhibitions` command from the project's root directory.

This command outputs the list of missing labels. Each of them needs to be defined in either the alerting rules or the alertmanager config file, depending on its nature: an inhibition label or its source label.
If no labels are output, the tests passed and found no missing inhibition labels.
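For context, an inhibition rule in the alertmanager config pairs a source label with a `cancel_if_` target label. The sketch below is hypothetical: the `cluster_unreachable` label names and the `cluster_id` equality are placeholders, not rules taken from the actual config. A `cancel_if_` label that no rule or source alert ever sets is the kind of gap `make test-inhibitions` is meant to report.

```yaml
inhibit_rules:
  # While an alert labelled `cluster_unreachable="true"` fires for a cluster,
  # mute alerts on the same cluster that carry `cancel_if_cluster_unreachable="true"`.
  - source_matchers:
      - cluster_unreachable="true"
    target_matchers:
      - cancel_if_cluster_unreachable="true"
    equal:
      - cluster_id
```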