[ResponseOps][Alerting] Decouple feature IDs from consumers #183756

cnasikas · 2024-05-17T15:32:12Z

Summary

This PR aims to decouple the feature IDs from the consumer attribute of rules and alerts.

Architecture

Alerting RBAC model

Kibana security uses Elasticsearch's application privileges. This way, Kibana can represent and store its privilege models within Elasticsearch roles. To do that, Kibana security creates actions that are granted by a specific privilege. Alerting uses its own RBAC model, built on top of the existing Kibana security model. The Alerting RBAC uses the rule_type_id and consumer attributes to define who owns the rule and the alerts produced by the rule. To connect the rule_type_id and consumer with the Kibana security actions, the Alerting RBAC registers its own custom actions. They are constructed as alerting:<rule-type-id>/<feature-id>/<alerting-entity>/<operation>, for example alerting:siem.esqlRule/siem/rule/get. This action means that a user with a role that grants it can get a rule of type siem.esqlRule with consumer siem.
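The action format above can be sketched as a small helper. The function name buildAlertingAction and the types below are hypothetical and exist only for illustration; they are not the actual Kibana security API.

```typescript
// Hypothetical types, for illustration only.
type AlertingEntity = 'rule' | 'alert';

interface AlertingActionParts {
  ruleTypeId: string;
  consumer: string; // at the moment this has to be a valid feature ID
  entity: AlertingEntity;
  operation: string;
}

// Builds a string of the form
// alerting:<rule-type-id>/<consumer>/<alerting-entity>/<operation>.
function buildAlertingAction(parts: AlertingActionParts): string {
  const { ruleTypeId, consumer, entity, operation } = parts;
  return `alerting:${ruleTypeId}/${consumer}/${entity}/${operation}`;
}

const exampleAction = buildAlertingAction({
  ruleTypeId: 'siem.esqlRule',
  consumer: 'siem',
  entity: 'rule',
  operation: 'get',
});

console.log(exampleAction); // alerting:siem.esqlRule/siem/rule/get
```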

Problem statement

At the moment, the consumer attribute has to be a valid feature ID. Though this approach has worked well so far, it has its limitations. Specifically:

Rule types cannot support more than one consumer.
Associating old rules with a new feature ID requires a migration of the rules' SOs and the alert documents.
The API calls are feature ID oriented and not rule type oriented.
The framework has to be aware of the values of the consumer attribute.
Feature IDs are tightly coupled with the alerting indices leading to bugs.
Legacy consumers that are no longer valid features can cause bugs.
The framework has to be aware of legacy consumers to handle edge cases.

Proposed solution

This PR aims to decouple the feature IDs from consumers. It achieves that a) by changing the way solutions configure the alerting privileges when registering a feature and b) by changing the alerting actions. The schema changes as follows:

// Old formatting
id: 'siem', <--- feature ID
alerting:['siem.queryRule']

// New formatting
id: 'siem', <--- feature ID
alerting: [{ ruleTypeId: 'siem.queryRule', consumers: ['siem'] }] <-- consumer same as the feature ID in the old formatting

The new actions are constructed as alerting:<rule-type-id>/<consumer>/<alerting-entity>/<operation>, for example alerting:rule-type-id/my-consumer/rule/get. The new action means that a user with a role that grants this action can get a rule of type rule-type-id with consumer my-consumer. Changing the action strings is not considered a breaking change as long as the user's permissions work as before. In our case, this is true because the consumer will be the same as before (the feature ID), so the alerting security actions will be the same. For example:

Old formatting

Schema:

id: 'logs', <--- feature ID
alerting:['.es-query'] <-- rule type ID

Action:

alerting:.es-query/logs/rule/get

New formatting

Schema:

id: 'logs', <--- feature ID
alerting: [{ ruleTypeId: '.es-query', consumers: ['logs'] }] <-- consumer same as the feature ID in the old formatting

Action:

alerting:.es-query/logs/rule/get <--- consumer is set as logs same as before

In both formats the actions are the same, so breaking changes are avoided.
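The equivalence of the two formats can be checked with a minimal sketch. The interfaces and function names below are hypothetical simplifications of the feature registration, assumed only for this example; only the rule entity and a single operation are covered.

```typescript
// Hypothetical, simplified shapes of the alerting privilege configuration.
interface OldFeatureConfig {
  id: string; // feature ID, which doubled as the consumer
  alerting: string[]; // rule type IDs
}

interface NewFeatureConfig {
  id: string; // feature ID, no longer used to build the action
  alerting: Array<{ ruleTypeId: string; consumers: string[] }>;
}

const toRuleAction = (ruleTypeId: string, consumer: string, operation: string): string =>
  `alerting:${ruleTypeId}/${consumer}/rule/${operation}`;

// Old format: the feature ID is used as the consumer.
function actionsFromOldConfig(config: OldFeatureConfig, operation: string): string[] {
  return config.alerting.map((ruleTypeId) => toRuleAction(ruleTypeId, config.id, operation));
}

// New format: one action per (rule type ID, consumer) pair.
function actionsFromNewConfig(config: NewFeatureConfig, operation: string): string[] {
  const actions: string[] = [];
  config.alerting.forEach(({ ruleTypeId, consumers }) => {
    consumers.forEach((consumer) => actions.push(toRuleAction(ruleTypeId, consumer, operation)));
  });
  return actions;
}

const oldActions = actionsFromOldConfig({ id: 'logs', alerting: ['.es-query'] }, 'get');
const newActions = actionsFromNewConfig(
  { id: 'logs', alerting: [{ ruleTypeId: '.es-query', consumers: ['logs'] }] },
  'get'
);

console.log(oldActions); // [ 'alerting:.es-query/logs/rule/get' ]
console.log(newActions); // [ 'alerting:.es-query/logs/rule/get' ]
```

Because the new format lists consumers explicitly, adding a second consumer to a rule type only adds actions and leaves the existing ones untouched.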

Alerting authorization class

The alerting plugin uses and exports the alerting authorization class (AlertingAuthorization). The class is responsible for handling all authorization actions related to rules and alerts. The class changed to handle the new actions as described in the sections above. A lot of methods were renamed, removed, cleaned up, or had their argument or return types changed. These changes affected various pieces of the code. The changes in this class are the most important in this PR, especially the _getAuthorizedRuleTypesWithAuthorizedConsumers method, which is the cornerstone of the alerting RBAC. Please review carefully.

Instantiation of the alerting authorization class

The AlertingAuthorizationClientFactory is used to create instances of the AlertingAuthorization class. The AlertingAuthorization class needs to perform async operations upon instantiation. Because JS, at the moment, does not support async instantiation of classes, the AlertingAuthorization class was assigning Promise objects to variables that could be resolved later in other phases of the lifecycle of the class. To improve readability and make the lifecycle of the class clearer, I separated the construction of the class (initialization) from the bootstrap process. As a result, getting the AlertingAuthorization class or any client that depends on it (RulesClient, for example) is an async operation.
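A static async factory is one common way to separate construction from bootstrap. The sketch below illustrates the idea only; AuthorizationSketch, LoadFeatureIds, and getRulesClientSketch are hypothetical names, and the actual AlertingAuthorization class may be structured differently.

```typescript
// Hypothetical stand-in for the async data the class needs before it can be used.
type LoadFeatureIds = () => Promise<string[]>;

class AuthorizationSketch {
  // The constructor is synchronous and private: it only stores resolved values,
  // so no Promise ever has to be kept on the instance.
  private constructor(private readonly featureIds: Set<string>) {}

  // All async work (the bootstrap) happens here, before the instance exists.
  static async create(loadFeatureIds: LoadFeatureIds): Promise<AuthorizationSketch> {
    const featureIds = await loadFeatureIds();
    return new AuthorizationSketch(new Set(featureIds));
  }

  hasFeature(featureId: string): boolean {
    return this.featureIds.has(featureId);
  }
}

// Any client that depends on the class becomes async to obtain as well.
async function getRulesClientSketch(): Promise<{ canUse: (featureId: string) => boolean }> {
  const authorization = await AuthorizationSketch.create(async () => ['logs', 'siem']);
  return { canUse: (featureId: string) => authorization.hasFeature(featureId) };
}

getRulesClientSketch().then((client) => {
  console.log(client.canUse('logs')); // true
  console.log(client.canUse('apm')); // false
});
```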

Filtering

A lot of routes use the authorization class to get the authorization filter (getFindAuthorizationFilter), a filter that, if applied, returns only the rule types and consumers the user is authorized for. The method that returns the filter was built in a way that also supports filtering on top of the authorization filter, thus coupling the authorization filter with route filtering. I believe these two operations should be decoupled, and the filter method should return a filter that gives you all the authorized rule types. It is the responsibility of the caller, a route in our case, to apply extra filters on top of the authorization filter. For that reason, I made all the necessary changes to decouple them.
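The separation of responsibilities can be sketched as follows. This is a minimal illustration that builds plain KQL strings; the field names, the AuthorizedRuleTypes shape, and both function names are assumptions made for the example and do not reflect the real getFindAuthorizationFilter signature.

```typescript
// Hypothetical: rule type ID -> consumers the user is authorized for.
type AuthorizedRuleTypes = Map<string, string[]>;

// Placeholder field names, not the actual saved object attribute paths.
const RULE_TYPE_FIELD = 'rule.ruleTypeId';
const CONSUMER_FIELD = 'rule.consumer';

// Returns a KQL string matching every authorized (rule type, consumer) pair.
// It deliberately takes no request-specific filters.
function getAuthorizationFilterSketch(authorized: AuthorizedRuleTypes): string {
  if (authorized.size === 0) {
    // An empty filter would match everything, so fail closed instead.
    throw new Error('Not authorized for any rule type');
  }

  const clauses: string[] = [];
  authorized.forEach((consumers, ruleTypeId) => {
    const quotedConsumers = consumers.map((consumer) => `"${consumer}"`).join(' or ');
    clauses.push(
      `(${RULE_TYPE_FIELD}: "${ruleTypeId}" and ${CONSUMER_FIELD}: (${quotedConsumers}))`
    );
  });

  return clauses.join(' or ');
}

// The route, not the authorization helper, applies extra filters on top.
function combineWithRouteFilter(authorizationFilter: string, routeFilter?: string): string {
  return routeFilter ? `(${routeFilter}) and (${authorizationFilter})` : authorizationFilter;
}

const authorized: AuthorizedRuleTypes = new Map([['.es-query', ['logs', 'stackAlerts']]]);
const authFilter = getAuthorizationFilterSketch(authorized);
const finalFilter = combineWithRouteFilter(authFilter, 'rule.name: "cpu"');

console.log(authFilter);
console.log(finalFilter);
```

Keeping the authorization filter free of request parameters means every route gets the same filter for the same user.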

Legacy consumers & producer

A lot of rules and alerts have been created and are still being created from observability with the alerts consumer. When the Alerting RBAC encounters a rule or alert with alerts as the consumer, it falls back to the producer of the rule type to construct the actions. For example, for a rule with ruleTypeId: .es-query and consumer: alerts, the alerting action will be constructed as alerting:.es-query/stackRules/rule/get, where stackRules is the producer of the .es-query rule type. The producer used to be part of alerting authorization, but due to its complexity it was deprecated and is only used as a fallback for the alerts consumer. To avoid breaking changes, all rule types will set the alerts consumer as a valid consumer when configuring the alerting privileges. By moving the alerts consumer to the registration of the feature, we can stop relying on the producer. In subsequent PRs the producer will be removed entirely.
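The legacy fallback described above can be sketched like this. The names resolveConsumer, buildRuleGetAction, and RuleTypeInfo are hypothetical; the producer value is taken from the example in the text.

```typescript
const LEGACY_ALERTS_CONSUMER = 'alerts';

// Hypothetical, trimmed-down view of a rule type.
interface RuleTypeInfo {
  id: string;
  producer: string;
}

// Falls back to the rule type's producer when the consumer is the legacy `alerts` value.
function resolveConsumer(consumer: string, ruleType: RuleTypeInfo): string {
  return consumer === LEGACY_ALERTS_CONSUMER ? ruleType.producer : consumer;
}

function buildRuleGetAction(consumer: string, ruleType: RuleTypeInfo): string {
  return `alerting:${ruleType.id}/${resolveConsumer(consumer, ruleType)}/rule/get`;
}

const esQueryRuleType: RuleTypeInfo = { id: '.es-query', producer: 'stackRules' };

console.log(buildRuleGetAction('alerts', esQueryRuleType)); // alerting:.es-query/stackRules/rule/get
console.log(buildRuleGetAction('logs', esQueryRuleType)); // alerting:.es-query/logs/rule/get
```

Once every rule type registers alerts as a valid consumer, a fallback like resolveConsumer is no longer needed, which is what allows the producer to be dropped later.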

Routes

The following changes were introduced to the alerting routes:

All related routes changed to be rule-type oriented and not feature ID oriented.
All related routes support the ruleTypeIds and the consumers parameters for filtering.
The /internal/rac/alerts/_feature_ids route got deleted as it was not used anywhere in the codebase and it was internal.

All the changes in the routes are related to internal routes and no breaking change is introduced.

Notable code changes

Change all instances of feature IDs to rule type IDs.
isSiemRuleType: Temporary helper function. The plan is to remove it entirely in further iterations.
Move o11y and stack rule type IDs to kbn-rule-data-utils.
Export all security solution rule type IDs from kbn-securitysolution-rules.
Rename alerting PluginSetupContract and PluginStartContract to AlertingServerSetup and AlertingServerStart.
Change the way the AlertingAuthorization class is instantiated.
getRulesClient converted to an async function.
Rename AlertingAuthorization methods and make them take a single object as their argument.
Change the response signature of some methods of the AlertingAuthorization class.
filter_consumers was mistakenly exposed to a public API. It was undocumented.
The getFindAuthorizationFilter authorization helper function does not accept filters. It should return a filter for all authorized rule types regardless of the request. Filtering by ruleTypeIds moved to calls to ES or the SO client.
Files or functions that were not used anywhere in the codebase got deleted.
Change the return type of the list method of the RuleTypeRegistry from Set<RegistryRuleType> to Map<string, RegistryRuleType>.
Assertion of KueryNode in tests changed to assertion of KQL using toKqlExpression.
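For callers of the RuleTypeRegistry list method, the practical difference between the two return types is how a rule type is looked up by ID. The sketch below uses a hypothetical, trimmed-down RegistryRuleTypeSketch interface with placeholder names; it is not the real RegistryRuleType.

```typescript
// Hypothetical, trimmed-down shape of a registered rule type.
interface RegistryRuleTypeSketch {
  id: string;
  name: string; // placeholder display name
}

const registered: RegistryRuleTypeSketch[] = [
  { id: '.es-query', name: 'Rule type A' },
  { id: 'siem.queryRule', name: 'Rule type B' },
];

// With a Set, finding a rule type by ID means scanning the entries.
function findInSet(
  ruleTypes: Set<RegistryRuleTypeSketch>,
  id: string
): RegistryRuleTypeSketch | undefined {
  return Array.from(ruleTypes).find((ruleType) => ruleType.id === id);
}

// With a Map keyed by rule type ID, the lookup is direct.
function toMap(ruleTypes: RegistryRuleTypeSketch[]): Map<string, RegistryRuleTypeSketch> {
  const byId = new Map<string, RegistryRuleTypeSketch>();
  ruleTypes.forEach((ruleType) => byId.set(ruleType.id, ruleType));
  return byId;
}

const asSet = new Set(registered);
const asMap = toMap(registered);

console.log(findInSet(asSet, '.es-query')?.name); // Rule type A
console.log(asMap.get('.es-query')?.name); // Rule type A
console.log(asMap.has('unknown.rule')); // false
```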

Testing

Caution

It is very important to test all the areas of the application where rules or alerts are being used directly or indirectly. Scenarios to consider:

The correct rules, alerts, and aggregations on top of them are being shown as expected as a superuser.
The correct rules, alerts, and aggregations on top of them are being shown as expected as a user with limited access to certain features.
The changes in this PR are backwards compatible with users' roles.

Solutions

Please test all the rule types you own with all possible combinations of permissions.

ResponseOps

Please test all stack rules with all possible combinations of permissions.

Risk Matrix

FAQ

I noticed that a lot of routes support the filter parameter, where we can pass an arbitrary KQL filter. Why do we not use it to filter by the rule type IDs and the consumers instead of introducing new dedicated parameters?
Why do we need to filter by consumer? Shouldn't the ruleTypeIds be enough?
I noticed a lot of instances in the code where the consumer is used. Shouldn't we remove any logic around consumers?


cnasikas · 2024-05-20T13:08:01Z

/ci

cnasikas · 2024-10-18T14:43:00Z

/ci

cnasikas · 2024-10-18T16:42:07Z

/ci

cnasikas · 2024-10-18T19:27:15Z

x-pack/plugins/alerting/common/routes/rule/apis/find/schemas/v1.ts

+  rule_type_ids: schema.maybe(schema.arrayOf(schema.string())),
+  consumers: schema.maybe(schema.arrayOf(schema.string())),


Same schema as the public plus the rule_type_ids and the consumers.

cnasikas · 2024-10-19T09:02:26Z

/ci

cnasikas · 2024-10-19T09:07:32Z

packages/kbn-alerts-ui-shared/src/alert_filter_controls/alert_filter_controls.tsx

@@ -82,7 +81,7 @@ export type AlertFilterControlsProps = Omit<
 */
 export const AlertFilterControls = (props: AlertFilterControlsProps) => {
  const {
-    featureIds = [AlertConsumers.STACK_ALERTS],
+    ruleTypeIds = [],


Should we default it to stack rules?

cnasikas · 2024-10-19T09:49:49Z

packages/kbn-rule-data-utils/src/alerts_as_data_rbac.ts

+  ALERTS: 'alerts',
+  DISCOVER: 'discover',


Now that the consumers are decoupled from the feature IDs the list should include all possible consumers so far. alerts and discover are valid ones. Ideally, we should not have a list of possible consumers. I hope in the subsequent PRs we will remove it.

cnasikas · 2024-10-19T09:51:16Z

packages/kbn-rule-data-utils/src/alerts_as_data_rbac.ts

+/**
+ * TODO: Abstract it and remove it
+ */
+export const isSiemRuleType = (ruleTypeId: string) => ruleTypeId.startsWith('siem.');


In the codebase, we have a lot of checks (hacks) related to security rule types. To reduce the scope of the PR as much as possible, I chose to fix them gradually in subsequent PRs. At the moment, this helper is needed.

cnasikas · 2024-10-19T10:03:40Z

packages/kbn-rule-data-utils/src/rule_types/o11y_rules.ts

I am not a fan of having a centralized place for the rule type IDs. Ideally, consumers of the framework should specify keywords like observability (category or subcategory) or even apm.*, and the framework should know which rule type IDs to pick up. But again, I think it is out of the scope of this PR, and at the moment it seems the most straightforward way to move forward.

cnasikas · 2024-10-19T15:39:19Z

x-pack/plugins/alerting/server/rules_client/common/filters.ts

Inspired by the cases utils for filtering.

cnasikas · 2024-10-19T16:08:29Z

x-pack/plugins/monitoring/server/lib/cluster/get_clusters_from_request.ts

I refactored the code a bit to accommodate the async nature of getRulesClient. It should work as before.

cnasikas · 2024-10-19T16:34:11Z

x-pack/plugins/observability_solution/investigate_app/server/services/get_alerts_client.ts

-    'apm',
-    'slo',
-    'uptime',
-    'observability',


Does observability map to more rule type IDs than the ones I have put? Should I add all o11y rule type IDs?

cnasikas · 2024-10-19T16:51:49Z

x-pack/plugins/observability_solution/observability_ai_assistant_app/server/functions/alerts.ts

If you create a rule from the main o11y page, the consumer is set to alerts for most of the rules. For that reason, I added AlertConsumers.ALERTS to fetch alerts with kibana.alert.rule.consumer: alerts. Also, I am not sure if we want to include stack rules, which I included because some of the stack rules can be created with the logs or infrastructure consumer.

cnasikas · 2024-10-19T17:11:05Z

/ci

cnasikas · 2024-10-20T00:24:42Z

/ci

elasticmachine · 2024-10-20T01:27:25Z

💔 Build Failed

Failed CI Steps

Test Failures

[job] [logs] FTR Configs #21 / ObservabilityApp Observability Rules page Create rules flyout Should allow the user to select consumers when creating ES query rules
[job] [logs] FTR Configs #21 / ObservabilityApp Observability Rules page Create rules flyout Should allow the user to select consumers when creating ES query rules
[job] [logs] Jest Tests #3 / owner utils getOwnerFromRuleConsumerProducer returns owner { id: 'cases', validRuleConsumers: [Array] } correctly for consumer
[job] [logs] Jest Tests #3 / owner utils getOwnerFromRuleConsumerProducer returns owner { id: 'cases', validRuleConsumers: [Array] } correctly for producer
[job] [logs] FTR Configs #19 / Serverless observability API - feature flags Platform security APIs security/authorization available features composite features
[job] [logs] FTR Configs #19 / Serverless observability API - feature flags Platform security APIs security/authorization available features composite features

Metrics [docs]

Module Count

Fewer modules leads to a faster build time

id	before	after	diff
`alerting`	239	238	-1
`cases`	812	816	+4
`observability`	1063	1062	-1
`securitySolution`	6037	6035	-2
`triggersActionsUi`	850	848	-2
total			-2

Public APIs missing comments

Total count of every public API that lacks a comment. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats comments for more detailed information.

id	before	after	diff
`@kbn/alerts-ui-shared`	304	294	-10
`@kbn/rule-data-utils`	133	140	+7
`@kbn/securitysolution-rules`	25	26	+1
`alerting`	848	839	-9
`ml`	63	64	+1
`ruleRegistry`	248	244	-4
total			-14

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id	before	after	diff
`alerting`	93.8KB	91.2KB	-2.6KB
`apm`	3.4MB	3.4MB	+73.0B
`cases`	492.0KB	492.1KB	+81.0B
`discover`	824.9KB	824.9KB	+42.0B
`infra`	1.7MB	1.7MB	+245.0B
`ml`	4.5MB	4.5MB	-193.0B
`observability`	470.6KB	469.7KB	-853.0B
`observabilityLogsExplorer`	147.3KB	147.6KB	+358.0B
`securitySolution`	20.7MB	20.7MB	+161.0B
`slo`	855.3KB	855.6KB	+273.0B
`synthetics`	1.1MB	1.1MB	+420.0B
`triggersActionsUi`	1.7MB	1.7MB	+52.0B
total			-2.0KB

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb

id	before	after	diff
`apm`	38.0KB	38.2KB	+209.0B
`cases`	151.2KB	151.3KB	+132.0B
`infra`	54.1KB	55.0KB	+929.0B
`ml`	75.3KB	76.0KB	+673.0B
`observability`	103.8KB	104.6KB	+766.0B
`observabilityShared`	75.0KB	75.1KB	+36.0B
`securitySolution`	87.5KB	87.5KB	-1.0B
`slo`	24.8KB	25.1KB	+394.0B
`synthetics`	37.6KB	37.9KB	+325.0B
`triggersActionsUi`	127.4KB	127.0KB	-363.0B
total			+3.0KB

Unknown metric groups

API count

id	before	after	diff
`@kbn/alerts-grouping`	31	32	+1
`@kbn/alerts-ui-shared`	320	310	-10
`@kbn/rule-data-utils`	136	152	+16
`@kbn/securitysolution-rules`	28	29	+1
`alerting`	880	872	-8
`ml`	148	149	+1
`ruleRegistry`	285	281	-4
total			-3

ESLint disabled line counts

id	before	after	diff
`@kbn/test-suites-xpack`	730	731	+1

Total ESLint disabled count

id	before	after	diff
`@kbn/test-suites-xpack`	755	756	+1

History

cc @cnasikas

cnasikas · 2024-10-20T08:02:57Z

x-pack/plugins/rule_registry/server/search_strategy/search_strategy.ts

@@ -191,7 +223,15 @@ export const ruleRegistrySearchStrategyProvider = (
            );
          }

-          throw err;
+          if (Boom.isBoom(err)) {


The search strategy always threw a 500 error without respecting the error codes, like 403, thrown by the alerts authorization. I tried to fix that.
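The idea behind the fix can be sketched without depending on the Boom library. BoomLike below is a minimal structural stand-in that relies only on the isBoom flag and output.statusCode of Boom errors; toStatusCode is a hypothetical helper and not the code in search_strategy.ts.

```typescript
// Minimal structural stand-in for a Boom error (only the fields used here).
interface BoomLike {
  isBoom: true;
  message: string;
  output: { statusCode: number };
}

function isBoomLike(err: unknown): err is BoomLike {
  return (
    typeof err === 'object' &&
    err !== null &&
    (err as { isBoom?: unknown }).isBoom === true &&
    typeof (err as { output?: { statusCode?: unknown } }).output?.statusCode === 'number'
  );
}

// Preserves the status code carried by the error and falls back to 500 otherwise.
function toStatusCode(err: unknown): number {
  return isBoomLike(err) ? err.output.statusCode : 500;
}

const forbidden: BoomLike = {
  isBoom: true,
  message: 'Unauthorized to find alerts',
  output: { statusCode: 403 },
};

console.log(toStatusCode(forbidden)); // 403
console.log(toStatusCode(new Error('unexpected'))); // 500
```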

cnasikas · 2024-10-20T08:28:18Z

x-pack/plugins/security_solution/public/detection_engine/rule_management/api/api.test.ts

-      );
-    });
-
-    test('requests the same number of rules as the number of ids provided', () => {


I combined the three tests into one.

cnasikas · 2024-10-20T08:53:38Z

...triggers_actions_ui/public/application/sections/alerts_page/components/stack_alerts_page.tsx

I split up the PageContent into PageContentWrapper and PageContent so the PageContent renders after the loading of the available rule types.

cnasikas · 2024-10-20T09:50:16Z

.../plugins/triggers_actions_ui/public/application/sections/alerts_table/alerts_table_state.tsx

@@ -579,7 +583,7 @@ const AlertsTableStateWithQueryProvider = memo(

    return (
      <AlertsTableContext.Provider value={alertsTableContext}>
-        {!isLoading && alertsCount === 0 && (
+        {!isLoading && alertsCount <= 0 && (


If alertsCount is undefined, it is initialized above as -1.

github-actions · 2024-10-20T18:26:38Z

🤖 GitHub comments


Just comment with:

/oblt-deploy : Deploy a Kibana instance using the Observability test environments.
run docs-build : Re-trigger the docs validation. (use unformatted text in the comment!)

Change the schema of the alerting feature privilege

ed3015e

cnasikas added the release_note:skip Skip the PR/issue when compiling release notes label May 17, 2024

cnasikas self-assigned this May 17, 2024

cnasikas added 5 commits May 18, 2024 19:20

Merge branch 'main' into poc_decouple_consumers_feature_ids

82a1a65

Change augmentRuleTypesWithAuthorization to use consumers instead of …

c8ec120

…ruleTypeIds

Support legacy consumers

1a02609

Fixes in rule filtering

55f91fd

Merge branch 'main' into poc_decouple_consumers_feature_ids

b3cea58

cnasikas added 2 commits May 23, 2024 13:37

Merge branch 'main' into poc_decouple_consumers_feature_ids

5184d21

Change new schema

3b7c89f

cnasikas force-pushed the poc_decouple_consumers_feature_ids branch from 06d8499 to 3b7c89f Compare May 23, 2024 15:47

cnasikas changed the title ~~[POC] Decouple feature IDs from consumres~~ [POC] Decouple feature IDs from consumers May 28, 2024

cnasikas added 2 commits May 30, 2024 16:59

Merge branch 'main' into poc_decouple_consumers_feature_ids

b1a0a63

Filter out rule types with no registered consumers

8b5fd17

cnasikas mentioned this pull request Jun 5, 2024

[ResponseOps][Alerting] Decouple rule producer/consumer settings from Kibana feature ID #181559

Open

cnasikas mentioned this pull request Jun 29, 2024

[ResponseOps][Meta] Alerting RBAC enhancements #187202

Open

cnasikas added 7 commits July 5, 2024 20:47

Merge branch 'main' into poc_decouple_consumers_feature_ids

ac46730

Merge branch 'main' into poc_decouple_consumers_feature_ids

c88e147

Refactor the way the AlertingAuthorization object is created

b69bff1

Add the alerts consumers to all rule types

e7363b1

Fix async type errors

e1b9845

Add test for alerting authorization object creation

41e5491

Add tests and support filtering

a2b73f9

cnasikas force-pushed the poc_decouple_consumers_feature_ids branch from f65ca2c to a2b73f9 Compare July 14, 2024 15:08

cnasikas added 5 commits July 16, 2024 13:21

Add more tests

0bb255b

Merge branch 'main' into poc_decouple_consumers_feature_ids

4c0c828

Finalize alerting auth unit tests and functionality

736a63e

Fix types

0a6261e

Fix tests

c4094d0

cnasikas added 2 commits October 18, 2024 15:24

Merge branch 'main' into poc_decouple_consumers_feature_ids

1f1ce53

Merge branch 'main' into poc_decouple_consumers_feature_ids

9618c26

Fix i18n

16cdf79


Merge branch 'main' into poc_decouple_consumers_feature_ids

0ad199b


Nits, fixes, and tests

b0892c6

[CI] Auto-commit changed files from 'node scripts/notice'

ba9bba3


cnasikas added ci:cloud-deploy Create or update a Cloud deployment ci:project-deploy-elasticsearch Create an Elasticsearch Serverless project ci:project-deploy-observability Create an Observability project ci:project-deploy-security Create a Security Serverless Project labels Oct 20, 2024
