Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

azure exporter: Enable features from upstream exporter #5707

Merged
merged 8 commits into from
Dec 21, 2023

Conversation

kgeckhart
Copy link
Contributor

PR Description

This PR adds two features from the upstream exporter which should help reduce the complexity when pull metrics from azure,

  1. A code path to call a different azure API to gather metrics at the subscription level which can drastically reduce the likelihood of rate limiting
  2. Disabling dimension validation reducing the number of exporter instances needed

I did as much as I could to add further context around these in the code/docs.

Which issue(s) this PR fixes

Fixes: #5411

Azure API rate limiting has come up internally but I couldn't find a corresponding issue to close. I can make one if needed

Notes to the Reviewer

PR Checklist

  • CHANGELOG.md updated
  • Documentation added
  • Tests updated
  • Config converters updated

@kgeckhart kgeckhart requested review from a team and clayton-cornell as code owners November 3, 2023 15:21
Comment on lines +141 to +145
"ApiName",
"TransactionType",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: indent

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oddly, locally this is all spaced properly and renders as I would expect. I'm really confused about why it's off when pushed.

Comment on lines 93 to 97
`validate_dimensions` is disabled by default to reduce the number of azure exporter instances requires when a `resource_type` has metrics with varying dimensions. When `validate_dimensions` is enabled you will need 1 exporter instance per metric + dimension combination which is more tedious to maintain.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this a breaking change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe? This used to default to true so any existing configs won't break but the default behavior will change. I could flip the default to true to match the existing behavior or add it in the breaking changes section.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

considering the feature, I think it's better to leave the false default and do a breaking change. wdyt @tpaschalis @mattdurham ?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I understand, it would only mean existing configs could potentially start accepting more dimensions although they would be manually specified anyway

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whats the scope of the change between the two? Feels like we are getting more metrics, are we talking 2x,10x,100x? Consider me very ignorant of the internal workings of azure :)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For some additional context, if it makes sense to do a breaking change then I am onboard with it but it would need to be documented.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mattdurham I don't know much either about azure internals but I'm thinking the "break" is that wrong configs would start working. Say if someone has something like this and the agent is running, it would start working catching the default dimensions for each metric.

prometheus.exporter.azure "example" {
	subscriptions = ["<...>"]
	resource_type = "Microsoft.Network/applicationGateways"

	included_dimensions = []  
	metrics             = [
		"BackendResponseStatus",
		"Throughput",
	]
	timespan = "PT1M"
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be more of a behavioral change vs a breaking change. It won't change any currently working configuration someone will be running. They will still get the same amount of data.

On main a config which requests this data from azure fails but runs without issue on this branch,

            resource_type: ["microsoft.containerservice/managedclusters"]
            metrics:
              - node_cpu_usage_millicores
              - node_cpu_usage_percentage
              - node_disk_usage_bytes
              - node_disk_usage_percentage
              - node_memory_rss_bytes
              - node_memory_rss_percentage
              - node_memory_working_set_bytes
              - node_memory_working_set_percentage
              - node_network_in_bytes
              - node_network_out_bytes
            included_resource_tags:
              - environment
            included_dimensions:
              - node
              - nodepool
              - device

The device dimension is only valid for some of the requested metrics, https://learn.microsoft.com/en-us/azure/azure-monitor/reference/supported-metrics/microsoft-containerservice-managedclusters-metrics. The only valid way to do this on main was to split the config in to multiple definitions (do a self scrape in static mode or multiple exporter definitions in flow mode).

Without validation though there's no safe guards for mistyping/misconfiguration so accidentally doing something like devie would not give you an error where it would previously.

Copy link
Collaborator

@mattdurham mattdurham Nov 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds reasonable then and pretty fine with this then, maybe add the equivalent of this blurb to breaking changes just to cover our bases.

@mattdurham mattdurham self-assigned this Nov 14, 2023
1. Set the regions to gather metrics from and get metrics for all resources across those regions
1. This will make an API call per subscription reducing the number of API calls dramatically
1. This approach does not work with all resource types and Azure's does not document which resource types do/do not work
1. A resource type which is not support will produce errors which look like `Resource type: microsoft.containerservice/managedclusters not enabled for Cross Resource metrics`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love the detail. What is the recourse if a user hits this error?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a great question, I think the only option is to fall back to the resource graph API based approach which I can add in this list to be clear.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome I think that would help!

Comment on lines 19 to 26
The exporter offers two options for gathering metrics,
1. (Default) Use an [Azure Resource Graph](https://azure.microsoft.com/en-us/get-started/azure-portal/resource-graph/#overview) query to identify resources for gathering metrics
1. This will make 1 API call per resource identified
1. Subscriptions with a reasonable amount of resources are liable to hit the [12000 requests per hour rate limit](https://learn.microsoft.com/en-us/azure/azure-resource-manager/management/request-limits-and-throttling#subscription-and-tenant-limits) azure enforces
1. Set the regions to gather metrics from and get metrics for all resources across those regions
1. This will make an API call per subscription reducing the number of API calls dramatically
1. This approach does not work with all resource types and Azure's does not document which resource types do/do not work
1. A resource type which is not support will produce errors which look like `Resource type: microsoft.containerservice/managedclusters not enabled for Cross Resource metrics`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The exporter offers two options for gathering metrics,
1. (Default) Use an [Azure Resource Graph](https://azure.microsoft.com/en-us/get-started/azure-portal/resource-graph/#overview) query to identify resources for gathering metrics
1. This will make 1 API call per resource identified
1. Subscriptions with a reasonable amount of resources are liable to hit the [12000 requests per hour rate limit](https://learn.microsoft.com/en-us/azure/azure-resource-manager/management/request-limits-and-throttling#subscription-and-tenant-limits) azure enforces
1. Set the regions to gather metrics from and get metrics for all resources across those regions
1. This will make an API call per subscription reducing the number of API calls dramatically
1. This approach does not work with all resource types and Azure's does not document which resource types do/do not work
1. A resource type which is not support will produce errors which look like `Resource type: microsoft.containerservice/managedclusters not enabled for Cross Resource metrics`
The exporter offers the following two options for gathering metrics.
1. (Default) Use an [Azure Resource Graph](https://azure.microsoft.com/en-us/get-started/azure-portal/resource-graph/#overview) query to identify resources for gathering metrics.
1. This query will make one API call per resource identified.
1. Subscriptions with a reasonable amount of resources can hit the [12000 requests per hour rate limit](https://learn.microsoft.com/en-us/azure/azure-resource-manager/management/request-limits-and-throttling#subscription-and-tenant-limits) Azure enforces.
1. Set the regions to gather metrics from and get metrics for all resources across those regions.
1. This option will make one API call per subscription, dramatically reducing the number of API calls.
1. This approach does not work with all resource types, and Azure does not document which resource types do or do not work.
1. A resource type that is not supported produces errors that look like `Resource type: microsoft.containerservice/managedclusters not enabled for Cross Resource metrics`.

## Authentication

Grafana agent must be running in an environment with access to Azure. The exporter uses the Azure SDK for go and supports [authentication](https://learn.microsoft.com/en-us/azure/developer/go/azure-sdk-authentication?tabs=bash#2-authenticate-with-azure).

The account used by Grafana Agent needs:

- [Read access to the resources that will be queried by Resource Graph](https://learn.microsoft.com/en-us/azure/governance/resource-graph/overview#permissions-in-azure-resource-graph)
- When using an Azure Resoure Graph query, [read access to the resources that will be queried by Resource Graph](https://learn.microsoft.com/en-us/azure/governance/resource-graph/overview#permissions-in-azure-resource-graph)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- When using an Azure Resoure Graph query, [read access to the resources that will be queried by Resource Graph](https://learn.microsoft.com/en-us/azure/governance/resource-graph/overview#permissions-in-azure-resource-graph)
- [Read access to the resources that will be queried by Resource Graph](https://learn.microsoft.com/en-us/azure/governance/resource-graph/overview#permissions-in-azure-resource-graph) if you are using an Azure Resource Graph query.

| `metric_help_template` | `string` | Description of the metric. | `"Azure metric {metric} for {type} with aggregation {aggregation} as {unit}"` | no |
| Name | Type | Description | Default | Required |
|-------------------------------|----------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------|----------|
| `subscriptions` | `list(string)` | List of subscriptions to scrap metrics from. | | yes |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
| `subscriptions` | `list(string)` | List of subscriptions to scrap metrics from. | | yes |
| `subscriptions` | `list(string)` | List of subscriptions to scrape metrics from. | | yes |

| `subscriptions` | `list(string)` | List of subscriptions to scrap metrics from. | | yes |
| `resource_type` | `string` | The Azure Resource Type to scrape metrics for. | | yes |
| `metrics` | `list(string)` | The metrics to scrape from resources. | | yes |
| `resource_graph_query_filter` | `string` | The [Kusto query][] filter to apply when searching for resources. Cannot be used if `regions` is set. | | no |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
| `resource_graph_query_filter` | `string` | The [Kusto query][] filter to apply when searching for resources. Cannot be used if `regions` is set. | | no |
| `resource_graph_query_filter` | `string` | The [Kusto query][] filter to apply when searching for resources. Can't be used if `regions` is set. | | no |

| `resource_type` | `string` | The Azure Resource Type to scrape metrics for. | | yes |
| `metrics` | `list(string)` | The metrics to scrape from resources. | | yes |
| `resource_graph_query_filter` | `string` | The [Kusto query][] filter to apply when searching for resources. Cannot be used if `regions` is set. | | no |
| `regions` | `list(string)` | The list of regions for gathering metrics and enables gathering metrics for all resources in the subscription. Cannot be used if `resource_graph_query_filter` is set. | | no |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
| `regions` | `list(string)` | The list of regions for gathering metrics and enables gathering metrics for all resources in the subscription. Cannot be used if `resource_graph_query_filter` is set. | | no |
| `regions` | `list(string)` | The list of regions for gathering metrics and enables gathering metrics for all resources in the subscription. Can't be used if `resource_graph_query_filter` is set. | | no |


# Optional: The list of regions for gathering metrics. Enables gather metrics for all resources in the subscription.
# The list of available `regions` to your subscription can be found by running the azure CLI command `az account list-locations --query '[].name'`.
# Cannot be used if `resource_graph_query_filter` is set.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Cannot be used if `resource_graph_query_filter` is set.
# Can't be used if `resource_graph_query_filter` is set.

@@ -93,7 +100,14 @@ The account used by Grafana Agent needs:

# Optional: The [kusto query](https://learn.microsoft.com/en-us/azure/data-explorer/kusto/query/) filter to apply when searching for resources
# This value will be embedded in to a template query of the form `Resources | where type =~ "<resource_type>" <resource_graph_query_filter> | project id, tags`
# Cannot be used if `regions` is set.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Cannot be used if `regions` is set.
# Can't be used if `regions` is set.

@@ -137,6 +151,10 @@ The account used by Grafana Agent needs:

# Optional: Which azure cloud environment to connect to, azurecloud, azurechinacloud, azuregovernmentcloud, or azurepprivatecloud
[azure_cloud_environment: <string> | default = "azurecloud"]

# Optional: validation is disabled by default to reduce the number of azure exporter instances required when a `resource_type` has metrics with varying dimensions.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Optional: validation is disabled by default to reduce the number of azure exporter instances required when a `resource_type` has metrics with varying dimensions.
# Optional: Validation is disabled by default to reduce the number of Azure exporter instances required when a `resource_type` has metrics with varying dimensions.

@@ -77,6 +90,8 @@ Tags in `included_resource_tags` will be added as labels with the name `tag_<tag

Valid values for `azure_cloud_environment` are `azurecloud`, `azurechinacloud`, `azuregovernmentcloud` and `azurepprivatecloud`.

`validate_dimensions` is disabled by default to reduce the number of azure exporter instances requires when a `resource_type` has metrics with varying dimensions. When `validate_dimensions` is enabled you will need 1 exporter instance per metric + dimension combination which is more tedious to maintain.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
`validate_dimensions` is disabled by default to reduce the number of azure exporter instances requires when a `resource_type` has metrics with varying dimensions. When `validate_dimensions` is enabled you will need 1 exporter instance per metric + dimension combination which is more tedious to maintain.
`validate_dimensions` is disabled by default to reduce the number of Azure exporter instances requires when a `resource_type` has metrics with varying dimensions. When `validate_dimensions` is enabled you will need one exporter instance per metric + dimension combination which is more tedious to maintain.

@@ -137,6 +151,10 @@ The account used by Grafana Agent needs:

# Optional: Which azure cloud environment to connect to, azurecloud, azurechinacloud, azuregovernmentcloud, or azurepprivatecloud
[azure_cloud_environment: <string> | default = "azurecloud"]

# Optional: validation is disabled by default to reduce the number of azure exporter instances required when a `resource_type` has metrics with varying dimensions.
# Choosing to enable `validate_dimensions` will require 1 exporter instance per metric + dimension combination which can be very tedious to maintain.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Choosing to enable `validate_dimensions` will require 1 exporter instance per metric + dimension combination which can be very tedious to maintain.
# Choosing to enable `validate_dimensions` will require one exporter instance per metric + dimension combination which can be very tedious to maintain.

@kgeckhart
Copy link
Contributor Author

@mattdurham and @clayton-cornell I think I covered all the PR feedback if you wanted to take another pass

Copy link
Contributor

This PR has not had any activity in the past 30 days, so the needs-attention label has been added to it.
If you do not have enough time to follow up on this PR or you think it's no longer relevant, consider closing it.
The needs-attention label signals to maintainers that something has fallen through the cracks. No action is needed by you; your PR will be kept open and you do not have to respond to this comment. The label will be removed the next time this job runs if there is new activity.
Thank you for your contributions!

@github-actions github-actions bot added the needs-attention An issue or PR has been sitting around and needs attention. label Dec 18, 2023
@clayton-cornell
Copy link
Contributor

@mattdurham This one fell off my ToDo list. There's nothing more I have to add for docs input. Over to you for a final look/approval?

@github-actions github-actions bot removed the needs-attention An issue or PR has been sitting around and needs attention. label Dec 20, 2023
@kgeckhart
Copy link
Contributor Author

I'll happily resolve conflicts after getting a 👍 on the changes. The go.mod + CHANGELOG.md are going to keep drifting really quickly.

@mattdurham mattdurham enabled auto-merge (squash) December 21, 2023 16:25
@kgeckhart kgeckhart force-pushed the kgeckhart/upgrade-azure-exporter branch from 33f19e1 to 2a22f19 Compare December 21, 2023 16:53
@mattdurham mattdurham merged commit 5e9df54 into main Dec 21, 2023
10 checks passed
@mattdurham mattdurham deleted the kgeckhart/upgrade-azure-exporter branch December 21, 2023 17:46
@clayton-cornell clayton-cornell added the type/docs Docs Squad label across all Grafana Labs repos label Jan 8, 2024
BarunKGP pushed a commit to BarunKGP/grafana-agent that referenced this pull request Feb 20, 2024
* Update azure exporter for ValidateDimension support

* Add support for subscription scoped metric gathering

* Add docs for validate_dimensions

* Fix conditional for when to use subscription scope

* changelog entry

* Fix crow error from prom common upgrade

* Code review feedback

* Cleanup merge + missing review comments + fix deprecations from go.mod update
@github-actions github-actions bot added the frozen-due-to-age Locked due to a period of inactivity. Please open new issues or PRs if more discussion is needed. label Feb 21, 2024
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 21, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
frozen-due-to-age Locked due to a period of inactivity. Please open new issues or PRs if more discussion is needed. type/docs Docs Squad label across all Grafana Labs repos
Projects
None yet
Development

Successfully merging this pull request may close these issues.

azure_exporter doesn't support multiple dimensions
6 participants