-
Notifications
You must be signed in to change notification settings - Fork 488
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Work around high cardinality metrics from otelcol receivers #4769
Work around high cardinality metrics from otelcol receivers #4769
Conversation
I suspect the Test failure was a flake? It succeeded on the commit before the changelog update |
Likely, the OSX and Win tests are unreliable at times. I restarted the tests. |
Thank you for your contribution! I agree that we should do something about this problem. I'd have preferred if the Collector code avoids high cardinality metrics by default, but they have to be explicitly disabled via a feature gate. In any case, I believe we should be able to toggle this on and off in the Grafana Agent. Would you be willing to try to implement a boolean flag called By the way, we will also need to document those metrics. I raised a separate issue for this - #4769 - and we could work on this in a separate PR. |
@ptodev Ok! Yeah I think I can take a stab at implementing that 👍 |
@ptodev I've been working on a PoC but ran into an issue. Now I have a |
There is no need for the |
Even so if we just pass down the options the problem will be the same, the components that use those options when building their MeterProviders will need to be |
I'm 95% sure that it won't be a problem - flow is designed in such a way that if a config update is required, the
|
That's what I've been trying but invariably I have to force a rebuild of the other block by changing some variable. Right now I'm testing with this config: logging {
level = "debug"
format = "logfmt"
}
metrics {
otelcol_disable_high_cardinality_metrics = false
}
otelcol.receiver.zipkin "default" {
parse_string_tags = false
output {
traces = []
}
} If I change the I think if I refactored to pass down the option and still have the individual components own their own registry it would still be "working" after the |
@ptodev The suggestion to have a new top-level block with a setting that only impacts a subset of components is weird to me; that's typically not how we've written things for Flow in the past. What do you think about adding this attribute to a block in the component instead? This would allow per-component tuning of metrics, and would be more Flow native in that things would be logically separated. |
@rfratto I'm used to thinking of these feature gates as global settings, since in theory any Otel component could use them. However, in this case we are not setting an Otel feature gate in the Agent, so I agree that we could have per-component overrides. The main issue with per-component overrides is that we are not sure which components exactly will be impacted, and how to document them. What do you think about this course of action:
The downside of the above is that even if a component doesn't end up producing high cardinality metrics which match this filter, it'll still have that setting. But I think that's ok. |
I'm fine with that proposal, but I would suggest not calling the block Something like |
@rfratto - thank you, @glindstedt - would you like to try implementing the proposal from this comment please? Thank you for working with us, and for your patience! This change would be very useful. If you are too busy, don't worry, we could also pick it up at some point. |
Thank you both for sorting out a path forward. @ptodev yes I think I'll have some time to implement this in the coming week, but if other priorities come in between I'll let you know! |
I've pushed a commit that adds the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the update!
It'd be nice to add a few tests at least to some components to make sure that the field is set as expected.
We would also need a documentation, similar to how the output block is currently documented:
- There is a shared md file for it.
- It is listed in the docs for each component two times:
In the shared md, we could also mention that this component may not end up using those views if it doesn't generate such metrics. For example, a shared md could look like this:
---
aliases:
- ../../otelcol/debug-metrics-block/
canonical: https://grafana.com/docs/agent/latest/shared/flow/reference/components/debug-metrics-block/
headless: true
---
The `debug_metrics` block configures the metrics which this component generates to monitor its state.
The following arguments are supported:
Name | Type | Description | Default | Required
---- | ---- | ----------- | ------- | --------
`disable_high_cardinality_metrics` | `bool` | Whether to disable certain high cardinality metrics. | `false` | no
`disable_high_cardinality_metrics` is the Grafana Agent equivalent to the `telemetry.useOtelForInternalMetrics` feature gate in the OpenTelemetry Collector. It removes attributes which could case high cardinality metrics.
For example, attributes with IP addresses and port numbers in metrics about HTTP and gRPC connections will be removed.
This is correct - those are Agent-specific components which are not in the Collector, and they are not in scope for this change. |
Co-authored-by: Paulin Todev <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some spelling fixes. Switch which
for that
.
docs/sources/flow/reference/components/otelcol.exporter.jaeger.md
Outdated
Show resolved
Hide resolved
docs/sources/flow/reference/components/otelcol.exporter.loadbalancing.md
Outdated
Show resolved
Hide resolved
docs/sources/flow/reference/components/otelcol.exporter.logging.md
Outdated
Show resolved
Hide resolved
docs/sources/flow/reference/components/otelcol.exporter.otlp.md
Outdated
Show resolved
Hide resolved
docs/sources/flow/reference/components/otelcol.exporter.otlphttp.md
Outdated
Show resolved
Hide resolved
docs/sources/flow/reference/components/otelcol.receiver.opencensus.md
Outdated
Show resolved
Hide resolved
docs/sources/flow/reference/components/otelcol.receiver.otlp.md
Outdated
Show resolved
Hide resolved
docs/sources/flow/reference/components/otelcol.receiver.zipkin.md
Outdated
Show resolved
Hide resolved
docs/sources/shared/flow/reference/components/otelcol-debug-metrics-block.md
Outdated
Show resolved
Hide resolved
docs/sources/shared/flow/reference/components/otelcol-debug-metrics-block.md
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Thank you very much for working on this! I just left one minor comment that needs to be addressed prior to merging.
Also, please feel free to update the Changelog with an entry such as this under the "Enhancements" section:
- Added a `disable_high_cardinality_metrics` configuration flag to `otelcol`
exporters and receivers to switch high cardinality debug metrics off. (@glindstedt)
That way the change will be visible in the release notes for the next version, and we can pay tribute to you as well :)
docs/sources/shared/flow/reference/components/otelcol-debug-metrics-block.md
Outdated
Show resolved
Hide resolved
|
||
// DebugMetricsArguments configures internal metrics of the components | ||
type DebugMetricsArguments struct { | ||
DisableHighCardinalityMetrics bool `river:"disable_high_cardinality_metrics,attr,optional"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TBH I'm not sure if we should just call this high_cardinality_metrics
. IMO we can merge this as it is for now, and I'll have a discussion with the other people on the team if they'd be in favour of changing it prior to the next release.
Co-authored-by: Clayton Cornell <[email protected]>
…lancing.md Co-authored-by: Clayton Cornell <[email protected]>
…g.md Co-authored-by: Clayton Cornell <[email protected]>
Co-authored-by: Clayton Cornell <[email protected]>
…tp.md Co-authored-by: Clayton Cornell <[email protected]>
Co-authored-by: Clayton Cornell <[email protected]>
…nsus.md Co-authored-by: Clayton Cornell <[email protected]>
Co-authored-by: Clayton Cornell <[email protected]>
Co-authored-by: Clayton Cornell <[email protected]>
…trics-block.md Co-authored-by: Clayton Cornell <[email protected]>
…trics-block.md Co-authored-by: Clayton Cornell <[email protected]>
…trics-block.md Co-authored-by: Paulin Todev <[email protected]>
Thank you for the reviews, suggestions commited and changelog updated 👍 |
PR Description
Heavily inspired by open-telemetry/opentelemetry-collector#7543 which was mentioned in the upstream issue. This PR adds views to the
MeterProvider
of the receiver component which filters out some high cardinality attributes.Which issue(s) this PR fixes
Fixes #4764
Notes to the Reviewer
PR Checklist