Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support customizing bucket boundaries for status metrics bundle_loading_duration_ns #6950

Closed
catstail1 opened this issue Aug 21, 2024 · 4 comments

Comments

@catstail1
Copy link

What is the underlying problem you're trying to solve?

We're using status metrics bundle_loading_duration_ns to monitor opa server performance on bundle loading. The metrics are used to debug why it takes several minutes for new version of bundles to be loaded by our opa server. The current metrics indicates at least 50% of bundle_loading_duration_ns metrics fall into bucket +Inf for all 4 stages, and most small duration buckets are not used. Found currently the buckets are set with prometheus.ExponentialBuckets(1000, 2, 20), which is 1 microsecond to 1 second, which is too small for our case, fetching the bundle remotely.
image

We would like to get more buckets with long durations to know more how long it spend on each stages. Thanks!

Describe the ideal solution

it would be great if the configuration support customized bucket for metrics bundle_loading_duration_ns, as the configuraiton of server.metrics.prom.http_request_duration_seconds.buckets, so that we can make most usage of configured buckets.

Describe a "Good Enough" solution

if prometheus.ExponentialBuckets is still preferably used, it would be enough to support the config of the parameters start (probably also factor) in prometheus.ExponentialBuckets

Additional Context

I found a similar resolved issue for http_request_duration_seconds: #3196

@ashutosh-narkar
Copy link
Member

Making this configurable via the server config seems like a good approach. We'll have to push that new config over through the runtime into the code that handles the plugin instantiation. Then use the provided config or fallback to the default bucket values in the status plugin. If you'd like to contribute this feature, please feel free to do so. Happy to help if you have any questions. Thanks.

@jwu730-1
Copy link
Contributor

Making this configurable via the server config seems like a good approach. We'll have to push that new config over through the runtime into the code that handles the plugin instantiation. Then use the provided config or fallback to the default bucket values in the status plugin. If you'd like to contribute this feature, please feel free to do so. Happy to help if you have any questions. Thanks.

After reading more code, it seems better to set the custom buckets under status configuration (currently there's a boolean prometheus). And it makes sense to allow customize the bucket only when status.prometheus is true.
However, during my local test, it seems not easy to fix/update here. The original test tries to checks if the histogram collector for status.bundle_loading_duration_ns is registered as bundleLoadDuration, which is a global variable created by NewHistogramVec. From prometheus code, I didnt find a accessible field can be used to check the buckets. Do you have any suggestion on how to check if registered histogram use the default / custom bucket? Thanks!

Copy link

stale bot commented Sep 29, 2024

This issue has been automatically marked as inactive because it has not had any activity in the last 30 days. Although currently inactive, the issue could still be considered and actively worked on in the future. More details about the use-case this issue attempts to address, the value provided by completing it or possible solutions to resolve it would help to prioritize the issue.

@dbsectrainer
Copy link

The functionality to customize bucket boundaries for bundle_loading_duration_ns is already implemented in OPA. You can configure custom bucket boundaries in your OPA configuration file using the status.prometheus_config.collectors.bundle_loading_duration_ns.buckets field.

Here's how to configure it:

status:
prometheus: true
prometheus_config:
collectors:
bundle_loading_duration_ns:
buckets: [
1e6, # 1ms
5e6, # 5ms
10e6, # 10ms
50e6, # 50ms
100e6, # 100ms
500e6, # 500ms
1e9, # 1s
5e9, # 5s
10e9, # 10s
30e9, # 30s
60e9, # 1min
300e9, # 5min
600e9 # 10min
]
The bucket values are specified in nanoseconds. The example above shows a configuration better suited for monitoring longer bundle loading durations, with buckets ranging from 1ms to 10 minutes.

By default, OPA uses prometheus.ExponentialBuckets(1000, 2, 20) which creates buckets from 1 microsecond to 1 second. With the configuration above, you'll get more granular insights into longer bundle loading operations.

You can adjust these bucket boundaries based on your specific needs. For example, if you're seeing most metrics fall into the +Inf bucket, you might want to add more buckets with higher values.

@stale stale bot removed the inactive label Jan 7, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants