What is the cost of calling the metrics endpoint? #1086

rodoufu · 2022-06-23T18:28:57Z

What is the cost of calling the metrics endpoint?

Does it calculate the value of the metrics when we call the endpoint or when the metrics are updated?

bwplotka · 2022-07-27T13:16:37Z

Maintainer

It heavily depends on the collector implementation. The HTTP handler calls Gather() ([]*dto.MetricFamily, error) on your prometheus.Registry, which in turn calls Collect(ch chan<- Metric) on all your registered collectors. You can register many types of collectors, which might have different characteristics:

For native metrics (const metrics, gauges, counters, summaries, histograms), the values are kept in memory. Collect then mostly just copies each value into a new protobuf object. You update those metrics from separate goroutines using Inc(), Add(), Observe() and so on.
The ...Func family of metrics (e.g. NewGaugeFunc) invokes the provided function on every call. Users can put expensive computation there, which can cause problems. That is not recommended; the recommendation is to use it for constant metrics such as info metrics or current configuration options.
The Collector pattern allows whatever logic you want. You can cache the information so that Collect is fast, or you can compute everything on demand, which introduces some overhead on every /metrics call. For example, the process collector reads Linux process statistics from files on every call. Using MustNewConstMetric inside a collector, with the descriptor predefined outside of Collect, is fine, unless you attach many dynamic labels in the process (which has some overhead in MakeLabelPairs). If you go to extremes, with more expensive computations or thousands of metrics, it might be worth investing in a custom collector implementation, like the cache implementation we attempted to put into the cadvisor code, which was quite slow and expensive.

Hope I did not miss anything (: cc @kakkoyun @beorn7

beorn7 · 2022-08-10T12:03:33Z

Maintainer

I think @bwplotka described it quite well.

Just as a different perspective:

What's really optimized in this library are the calls that happen for instrumentation, such as Inc or Observe. They could very well happen thousands of times per second.
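
As an illustration of why such hot-path calls can be cheap, here is a minimal sketch of a lock-free float64 accumulator built on compare-and-swap. client_golang's counters also rely on atomic operations rather than a mutex, but atomicFloat and incConcurrently are hypothetical names and this is not the library's code.

```go
package main

import (
	"fmt"
	"math"
	"sync"
	"sync/atomic"
)

// atomicFloat is a hypothetical lock-free float64 accumulator. It keeps the
// IEEE-754 bit pattern in a uint64 and updates it with compare-and-swap, so
// Add never takes a mutex.
type atomicFloat struct {
	bits uint64
}

// Add adds v, retrying if another goroutine changed the value concurrently.
func (f *atomicFloat) Add(v float64) {
	for {
		old := atomic.LoadUint64(&f.bits)
		next := math.Float64bits(math.Float64frombits(old) + v)
		if atomic.CompareAndSwapUint64(&f.bits, old, next) {
			return
		}
	}
}

// Inc is the counter-style shorthand for Add(1).
func (f *atomicFloat) Inc() { f.Add(1) }

// Value is what a scrape would read: a single atomic load.
func (f *atomicFloat) Value() float64 {
	return math.Float64frombits(atomic.LoadUint64(&f.bits))
}

// incConcurrently calls Inc perGoroutine times from each of n goroutines.
func incConcurrently(f *atomicFloat, n, perGoroutine int) {
	var wg sync.WaitGroup
	for g := 0; g < n; g++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < perGoroutine; i++ {
				f.Inc()
			}
		}()
	}
	wg.Wait()
}

func main() {
	var requests atomicFloat
	incConcurrently(&requests, 8, 10000)
	fmt.Println(requests.Value()) // 80000
}
```

Reading the value for a scrape is a single atomic load, which is why copying values out of such primitives stays cheap.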

Calling the metrics endpoint is expected to happen a few times per minute; in rare cases it might happen as often as about once per second. But even then, you can see that we are orders of magnitude away from the instrumentation calls above.

Copying the metric values out of the instrumentation primitives (e.g. a counter, a histogram, …) is still quite fast compared to the heavy lifting that a call to the metrics endpoint implies anyway (network access, HTTP handling and such). The most "needless" work currently happens because this library was designed around protobuf, while most scrapers use the text format. Therefore, the code usually takes the detour of rendering everything as protobuf (which is, on top of everything else, quite expensive because it uses the standard protobuf library rather than one of the more efficient alternative implementations) and then converting the protobuf into the text format. That's where some projects with rather extreme use cases (like https://github.com/kubernetes/kube-state-metrics) have run into a wall and decided to hand-code the metrics exposition in a way perfectly tailored to their use case.

Finally, if collecting the metrics themselves is a lot of work, then that's usually the bottleneck, overshadowing everything else mentioned so far. @bwplotka has already discussed that above.

As a rule of thumb, if you instrument a binary that has a "main job", and you just want to monitor that main job, you will rarely run into the problem that the instrumentation eats a significant portion of your resources. The more your binary focuses on metrics collection itself (like exporters, or binaries that collect metrics from many other sources, e.g. KSM or cadvisor), the more the tradeoffs change, and you might start to worry.
