Improve Performance of MakeLabelPairs #1734
Conversation
Signed-off-by: Kyle Eckhart <[email protected]>
Force-pushed from 829ec00 to d049c34
This is brilliant, thank you! Elegant and clean.

I proposed using a slice vs a map for ordering, for extra efficiency. We could also keep sorting in `MakeLabelPairs` if that is too expensive on `NewDesc` or memory. Speaking of `NewDesc` efficiency...

> The tradeoff is a little bit more overhead in the `Desc` to store the full `labelPairs` and an index lookup map. I think this is a fair tradeoff to make because a single `Desc` instance is likely to be used for numerous calls to `MakeLabelPairs`.

Generally I agree it's fair. However, we have many use cases of users calling `NewConst*Metric` with a dynamic (😱) `NewDesc` because the underlying metrics labels they report change. It's a niche, and we could provide some optimizations for them later on (a separate `NewDesc`, a reusable `Desc`), but it would be epic if we could measure `NewDesc` efficiency, at least to be aware of the overhead. I proposed a quick way for us to expand your benchmark for it.

To sum up, LGTM with nits, the slice suggestion, a new benchmark, and a request to redo benchmarks for both `MakeLabelPairs` and `NewDesc`. Should be quick to do -- thank you so much for helping! 💪🏽
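The slice-over-map ordering suggested above could look something like the following sketch. All names here (`variableSlots`, the parameter names) are illustrative, not client_golang's actual internals: the idea is that since the variable labels are known at `Desc` construction time, their target positions in the sorted output can be pre-computed once into a plain `[]int`, indexed by variable-label position, avoiding a map lookup per call.

```go
package main

import (
	"fmt"
	"sort"
)

// variableSlots computes, for each variable label, the index it should
// occupy in the fully sorted (constant + variable) label list. This is
// done once at construction time; the hot path only indexes the slice.
func variableSlots(constNames, varNames []string) []int {
	all := append(append([]string{}, constNames...), varNames...)
	sort.Strings(all)
	pos := make(map[string]int, len(all))
	for i, n := range all {
		pos[n] = i
	}
	slots := make([]int, len(varNames))
	for i, n := range varNames {
		slots[i] = pos[n] // the i-th variable value goes to index slots[i]
	}
	return slots
}

func main() {
	// Constant label "job"; variable labels "code" and "method".
	// Sorted order is code, job, method, so the variable values land
	// at indexes 0 and 2.
	fmt.Println(variableSlots([]string{"job"}, []string{"code", "method"}))
}
```

Because the slots slice is parallel to the variable-label values passed into `MakeLabelPairs`, the per-call work reduces to a copy plus a handful of indexed writes.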
Great point. Funny enough, yet-another-cloudwatch-exporter was one such use case, and refactoring to a reusable `Desc` approach led me here next. Thanks for all the feedback; I'll clean it up and add a bench for `Desc` as well.
…nd add a benchmark for NewDesc

Signed-off-by: Kyle Eckhart <[email protected]>

Force-pushed from 4dc5c04 to 6da4db6
After doing some more benchmarking I think I'll try to find some further optimizations in
The results from
CPU and allocs/op increase if you only use the desc to create a single metric, but beyond that you get savings. B/op only starts to show savings when you get to 5 metrics created 😬. I'll take a deeper look tomorrow, but my first inclination is that it's from this block https://github.com/kgeckhart/client_golang/blob/6da4db6dc2f8348a455b3bca5f657b47a3c02b4c/prometheus/desc.go#L174-L178 because I can't re-use the pre-created
Thanks! Naive math on how much `NewDesc` is worse vs how much `MakeLabelPairs` is better: it easily pays off on CPU, but it is interesting on the alloc side, and this is what we see in your constMetricFlow bench.
So indeed I would focus on allocs if NewDesc...
```go
}
}

func BenchmarkConstMetricFlow(b *testing.B) {
```
This is important for exporters, but if you go this route I would also add a benchmark to show the benefit for "normal" use (e.g. `NewCounter` and `WithLabelValues`, which is 80% of use). This will perhaps motivate us to accept a certain penalty for these niche cases.
For exporters we really need to redesign something around constant metrics. They are too expensive to be created on every scrape, especially since users (the majority) expect predictable memory and CPU from k8s containers. I tried in the past to build some dto-based cache layer google/cadvisor#2974 but it needs more work.
What I'm saying is that we could make this a bit slower and work with exporters on a long-term plan. We could also give users an option (a faster `NewDesc` vs a slower one and more work later). Let's check what we could squeeze more.
Would some smart "pre-alloc pool" help? (For predictable memory I believe yes.)
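A minimal sketch of what such a "pre-alloc pool" could look like, using the standard library's `sync.Pool`. The `labelPair` type, `getPairs`, and `putPairs` are hypothetical names for illustration, not existing client_golang API; the point is reusing label-pair slices across scrapes instead of allocating fresh ones per call, which smooths allocation pressure for exporters that build const metrics on every scrape.

```go
package main

import (
	"fmt"
	"sync"
)

// labelPair stands in for the name/value pair emitted per label.
type labelPair struct{ Name, Value string }

// pairPool hands out reusable labelPair slices. The capacity hint of 8
// is arbitrary; slices grown beyond it are still returned and reused.
var pairPool = sync.Pool{
	New: func() any { return make([]labelPair, 0, 8) },
}

// getPairs returns a zero-length slice backed by pooled capacity.
func getPairs() []labelPair { return pairPool.Get().([]labelPair)[:0] }

// putPairs returns the slice to the pool once the caller is done with it.
func putPairs(p []labelPair) { pairPool.Put(p) }

func main() {
	p := getPairs()
	p = append(p, labelPair{"code", "200"})
	fmt.Println(p)
	putPairs(p) // safe to reuse on the next scrape
}
```

The catch, as with any pool, is ownership: the caller must not retain the slice after `putPairs`, which is why this would need careful API design rather than being a drop-in change.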
OK, I spent some time to see if trying to get rid of
```go
} {
	b.Run(fmt.Sprintf("labels=%v", len(bm.labelValues)), func(b *testing.B) {
		for _, metricsToCreate := range []int{1, 2, 3, 5} {
			b.Run(fmt.Sprintf("metrics=%v", metricsToCreate), func(b *testing.B) {
```
Let's make sure to add `b.ResetTimer()` in all our new benchmarks.
This PR refactors the internals of the `Desc` to create an optimized version of `MakeLabelPairs` by replacing the `constLabels` with a sorted `labelPairs` containing both constant labels and variable labels. The `Desc` also carries an extra map so it is possible to insert the `labelValues` into the proper sorted index on output.

Benchmark results: no longer sorting the immutable set of label names drops CPU usage dramatically, along with some memory. There's also a further benefit to memory from calling `proto.String` only once for the variable label names.

The tradeoff is a little bit more overhead in the `Desc` to store the full `labelPairs` and an index lookup map. I think this is a fair tradeoff to make because a single `Desc` instance is likely to be used for numerous calls to `MakeLabelPairs`.

Related to: #1702
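The approach described in this PR can be sketched roughly as follows. The type and field names below (`desc`, `pairs`, `varIdx`, `makeLabelPairs`) are illustrative stand-ins, not client_golang's actual internals: the `Desc` pre-sorts the combined label list once at construction and remembers where each variable label lands, so the per-sample call is just a copy plus indexed writes with no sorting.

```go
package main

import (
	"fmt"
	"sort"
)

// labelPair mirrors the name/value pairs emitted by MakeLabelPairs.
type labelPair struct{ Name, Value string }

// desc pre-computes the fully sorted label layout once so that the
// per-sample path only has to fill in variable values.
type desc struct {
	pairs  []labelPair    // constant pairs plus placeholders, pre-sorted
	varIdx map[string]int // variable label name -> index into pairs
}

func newDesc(constLabels map[string]string, variableLabels []string) *desc {
	d := &desc{varIdx: make(map[string]int, len(variableLabels))}
	for n, v := range constLabels {
		d.pairs = append(d.pairs, labelPair{Name: n, Value: v})
	}
	for _, n := range variableLabels {
		d.pairs = append(d.pairs, labelPair{Name: n}) // value filled per call
	}
	// Sort once here instead of on every MakeLabelPairs call.
	sort.Slice(d.pairs, func(i, j int) bool {
		return d.pairs[i].Name < d.pairs[j].Name
	})
	for i, p := range d.pairs {
		if _, isConst := constLabels[p.Name]; !isConst {
			d.varIdx[p.Name] = i
		}
	}
	return d
}

// makeLabelPairs copies the pre-sorted template and drops the variable
// values into their pre-computed slots -- no sorting on the hot path.
func (d *desc) makeLabelPairs(variableLabels, values []string) []labelPair {
	out := make([]labelPair, len(d.pairs))
	copy(out, d.pairs)
	for i, n := range variableLabels {
		out[d.varIdx[n]].Value = values[i]
	}
	return out
}

func main() {
	d := newDesc(map[string]string{"job": "api"}, []string{"code", "method"})
	fmt.Println(d.makeLabelPairs([]string{"code", "method"}, []string{"200", "GET"}))
}
```

This also makes the tradeoff discussed in the review concrete: the template slice and the index map live for the lifetime of the `Desc`, which is pure overhead when a `Desc` is built for a single metric but amortizes quickly once it is reused across scrapes.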