
Improve Performance of MakeLabelPairs #1734

Open · kgeckhart wants to merge 2 commits into main from kgeckhart/improve-MakeLabelPairs-performance
Conversation

@kgeckhart commented Feb 12, 2025

This PR refactors the internals of Desc to create an optimized version of MakeLabelPairs: it replaces the constLabels field with a pre-sorted labelPairs slice containing both the constant labels and placeholders for the variable labels. The Desc also carries an extra map so the labelValues can be inserted into the proper sorted indexes on output. Benchmark results:

                             │ baseline.txt │               new.txt                │
                             │    sec/op    │    sec/op     vs base                │
_MakeLabelPairs/1_label-11      94.25n ± 1%   47.06n ±  1%  -50.07% (p=0.000 n=10)
_MakeLabelPairs/3_labels-11     262.9n ± 1%   119.8n ±  0%  -54.44% (p=0.000 n=10)
_MakeLabelPairs/10_labels-11   1005.0n ± 0%   372.9n ± 10%  -62.89% (p=0.000 n=10)
geomean                         292.0n        128.1n        -56.13%

                             │ baseline.txt │              new.txt               │
                             │     B/op     │    B/op     vs base                │
_MakeLabelPairs/1_label-11      136.00 ± 0%   96.00 ± 0%  -29.41% (p=0.000 n=10)
_MakeLabelPairs/3_labels-11      360.0 ± 0%   288.0 ± 0%  -20.00% (p=0.000 n=10)
_MakeLabelPairs/10_labels-11    1048.0 ± 0%   880.0 ± 0%  -16.03% (p=0.000 n=10)
geomean                          371.6        289.8       -22.02%

                             │ baseline.txt │              new.txt               │
                             │  allocs/op   │ allocs/op   vs base                │
_MakeLabelPairs/1_label-11       5.000 ± 0%   3.000 ± 0%  -40.00% (p=0.000 n=10)
_MakeLabelPairs/3_labels-11     11.000 ± 0%   7.000 ± 0%  -36.36% (p=0.000 n=10)
_MakeLabelPairs/10_labels-11     29.00 ± 0%   19.00 ± 0%  -34.48% (p=0.000 n=10)
geomean                          11.68        7.362       -36.99%

No longer sorting the immutable set of label names on every call dramatically reduces CPU usage and saves some memory. Memory is reduced further by calling proto.String only once for the variable label names.

The tradeoff is a little more overhead in the Desc to store the full labelPairs slice and an index lookup map. I think this is a fair tradeoff to make because a single Desc instance is likely to be used for numerous calls to MakeLabelPairs.
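For illustration, the approach described above could look roughly like the following sketch. It uses plain structs rather than the actual client_golang internals, and all names here (labelPair, desc, variableIndexes, etc.) are hypothetical stand-ins, not the PR's real identifiers:

```go
package main

import (
	"fmt"
	"sort"
)

// labelPair mirrors the name/value shape of dto.LabelPair for illustration.
type labelPair struct {
	Name, Value string
}

// desc is a simplified stand-in for prometheus.Desc: it sorts the constant
// and variable label names once at construction time and remembers, for each
// variable label, its index in the final sorted output.
type desc struct {
	labelPairs      []labelPair // sorted by Name; variable pairs start with an empty Value
	variableIndexes []int       // output index of each variable label, in input order
}

func newDesc(constLabels map[string]string, variableNames []string) *desc {
	d := &desc{}
	for name, value := range constLabels {
		d.labelPairs = append(d.labelPairs, labelPair{Name: name, Value: value})
	}
	for _, name := range variableNames {
		d.labelPairs = append(d.labelPairs, labelPair{Name: name})
	}
	// The one and only sort; it never has to happen again per output.
	sort.Slice(d.labelPairs, func(i, j int) bool {
		return d.labelPairs[i].Name < d.labelPairs[j].Name
	})
	// Record where each variable label landed after sorting.
	position := make(map[string]int, len(d.labelPairs))
	for i, p := range d.labelPairs {
		position[p.Name] = i
	}
	for _, name := range variableNames {
		d.variableIndexes = append(d.variableIndexes, position[name])
	}
	return d
}

// makeLabelPairs copies the pre-sorted pairs and fills in the variable
// slots -- no sorting on the hot path.
func (d *desc) makeLabelPairs(labelValues []string) []labelPair {
	out := make([]labelPair, len(d.labelPairs))
	copy(out, d.labelPairs)
	for i, v := range labelValues {
		out[d.variableIndexes[i]].Value = v
	}
	return out
}

func main() {
	d := newDesc(map[string]string{"job": "api"}, []string{"code", "method"})
	fmt.Println(d.makeLabelPairs([]string{"200", "GET"}))
	// → [{code 200} {job api} {method GET}]
}
```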

Related to: #1702

@kgeckhart force-pushed the kgeckhart/improve-MakeLabelPairs-performance branch from 829ec00 to d049c34 on February 12, 2025 21:18
@kgeckhart marked this pull request as ready for review on February 12, 2025 21:24
@bwplotka (Member) left a comment

This is brilliant, thank you! Elegant and clean.

I proposed using a slice instead of a map for the ordering, for extra efficiency. We could also keep sorting in MakeLabelPairs if the new approach is too expensive for NewDesc in CPU or memory. Speaking of NewDesc efficiency...

> The tradeoff is a little bit more overhead in the Desc to store the full labelPairs and an index lookup map. I think this is a fair tradeoff to make because a single Desc instance is likely to be used for numerous calls to MakeLabelPairs.

Generally I agree it's fair. However, we have many users calling NewConst*Metric with a dynamic (😱) NewDesc because the underlying metric labels they report change. It's a niche, and we could provide some optimizations for it later on (a separate NewDesc, a reusable Desc), but it would be epic if we could measure NewDesc efficiency, at least to be aware of the overhead. I proposed a quick way to expand your benchmark for it.

To sum up: LGTM with nits, the slice suggestion, a new benchmark, and a request to redo benchmarks for both MakeLabelPairs and NewDesc. Should be quick to do -- thank you so much for helping! 💪🏽

@kgeckhart (Author) replied

> Generally it's fair I agree. However, we have many use cases of users using NewConst*Metric with dynamic (😱) NewDesc because the underlying metrics labels they report change. It's a niche and we could provide some optimizations for them later on (separate NewDesc, reusable Desc) but it would be epic if we could measure NewDesc efficiency, at least to be aware of the overhead.

Great point. Funny enough, yet-another-cloudwatch-exporter was one such use case, and refactoring it to a reusable Desc approach is what led me here. Thanks for all the feedback; I'll clean it up and add a bench for Desc as well.

…nd add a benchmark for NewDesc

Signed-off-by: Kyle Eckhart <[email protected]>
@kgeckhart force-pushed the kgeckhart/improve-MakeLabelPairs-performance branch from 4dc5c04 to 6da4db6 on February 19, 2025 22:24
@kgeckhart (Author) commented

After doing some more benchmarking, I think I'll try to find further optimizations in NewDesc. The benchmark showed increases across the board, as expected:

goos: darwin
goarch: arm64
pkg: github.com/prometheus/client_golang/prometheus
cpu: Apple M3 Pro
                     │ baseline.txt │              new.txt               │
                     │    sec/op    │   sec/op     vs base               │
NewDesc/labels=1-11    362.1n ±  4%   412.7n ± 6%  +13.94% (p=0.002 n=6)
NewDesc/labels=3-11    778.1n ±  1%   930.1n ± 8%  +19.53% (p=0.002 n=6)
NewDesc/labels=10-11   3.077µ ± 21%   3.834µ ± 1%  +24.60% (p=0.002 n=6)
geomean                953.6n         1.137µ       +19.28%

                     │ baseline.txt │               new.txt               │
                     │     B/op     │     B/op      vs base               │
NewDesc/labels=1-11      376.0 ± 0%     504.0 ± 0%  +34.04% (p=0.002 n=6)
NewDesc/labels=3-11      744.0 ± 0%    1064.0 ± 0%  +43.01% (p=0.002 n=6)
NewDesc/labels=10-11   3.520Ki ± 0%   4.488Ki ± 0%  +27.52% (p=0.002 n=6)
geomean                 1002.7        1.319Ki       +34.71%

                     │ baseline.txt │              new.txt              │
                     │  allocs/op   │ allocs/op   vs base               │
NewDesc/labels=1-11      12.00 ± 0%   15.00 ± 0%  +25.00% (p=0.002 n=6)
NewDesc/labels=3-11      20.00 ± 0%   27.00 ± 0%  +35.00% (p=0.002 n=6)
NewDesc/labels=10-11     51.00 ± 0%   72.00 ± 0%  +41.18% (p=0.002 n=6)
geomean                  23.05        30.78       +33.56%

The results from MakeLabelPairs didn't change much at all. It was hard to quantify whether the increases in NewDesc were offset by the savings in MakeLabelPairs, so I added another benchmark to simulate a more real-world scenario of NewDesc -> NewConstMetric. I re-organized the results so each row is labeled with the number of metrics created, then the number of labels:

goos: darwin
goarch: arm64
pkg: github.com/prometheus/client_golang/prometheus
cpu: Apple M3 Pro
        │ const-baseline.txt │            const-new.txt            │
        │       sec/op       │    sec/op     vs base               │
1 1              528.3n ± 1%   523.3n ±  1%        ~ (p=0.058 n=6)
1 3              1.100µ ± 1%   1.128µ ±  1%   +2.55% (p=0.002 n=6)
1 10             4.053µ ± 4%   4.356µ ±  1%   +7.48% (p=0.002 n=6)
2 1              676.7n ± 1%   653.2n ±  3%   -3.47% (p=0.002 n=6)
2 3              1.449µ ± 0%   1.325µ ±  2%   -8.59% (p=0.002 n=6)
2 10             5.062µ ± 0%   4.850µ ±  1%   -4.18% (p=0.002 n=6)
3 1              854.4n ± 4%   767.6n ±  2%  -10.15% (p=0.002 n=6)
3 3              1.767µ ± 1%   1.536µ ± 14%  -13.10% (p=0.002 n=6)
3 10             6.066µ ± 0%   5.336µ ±  3%  -12.03% (p=0.002 n=6)
5 1              1.176µ ± 1%   1.002µ ±  2%  -14.81% (p=0.002 n=6)
5 3              2.433µ ± 1%   1.983µ ±  2%  -18.50% (p=0.002 n=6)
5 10             8.022µ ± 4%   6.240µ ±  0%  -22.22% (p=0.002 n=6)
geomean          1.916µ        1.753µ         -8.54%

        │ const-baseline.txt │            const-new.txt            │
        │        B/op        │     B/op      vs base               │
1 1               696.0 ± 0%     784.0 ± 0%  +12.64% (p=0.002 n=6)
1 3             1.258Ki ± 0%   1.500Ki ± 0%  +19.25% (p=0.002 n=6)
1 10            4.816Ki ± 0%   5.605Ki ± 0%  +16.38% (p=0.002 n=6)
2 1              1016.0 ± 0%    1064.0 ± 0%   +4.72% (p=0.002 n=6)
2 3             1.789Ki ± 0%   1.961Ki ± 0%   +9.61% (p=0.002 n=6)
2 10            6.113Ki ± 0%   6.723Ki ± 0%   +9.97% (p=0.002 n=6)
3 1             1.305Ki ± 0%   1.312Ki ± 0%   +0.60% (p=0.002 n=6)
3 3             2.320Ki ± 0%   2.422Ki ± 0%   +4.38% (p=0.002 n=6)
3 10            7.410Ki ± 0%   7.840Ki ± 0%   +5.80% (p=0.002 n=6)
5 1             1.930Ki ± 0%   1.859Ki ± 0%   -3.64% (p=0.002 n=6)
5 3             3.383Ki ± 0%   3.344Ki ± 0%   -1.15% (p=0.002 n=6)
5 10            10.00Ki ± 0%   10.07Ki ± 0%   +0.70% (p=0.002 n=6)
geomean         2.520Ki        2.681Ki        +6.39%

        │ const-baseline.txt │           const-new.txt           │
        │     allocs/op      │ allocs/op   vs base               │
1 1               21.00 ± 0%   22.00 ± 0%   +4.76% (p=0.002 n=6)
1 3               35.00 ± 0%   38.00 ± 0%   +8.57% (p=0.002 n=6)
1 10              87.00 ± 0%   97.00 ± 0%  +11.49% (p=0.002 n=6)
2 1               30.00 ± 0%   29.00 ± 0%   -3.33% (p=0.002 n=6)
2 3               50.00 ± 0%   49.00 ± 0%   -2.00% (p=0.002 n=6)
2 10              123.0 ± 0%   122.0 ± 0%   -0.81% (p=0.002 n=6)
3 1               39.00 ± 0%   36.00 ± 0%   -7.69% (p=0.002 n=6)
3 3               65.00 ± 0%   60.00 ± 0%   -7.69% (p=0.002 n=6)
3 10              159.0 ± 0%   147.0 ± 0%   -7.55% (p=0.002 n=6)
5 1               57.00 ± 0%   50.00 ± 0%  -12.28% (p=0.002 n=6)
5 3               95.00 ± 0%   82.00 ± 0%  -13.68% (p=0.002 n=6)
5 10              231.0 ± 0%   197.0 ± 0%  -14.72% (p=0.002 n=6)
geomean           65.24        62.58        -4.09%

CPU and allocs/op increase if you only use the Desc to create a single metric, but beyond that you get savings. B/op only starts to show savings once you create 5 metrics 😬. I'll take a deeper look tomorrow, but my first inclination is that it comes from this block, https://github.com/kgeckhart/client_golang/blob/6da4db6dc2f8348a455b3bca5f657b47a3c02b4c/prometheus/desc.go#L174-L178, because I can't re-use the pre-created dto.LabelPair and have to create a new one in MakeLabelPairs.

@bwplotka (Member) commented

Thanks!

Some naive math on how much NewDesc got worse versus how much MakeLabelPairs got better: the change easily pays off on CPU, but the alloc side is more interesting, and that is what we see in your constMetricFlow bench.

NewDesc/labels=1-11    362.1n ±  4%   412.7n ± 6%  +13.94% (p=0.002 n=6)

vs

_MakeLabelPairs/1_label-11      94.25n ± 1%   47.06n ±  1%  -50.07% (p=0.000 n=10)

NewDesc/labels=1-11      376.0 ± 0%     504.0 ± 0%  +34.04% (p=0.002 n=6)

vs

_MakeLabelPairs/1_label-11      136.00 ± 0%   96.00 ± 0%  -29.41% (p=0.000 n=10)

So indeed I would focus on the allocs of NewDesc...
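That back-of-the-envelope math can be made concrete with the labels=1 rows quoted above (the break-even points are my own arithmetic, not numbers from the thread):

```go
package main

import "fmt"

func main() {
	// labels=1 rows from the benchmarks quoted above.
	newDescCPUDelta := 412.7 - 362.1  // ns/op added to each NewDesc
	makePairsCPUSave := 94.25 - 47.06 // ns/op saved per MakeLabelPairs call
	// ~1.1: the extra CPU is recovered after roughly one MakeLabelPairs call.
	fmt.Printf("CPU break-even: ~%.1f calls\n", newDescCPUDelta/makePairsCPUSave)

	newDescBytesDelta := 504.0 - 376.0 // B/op added to each NewDesc
	makePairsBytesSave := 136.0 - 96.0 // B/op saved per MakeLabelPairs call
	// ~3.2: allocations only break even after three-plus calls, consistent
	// with the constMetricFlow results where B/op savings appear around
	// 5 metrics per Desc.
	fmt.Printf("alloc break-even: ~%.1f calls\n", newDescBytesDelta/makePairsBytesSave)
}
```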

```go
	}
}

func BenchmarkConstMetricFlow(b *testing.B) {
```
Member review comment:

This is important for exporters, but if you go this route I would also add a benchmark showing the effect on "normal" use (e.g. NewCounter and WithLabelValues, which is 80% of usage). That will perhaps motivate us to accept a certain penalty for this niche case.

For exporters we really need to redesign something around constant metrics. They are too expensive to create on every scrape, especially since most users expect predictable memory and CPU from k8s containers. I tried in the past to build a dto-based cache layer (google/cadvisor#2974) but it needs more work.

What I'm saying is that we could make this a bit slower and work with exporters on a long-term plan. We could also give users an option (a faster NewDesc vs. a slower one with more work later). Let's check what we could squeeze out.

Would some smart "pre-alloc pool" help? (for predictable memory I believe yes)
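One shape such a pool could take, sketched with sync.Pool over plain structs. This is a speculative design, not code from the PR; labelPair, makePairs, and release are hypothetical names, and the capacity is an arbitrary guess:

```go
package main

import (
	"fmt"
	"sync"
)

// labelPair stands in for dto.LabelPair for illustration.
type labelPair struct{ Name, Value string }

// pairPool hands out reusable labelPair slices so a hot path such as a
// per-scrape const-metric build does not allocate a fresh backing array
// every time. Capacity 10 is an arbitrary guess at a typical label count.
var pairPool = sync.Pool{
	New: func() any {
		s := make([]labelPair, 0, 10)
		return &s
	},
}

// makePairs builds label pairs into a pooled slice.
func makePairs(names, values []string) []labelPair {
	pairs := (*pairPool.Get().(*[]labelPair))[:0]
	for i, n := range names {
		pairs = append(pairs, labelPair{Name: n, Value: values[i]})
	}
	return pairs
}

// release returns a slice to the pool once the metric has been serialized;
// forgetting to call it is safe, just slower.
func release(pairs []labelPair) {
	pairPool.Put(&pairs)
}

func main() {
	p := makePairs([]string{"code"}, []string{"200"})
	fmt.Println(p) // → [{code 200}]
	release(p)
}
```

The catch is ownership: pooled slices only help if callers reliably return them after the scrape, which is why this probably belongs inside a redesigned const-metric API rather than bolted onto the current one.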

@bwplotka (Member) commented

OK, I spent some time seeing whether getting rid of the dto.LabelPair pointers is beneficial. Feel free to cherry-pick the PR commits or ignore them (#1746). I added benchmark results in the description. Not ideal, but heading in a good direction if we continue moving dto.Metric/LabelPair use out of client_golang. That is of course bigger work.

```go
	} {
		b.Run(fmt.Sprintf("labels=%v", len(bm.labelValues)), func(b *testing.B) {
			for _, metricsToCreate := range []int{1, 2, 3, 5} {
				b.Run(fmt.Sprintf("metrics=%v", metricsToCreate), func(b *testing.B) {
```
Member review comment:

Let's make sure to add b.ResetTimer() in all our new benchmarks.
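For reference, b.ResetTimer() excludes everything before it from the measurement, which matters when a benchmark has nontrivial setup. A minimal self-contained example (the strings.Join workload is just an illustrative stand-in, and testing.Benchmark is used so it runs outside go test):

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// benchJoin measures only the strings.Join calls: the slice built before
// b.ResetTimer() is setup, so its cost is excluded from the reported ns/op.
func benchJoin(b *testing.B) {
	parts := make([]string, 1000)
	for i := range parts {
		parts[i] = "label"
	}
	b.ResetTimer() // everything above is setup; start timing here
	for i := 0; i < b.N; i++ {
		_ = strings.Join(parts, ",")
	}
}

func main() {
	// testing.Benchmark runs a benchmark function outside `go test`.
	fmt.Println(testing.Benchmark(benchJoin))
}
```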
