perf: cache metrics handler WithTags and Tagged scope lookups #9620
Open
mykaul wants to merge 2 commits into temporalio:main from
…tions

Add sync.Map-based caching of child handlers in WithTags(). On a cache hit there are zero allocations: tagsToMap(), scope.Tagged(), and the handler struct allocation are skipped entirely.

Allocation reduction (pprof alloc_space, 5 min ScyllaDB workload):
- WithTags cumulative: 1,930 MB -> 316 MB (-83.6%)
- Total server allocs: 18,030 MB -> 16,481 MB (-8.6%)

Benchmark (omes throughput_stress, mc150, 5 min, host networking, i7-1270P, 4 cores/component, inter-run data resets):
- Cassandra: 294 iterations (+5.0% vs 280 baseline, +13.5% vs prev)
- ScyllaDB: 296 iterations (+2.1% vs 290 baseline, -1.3% vs prev)
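A minimal sketch of the idea behind this commit, assuming a simplified handler that only carries a tag map. Tag, handler, newHandler, and cacheKey are hypothetical names, not the PR's code. Each parent keeps a sync.Map of children keyed by the requested tag list, so a repeated WithTags call returns the existing child instead of merging tags and allocating a new handler. Unlike the real change, this sketch still builds a string key on every call, so it does not reproduce the zero-allocation property on a hit.

```go
package main

import (
	"strconv"
	"strings"
	"sync"
)

// Tag is a hypothetical key/value pair standing in for a metrics tag.
type Tag struct {
	Key, Value string
}

// handler is a simplified, hypothetical stand-in for a metrics handler.
// It holds the merged tag set and caches the children created by WithTags.
type handler struct {
	tags     map[string]string
	children sync.Map // cache key (string) -> *handler
}

func newHandler() *handler {
	return &handler{tags: map[string]string{}}
}

// cacheKey encodes tags in call order using length prefixes, so distinct
// tag lists never map to the same key. Building the string allocates; the
// real change may key its cache differently to avoid that.
func cacheKey(tags []Tag) string {
	var b strings.Builder
	for _, t := range tags {
		b.WriteString(strconv.Itoa(len(t.Key)))
		b.WriteByte(':')
		b.WriteString(t.Key)
		b.WriteString(strconv.Itoa(len(t.Value)))
		b.WriteByte(':')
		b.WriteString(t.Value)
	}
	return b.String()
}

// WithTags returns a child handler carrying the parent's tags plus tags.
// Repeated calls with the same tag list return the same cached child.
func (h *handler) WithTags(tags ...Tag) *handler {
	key := cacheKey(tags)
	if child, ok := h.children.Load(key); ok {
		return child.(*handler) // hit: no tag map or handler is built
	}
	merged := make(map[string]string, len(h.tags)+len(tags))
	for k, v := range h.tags {
		merged[k] = v
	}
	for _, t := range tags {
		merged[t.Key] = t.Value // later tags override earlier ones
	}
	// LoadOrStore keeps exactly one child if goroutines race on a miss.
	actual, _ := h.children.LoadOrStore(key, &handler{tags: merged})
	return actual.(*handler)
}
```

Because the key follows call order, WithTags(a, b) and WithTags(b, a) occupy separate entries; both are correct, just not deduplicated. The sketch is also unbounded, so high-cardinality tag values would grow it without limit; the description does not say how the real WithTags cache handles that.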
…ocations

Add a scopeCache (sync.Map) to tallyMetricsHandler that caches the result of scope.Tagged() calls per unique tag combination. This avoids repeated map allocations and tally scope registry lookups on every Counter/Gauge/Timer/Histogram Record call with inline tags.

The scope cache builds on the WithTags handler cache from the previous commit: WithTags caches entire handler subtrees, while scopeCache targets the per-metric-emission path where tags are passed inline. The cache is bounded to 1024 entries, and tags are normalized through excludeTags before the cache key is computed.

Combined allocation reduction (pprof alloc_space, 5 min ScyllaDB):
- tagsToMap.func1: 1,101 MB -> 0 MB (-100%)
- tally Subscope: 1,012 MB -> 0 MB (-100%)
- Total server: 18,465 MB -> 16,511 MB (-10.6%)

Benchmark (omes throughput_stress, mc150, 5 min, host networking, i7-1270P, 4 cores/component, inter-run data resets):
- Cassandra: 270 iterations (-3.6% vs 280 baseline, -8.2% vs prev)
- ScyllaDB: 298 iterations (+2.8% vs 290 baseline, +0.7% vs prev)

Note: Throughput variance at mc150 is ~5-10%. The allocation reduction is confirmed by pprof, but throughput gains are within noise at this concurrency level.
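A sketch of a bounded scope cache with tag normalization before key computation, under several assumptions that are not taken from the PR: scope is a hypothetical stand-in for a tally scope, excludeTags is modeled as a map from tag key to an allowlist of values, and any other value for such a key is replaced with a placeholder string. The bound is enforced by reserving a slot with an atomic compare-and-swap before storing, and once the cache is full new combinations are returned uncached; the real eviction or overflow policy is not described. atomic.Int64 requires Go 1.19 or later.

```go
package main

import (
	"sort"
	"strconv"
	"strings"
	"sync"
	"sync/atomic"
)

// excludedValue is a hypothetical placeholder for disallowed tag values.
const excludedValue = "_tag_excluded_"

// scope is a hypothetical stand-in for a tally scope with fixed tags.
type scope struct {
	tags map[string]string
}

// scopeCache caches scopes per normalized tag combination, up to limit entries.
type scopeCache struct {
	limit   int64
	size    atomic.Int64
	entries sync.Map                   // cache key -> *scope
	allowed map[string]map[string]bool // tag key -> allowlisted values
}

func newScopeCache(limit int64, exclude map[string][]string) *scopeCache {
	allowed := make(map[string]map[string]bool, len(exclude))
	for k, vals := range exclude {
		set := make(map[string]bool, len(vals))
		for _, v := range vals {
			set[v] = true
		}
		allowed[k] = set
	}
	return &scopeCache{limit: limit, allowed: allowed}
}

// normalize replaces values of excluded tag keys that are not allowlisted.
func (c *scopeCache) normalize(tags map[string]string) map[string]string {
	out := make(map[string]string, len(tags))
	for k, v := range tags {
		if set, ok := c.allowed[k]; ok && !set[v] {
			v = excludedValue
		}
		out[k] = v
	}
	return out
}

// cacheKey sorts tag keys (map order is random) and length-prefixes each part.
func cacheKey(tags map[string]string) string {
	keys := make([]string, 0, len(tags))
	for k := range tags {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	var b strings.Builder
	for _, k := range keys {
		for _, s := range [2]string{k, tags[k]} {
			b.WriteString(strconv.Itoa(len(s)))
			b.WriteByte(':')
			b.WriteString(s)
		}
	}
	return b.String()
}

// Tagged returns a cached scope for tags, or an uncached one once full.
func (c *scopeCache) Tagged(tags map[string]string) *scope {
	norm := c.normalize(tags)
	key := cacheKey(norm)
	if s, ok := c.entries.Load(key); ok {
		return s.(*scope)
	}
	s := &scope{tags: norm}
	for { // reserve a slot so size never exceeds limit
		n := c.size.Load()
		if n >= c.limit {
			return s
		}
		if c.size.CompareAndSwap(n, n+1) {
			break
		}
	}
	actual, loaded := c.entries.LoadOrStore(key, s)
	if loaded {
		c.size.Add(-1) // another goroutine stored this key first
	}
	return actual.(*scope)
}
```

Normalizing before hashing is what lets many raw values of an excluded key collapse onto one cached entry, which also keeps high-cardinality tags from exhausting the bound. This sketch allocates a normalized map and a key string on every call, so it illustrates the caching logic rather than the allocation profile reported above.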
Summary
- Cache WithTags handler instances in tallyMetricsHandler to eliminate repeated heap allocations on every metrics emission call.
- Cache Tagged scope lookups in tallyMetricsHandler to avoid redundant tally scope creation for repeated tag combinations.

Motivation
Profiling under load shows that WithTags and the underlying tally Tagged calls are among the top allocation sources in the Temporal server hot path. Every metrics emission (counters, timers, histograms) repeatedly creates new handler and scope objects for the same tag combinations.

Approach
… Tagged() scope lookup layer, which is the next allocation hotspot once the handler layer is cached.

Benchmark Results (Cassandra 5.0, throughput_stress scenario)
The WithTags cache alone provides the strongest signal. The Tagged scope cache adds complexity, but its incremental effect on top of the WithTags cache is smaller than expected, possibly because the WithTags cache already eliminates most redundant scope creations.
Testing
… -race.