
perf: cache metrics handler WithTags and Tagged scope lookups #9620

Open
mykaul wants to merge 2 commits into temporalio:main from mykaul:perf/cache-metrics-handler-tags
mykaul (Contributor) commented Mar 23, 2026

Summary

  • Cache WithTags handler instances in tallyMetricsHandler to eliminate repeated heap allocations on every metrics emission call.
  • Cache Tagged scope lookups in tallyMetricsHandler to avoid redundant tally scope creation for repeated tag combinations.

Motivation

Profiling under load shows that WithTags and the underlying tally Tagged() calls are among the top allocation sources on the Temporal server hot path: every metrics emission (counters, gauges, timers, histograms) repeatedly creates new handler and scope objects for the same tag combinations.

Approach

  • Commit 1 (WithTags cache): Adds a bounded LRU-style concurrent cache keyed by a canonical string representation of the tag set. Cache hits return the previously-constructed handler, avoiding allocation of new handler objects, tag slices, and scope wrappers.
  • Commit 2 (Tagged scope cache): Extends the caching to the tally Tagged() scope lookup layer, which is the next allocation hotspot after the handler layer is cached.
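Both caches depend on turning a tag set into a stable key. The Go sketch below shows one way to build "a canonical string representation of the tag set", assuming tags arrive as a `map[string]string`; `tagSetKey` and `writeField` are hypothetical names, not the PR's actual functions. Sorting removes the dependence on Go's randomized map iteration order, and length-prefixing each field keeps tags that contain separator characters from colliding.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// tagSetKey (hypothetical name) builds a canonical string for a tag set.
// Tag names are sorted so the key does not depend on Go's randomized map
// iteration order, and every name and value is written as "<len>:<text>"
// so that tags containing ':' or digits cannot produce colliding keys.
func tagSetKey(tags map[string]string) string {
	names := make([]string, 0, len(tags))
	for name := range tags {
		names = append(names, name)
	}
	sort.Strings(names)

	var b strings.Builder
	for _, name := range names {
		writeField(&b, name)
		writeField(&b, tags[name])
	}
	return b.String()
}

// writeField appends a length-prefixed field to b.
func writeField(b *strings.Builder, s string) {
	b.WriteString(strconv.Itoa(len(s)))
	b.WriteByte(':')
	b.WriteString(s)
}

func main() {
	k1 := tagSetKey(map[string]string{"namespace": "default", "operation": "StartWorkflow"})
	k2 := tagSetKey(map[string]string{"operation": "StartWorkflow", "namespace": "default"})
	fmt.Println(k1 == k2) // true
	fmt.Println(k1)       // 9:namespace7:default9:operation13:StartWorkflow
}
```

Building the key itself allocates in this sketch, so a cache hit is cheaper than a miss (no new handler or scope) but is not allocation-free by construction.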

Benchmark Results (Cassandra 5.0, throughput_stress scenario)

| Configuration | Iterations (5 min) | vs Baseline |
|---|---|---|
| Baseline (3 samples) | 1023, 1025, 1032 (mean 1027) | - |
| WithTags cache only (2 samples) | 1110, 1058 (mean 1084) | +5.6% |
| WithTags + Tagged scope cache (2 samples) | 1067, 1014 (mean 1041) | +1.4% |

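The WithTags cache alone provides the strongest signal. Adding the Tagged scope cache on top did not improve throughput further: the combined configuration averaged fewer iterations than the WithTags cache alone (+1.4% vs +5.6% over baseline). One possible explanation is that the WithTags cache already eliminates most redundant scope creations; with only two samples per configuration, the difference may also be run-to-run noise.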
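The deltas in the table can be recomputed from the raw samples. The short Go program below (the helpers `mean` and `minMax` are illustrative) prints each cached configuration's mean, its change versus the three-run baseline, and its sample range. Using unrounded means it gives about +5.6% and +1.3%; the table's +1.4% comes from comparing the rounded means 1041 and 1027. It also shows that the two cached configurations' ranges (1058-1110 and 1014-1067) overlap.

```go
package main

import "fmt"

// mean returns the arithmetic mean of xs.
func mean(xs []float64) float64 {
	sum := 0.0
	for _, x := range xs {
		sum += x
	}
	return sum / float64(len(xs))
}

// minMax returns the smallest and largest sample.
func minMax(xs []float64) (lo, hi float64) {
	lo, hi = xs[0], xs[0]
	for _, x := range xs[1:] {
		if x < lo {
			lo = x
		}
		if x > hi {
			hi = x
		}
	}
	return lo, hi
}

func main() {
	base := mean([]float64{1023, 1025, 1032})
	runs := []struct {
		name    string
		samples []float64
	}{
		{"WithTags cache only", []float64{1110, 1058}},
		{"WithTags + Tagged scope cache", []float64{1067, 1014}},
	}
	for _, r := range runs {
		m := mean(r.samples)
		lo, hi := minMax(r.samples)
		fmt.Printf("%s: mean %.1f (%+.1f%% vs baseline), range %.0f-%.0f\n",
			r.name, m, 100*(m/base-1), lo, hi)
	}
}
```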

Testing

  • All existing metrics tests pass with -race.
  • Both commits include unit tests for the caching behavior.

mykaul added 2 commits March 23, 2026 09:56
…tions

Add sync.Map-based caching of child handlers in WithTags(). On a cache
hit, WithTags() performs zero allocations: it skips tagsToMap(),
scope.Tagged(), and the handler struct allocation entirely.
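The commit caches child handlers in a sync.Map inside WithTags(). The Go sketch below shows that pattern under simplifying assumptions: tags are a `map[string]string` rather than the server's tag type, and `scope`, `handler`, and `tagKey` are hypothetical stand-ins, not Temporal's or tally's actual types.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// scope is a hypothetical stand-in for a tally scope; it only records its tags.
type scope struct {
	tags map[string]string
}

// tagged returns a new child scope whose tags are the parent's tags merged
// with the given ones (the same shape as tally's Scope.Tagged).
func (s *scope) tagged(tags map[string]string) *scope {
	merged := make(map[string]string, len(s.tags)+len(tags))
	for k, v := range s.tags {
		merged[k] = v
	}
	for k, v := range tags {
		merged[k] = v
	}
	return &scope{tags: merged}
}

// handler is a hypothetical, simplified metrics handler.
type handler struct {
	scope    *scope
	children sync.Map // tag-set key -> *handler
}

// tagKey builds an order-independent key; it assumes tag names and values
// never contain '=' or ';'.
func tagKey(tags map[string]string) string {
	parts := make([]string, 0, len(tags))
	for k, v := range tags {
		parts = append(parts, k+"="+v)
	}
	sort.Strings(parts)
	return strings.Join(parts, ";")
}

// WithTags returns a child handler for tags, reusing a cached child when the
// same tag set was requested before.
func (h *handler) WithTags(tags map[string]string) *handler {
	key := tagKey(tags)
	if child, ok := h.children.Load(key); ok {
		return child.(*handler) // hit: no new scope or handler is built
	}
	child := &handler{scope: h.scope.tagged(tags)}
	// If two goroutines miss concurrently, LoadOrStore keeps the first child
	// stored and both callers get that same instance.
	actual, _ := h.children.LoadOrStore(key, child)
	return actual.(*handler)
}

func main() {
	root := &handler{scope: &scope{tags: map[string]string{}}}
	a := root.WithTags(map[string]string{"namespace": "default"})
	b := root.WithTags(map[string]string{"namespace": "default"})
	c := root.WithTags(map[string]string{"namespace": "other"})
	fmt.Println(a == b, a == c) // true false
}
```

This sketch grows without limit; a production cache needs a bound, as the second commit does for its scope cache (1024 entries).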

Allocation reduction (pprof alloc_space, 5min ScyllaDB workload):
  WithTags cumulative: 1,930 MB -> 316 MB (-83.6%)
  Total server allocs: 18,030 MB -> 16,481 MB (-8.6%)

Benchmark (omes throughput_stress, mc150, 5 min, host networking,
i7-1270P 4 cores/component, inter-run data resets):
  Cassandra: 294 iterations (+5.0% vs 280 baseline, +13.5% vs prev)
  ScyllaDB:  296 iterations (+2.1% vs 290 baseline, -1.3% vs prev)
…ocations

Add a scopeCache (sync.Map) to tallyMetricsHandler that caches the
result of scope.Tagged() calls per unique tag combination. This avoids
repeated map allocations and tally scope registry lookups on every
Counter/Gauge/Timer/Histogram Record call with inline tags.

The scope cache builds on the WithTags handler cache from the previous
commit: WithTags caches entire handler subtrees, while scopeCache
targets the per-metric-emission path where tags are passed inline.
Cache bounded to 1024 entries. Tags normalized through excludeTags
before key computation.

Combined allocation reduction (pprof alloc_space, 5min ScyllaDB):
  tagsToMap.func1:  1,101 MB -> 0 MB (-100%)
  tally Subscope:   1,012 MB -> 0 MB (-100%)
  Total server:    18,465 MB -> 16,511 MB (-10.6%)

Benchmark (omes throughput_stress, mc150, 5 min, host networking,
i7-1270P 4 cores/component, inter-run data resets):
  Cassandra: 270 iterations (-3.6% vs 280 baseline, -8.2% vs prev)
  ScyllaDB:  298 iterations (+2.8% vs 290 baseline, +0.7% vs prev)
  Note: Throughput variance at mc150 is ~5-10%. The allocation
  reduction is confirmed by pprof but throughput gains are within
  noise at this concurrency level.
mykaul requested review from a team as code owners March 23, 2026 07:57
