Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[8.14] TBS: Optimize for ReadTraceEvents miss (backport #13464) #13474

Merged
merged 1 commit into from
Jun 24, 2024

Conversation

mergify[bot]
Copy link
Contributor

@mergify mergify bot commented Jun 24, 2024

Motivation/summary

When a sampling decision is received from a remote apm-server, if the trace is not stored locally, prefetching values in the iterator is causing mmap vlog to be read for no value, resulting in high memory usage and disk read IO.

Add an initial pass with PrefetchValues=false to optimize for a miss in ReadTraceEvents.

e.g. if there are 10 apm-servers all running TBS and synced via ES, and all transactions/spans from a trace are always sent to in 1 apm-server (i.e., a single trace never stored across multiple apm-servers), this PR should cut disk read IO related to sampling decision handling to 1/10 of before.

Benchmark results

Since the benchmark runs badger DB with in-memory mode, I expect in reality, hit=false after this fix will stay the same, while all the other numbers to be 1 order of magnitude slower. i.e. the performance gain from this PR will be even greater.

goos: linux
goarch: amd64
pkg: github.com/elastic/apm-server/x-pack/apm-server/sampling/eventstorage
cpu: 11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz
                                       │ bench-read-events-hit.before │     bench-read-events-hit.after     │
                                       │            sec/op            │    sec/op     vs base               │
ReadEventsHit/bigTX=true/hit=false-16                   167.06µ ± 14%   14.31µ ± 32%  -91.43% (p=0.002 n=6)
ReadEventsHit/bigTX=true/hit=true-16                     174.4µ ±  4%   186.3µ ±  4%   +6.81% (p=0.004 n=6)
ReadEventsHit/bigTX=false/hit=false-16                  129.49µ ±  3%   16.93µ ± 21%  -86.92% (p=0.002 n=6)
ReadEventsHit/bigTX=false/hit=true-16                    136.5µ ±  3%   139.8µ ±  3%   +2.40% (p=0.026 n=6)
geomean                                                  150.7µ         50.13µ        -66.73%

Checklist

- [ ] Update CHANGELOG.asciidoc Require backport release changelog
- [ ] Documentation has been updated

For functional changes, consider:

  • Is it observable through the addition of either logging or metrics?
  • Is its use being published in telemetry to enable product improvement?
  • Have system tests been added to avoid regression?

How to test these changes

See benchmark

Related issues

Closes #13463


This is an automatic backport of pull request #13464 done by [Mergify](https://mergify.com).

When a sampling decision is received from a remote apm-server, if the trace is not stored locally, prefetching values in the iterator is causing mmap vlog to be read for no value, resulting in high memory usage and disk read IO.

Add an initial pass with PrefetchValues=false to optimize for a miss in ReadTraceEvents.

e.g. if there are 10 apm-servers all running TBS and synced via ES, and all transactions/spans from a trace are always sent to in 1 apm-server (i.e., a single trace never stored across multiple apm-servers), this PR should cut disk read IO related to sampling decision handling to 1/10 of before.

(cherry picked from commit 2558f92)
@mergify mergify bot added the backport label Jun 24, 2024
@mergify mergify bot merged commit 06fc881 into 8.14 Jun 24, 2024
12 checks passed
@mergify mergify bot deleted the mergify/bp/8.14/pr-13464 branch June 24, 2024 11:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant