Skip to content

feat(perf-issues): Allow MN+1 experiment to scan for prisma client occurrences #91566

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
May 14, 2025

Conversation

leeandher
Copy link
Member

@leeandher leeandher commented May 13, 2025

So how it works:

  • Increase the pattern length from 5 to 8
  • Add configuration for minimum DB percentage, allows us to drop it from 10% to 5% (Prisma has HTTP calls interspersed which compete for time and may prevent issues)
  • Modify the constraint for a single parent node for all offenders (this is the complex bit)
    • While iterating through spans, we track each, and their parent in a dictionary
    • When we finish the scan for an MN+1 and want a parent, refer back to this dictionary
    • Depending on a depth setting (defaulting to 3 for now), assemble lists for each offender with their parent lineage
      • For example, the list of some span will be [parent, grandparent, great-grandparent] when depth is set to 3. This is necessary for Prisma since DB spans produce the following for prisma traces (with their built in OTEL spans)
        [prisma.engine, prisma.client.operation, <user-callsite>]
    • Then we convert each list to a set and get the global intersection across them all
      • This will give us shared parent spans up to a depth of the max_allowable_depth setting.
      • This span also needs to be in the event[spans] list, so it'll exclude a root span
      • If this has more than one result, use the list to get the first match, and thus the earliest shared parent span, for better fingerprinting.

Added some metrics as well to see when the detector bails out, and tested for those metrics as well to make sure we're using the correct code paths. The JSON file for the stubbed trace is the same one here though you may need to be a superuser to see that as its on my personal test organization.

Also added an option to meter this rollout to 25% instead of full GA since the added depth scanning will likely affect the detectors performance, I'll increase it in 25% increments and keep an eye out in case it does

@leeandher leeandher requested a review from a team as a code owner May 13, 2025 18:36
@leeandher leeandher requested a review from a team May 13, 2025 18:36
@github-actions github-actions bot added the Scope: Backend Automatically applied to PRs that change backend components label May 13, 2025
Copy link

codecov bot commented May 13, 2025

Codecov Report

Attention: Patch coverage is 90.38462% with 10 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
...ssues/experiments/test_m_n_plus_one_db_detector.py 84.61% 8 Missing ⚠️
...ectors/experiments/mn_plus_one_db_span_detector.py 96.07% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #91566      +/-   ##
==========================================
- Coverage   87.60%   83.99%   -3.61%     
==========================================
  Files       10323    10331       +8     
  Lines      585926   586314     +388     
  Branches    22532    22532              
==========================================
- Hits       513296   492479   -20817     
- Misses      72209    93414   +21205     
  Partials      421      421              

Copy link
Member

@gggritso gggritso left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense! Someone who's good at graphs/trees might know of a more efficient way to check the common parent, but I do not.

  1. I wonder if it'd be simpler to construct the parent lookup in the visit method, instead of having to copy the logic into both states and pass the data around
  2. I think you've made enough changes to warrant updating the code comments (and maybe adding some more?)

That's minor though. 🚢

@leeandher
Copy link
Member Author

Made some perf improvements for the scanning algorithm (sentry performance timeit -d MNPlusOneDBSpanExperimentalDetector [testfile] -n 100000)

  • Before: 0.2953734654199798 ms, averaged across 100k runs.
  • After: 0.27164446583992685 ms averaged across 100k runs.

Which is like a 9% improvement!

@leeandher leeandher merged commit cc94c6f into master May 14, 2025
59 checks passed
@leeandher leeandher deleted the leander/id-618-apply-changes-for-mn1-db-experiment branch May 14, 2025 17:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Scope: Backend Automatically applied to PRs that change backend components
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants