ci(benchmarks): add/enable startup benchmark to run in ci #12563
Conversation
It would be nice to simulate a real prod env. Looking at metabase, I see that many Flask users also installed:
For Django users:
For FastAPI:
For all users, most used:
Benchmarks
Benchmark execution time: 2025-03-04 14:08:14. Comparing candidate commit 09ee968 in PR branch. Found 1 performance improvement and 0 performance regressions! Performance is the same for 429 metrics, 2 unstable metrics.
scenario: iast_aspects-ospathbasename_aspect
@christophe-papazian thinking more about this, we might just want to create dedicated tests for more realistic apps. I'd like to keep this benchmark as "micro" as we can, especially since we do have the flask and django scenarios. Maybe a better follow-up is having a django/flask startup benchmark. I think that and this benchmark could work together, e.g. if we see an increase in this benchmark, we'd expect to see an increase in the flask and django scenarios... or if we see an increase in the django startup benchmark, but not this one, then it helps narrow the problem. Not 100% sold in either direction, cool with iterating as a follow-up?
Note the microbenchmark for this new benchmark is failing in CI because the scenario technically exists on the baseline, but it is missing a config.
I'll investigate; not enabling the benchmark in CI during this PR and doing that as a follow-up might be the better option. I'll keep it enabled in CI for now so that with each iteration we can validate that it executes properly on the candidate branch.
This is added in gaogaotiantian/viztracer#562. Not sure when the next release will be, but the feature will be there.
hey @gaogaotiantian! thanks for responding, and thanks for viztracer!
No problem! Let me know if you need anything from viztracer :)
Datadog Report
Branch report: ✅ 0 Failed, 43 Passed, 215 Skipped, 55.93s Total duration (4m 51.98s time saved)
This finishes up the work from #12563 and enables the benchmarks to run. They were failing before because the scenario technically existed on the `main` branch, but not in a state where it could run successfully, so the benchmark scenario would always fail.

## Checklist

- [x] PR author has checked that all the criteria below are met
  - The PR description includes an overview of the change
  - The PR description articulates the motivation for the change
  - The change includes tests OR the PR description describes a testing strategy
  - The PR description notes risks associated with the change, if any
  - Newly-added code is easy to change
  - The change follows the [library release note guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html)
  - The change includes or references documentation updates if necessary
  - Backport labels are set (if [applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting))

## Reviewer Checklist

- [x] Reviewer has checked that all the criteria below are met
  - Title is accurate
  - All changes are related to the pull request's stated goal
  - Avoids breaking [API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces) changes
  - Testing strategy adequately addresses listed risks
  - Newly-added code is easy to change
  - Release note makes sense to a user of the library
  - If necessary, author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment
  - Backport labels are set in a manner that is consistent with the [release branch maintenance policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)
We had a previous half-finished `startup` benchmark. This PR cleans it up, adds a config, and enables it in CI.

This test uses `subprocess.check_call()` to measure the time it takes to start and tear down a Python process with the given options/commands.
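As a rough sketch of what this measurement looks like (illustrative only, not the scenario's actual code; the `time_startup` helper and the run count are made up):

```python
# Minimal sketch of timing process start/tear-down with subprocess.check_call().
# Not the actual benchmark scenario; `time_startup` and `runs` are illustrative.
import subprocess
import sys
import time


def time_startup(args, runs=5):
    """Return per-run wall-clock durations (seconds) for starting and exiting `args`."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.check_call(args)  # blocks until the child process exits
        durations.append(time.perf_counter() - start)
    return durations


if __name__ == "__main__":
    # Roughly the "import ddtrace.auto" style variant (requires ddtrace to be installed).
    print(time_startup([sys.executable, "-c", "import ddtrace.auto"]))
```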
Benchmark summary

tl;dr: "How long does it take to get the user's original program started when dd-trace-py is added?"
The benchmark is designed to measure the amount of overhead added to a process's start time when loading/initializing `dd-trace-py`. There are variants which also include importing a popular library/integration (`flask`) to help evaluate the impact on import/startup overhead when loading that library. This benchmark can/should be updated to include other variants to validate the impact on startup overhead when certain features/products are enabled or disabled.
Notes:

- You can try profiling with `PROFILE_BENCHMARKS=1` here. `viztracer` supports tracing into subprocesses, but it will only inject into processes which start with `python`, so the `ddtrace-run` variants won't get traced. I've also had mixed results depending on the variant, but it is technically possible and worth trying/playing with.
- `viztracer python -c "import ddtrace.auto"` is a faster/easier way to get profiling data for a given variant.
- There is an `env` config option for future use; we are not currently using it. I imagined it could be used for product/feature enablement/disablement (a sketch of the idea follows this list).
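Purely as an illustration of how an `env` option could be wired in later (an assumption, not how the scenario currently works; `extra_env` and the `DD_TRACE_ENABLED` toggle below are just examples):

```python
# Hypothetical use of a per-variant `env` option: merge it into the child
# process environment before timing the startup command.
import os
import subprocess
import sys

extra_env = {"DD_TRACE_ENABLED": "false"}  # e.g. toggling a product/feature off

child_env = dict(os.environ)
child_env.update(extra_env)

subprocess.check_call([sys.executable, "-c", "import ddtrace.auto"], env=child_env)
```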
python -c ""
python -c "import flask"
ddtrace-run python -c ""
ddtrace-run python -c "import flask"
python -c "import ddtrace"
python -c "import ddtrace; import flask"
python -c "import ddtrace.auto"
python -c "import ddtrace.auto; import flask"
ddtrace-run python -c "import ddtrace.auto"
ddtrace-run python -c "import ddtrace.auto; import flask"
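For a rough idea of how these variants translate into commands to time, here is an illustrative loop (not the scenario code) over a few of them using the same `subprocess.check_call()` approach; the variant names mirror the ones used in the analysis below:

```python
# Illustrative only: time a handful of the variants above and print the median
# startup time for each (requires ddtrace/flask installed and ddtrace-run on PATH).
import shlex
import statistics
import subprocess
import time

variants = {
    "baseline": 'python -c ""',
    "baseline_flask": 'python -c "import flask"',
    "import_ddtrace_auto": 'python -c "import ddtrace.auto"',
    "ddtrace_run": 'ddtrace-run python -c ""',
}


def run_once(cmd):
    start = time.perf_counter()
    subprocess.check_call(shlex.split(cmd))
    return time.perf_counter() - start


for name, cmd in variants.items():
    durations = [run_once(cmd) for _ in range(5)]
    print(f"{name}: {statistics.median(durations) * 1000:.1f} ms")
```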
Results
Comparing `2.21.3` against `3.1.0`. The following results were run locally from my laptop with `./scripts/perf-run-scenario startup "ddtrace~=2.21" "ddtrace~=3.0"`.

2.21.3 raw results

3.1.0 raw results
Comparison results
Benchmark hidden because not significant (1): startup-baseline
Analysis
- 3.x shows as slower for `baseline-flask`, but this likely means this benchmark has a potentially high level of variance to it, rather than pointing to 3.x actually being slower.
- `baseline_flask` being consistently slower really confuses me. I expected `import flask` to have a consistent overhead when comparing `baseline_flask` against `import_ddtrace_flask`. It is faster to `import flask` once `ddtrace` has already been imported (but no patching). My hypothesis here is that we have some shared dependencies (like `importlib.metadata`, for example), and `ddtrace` is eating the overhead cost of importing these deps when it loads, which `import flask` can then benefit from later (one way to check this is sketched after this list). This brings importing `flask` down to a consistent 50 ms:
  - `baseline_flask` (68.2 ms) - `baseline` (5.1 ms) = 63 ms to `import flask`
  - `import_ddtrace_auto_flask` (178 ms) - `import_ddtrace_auto` (128 ms) = 50 ms to `import flask`
  - `ddtrace_run_flask` (244 ms) - `ddtrace_run` (194 ms) = 50 ms to `import flask`
- I expected `ddtrace_run_import_ddtrace_auto` to have the same overhead as the `import_ddtrace_auto` variant; however, only doing `import ddtrace.auto` is faster, and this is likely because `ddtrace-run` does some work, then starts a subprocess, and then basically does `import ddtrace.auto`. Comparing `ddtrace_run` against `ddtrace_run_import_ddtrace_auto` shows the same overhead, which makes sense.
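One way to sanity-check the shared-dependency hypothesis (not something done in this PR) is to compare CPython's `-X importtime` output for `flask` on its own vs. after `import ddtrace`; the snippet below is an illustrative sketch:

```python
# Illustrative check of the shared-dependency hypothesis: compare how long
# `import flask` takes on its own vs. after `import ddtrace` in the same process.
# `-X importtime` writes per-module import timings to stderr.
import subprocess
import sys


def importtime_stderr(code):
    """Run `code` under -X importtime and return the timing report from stderr."""
    result = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", code],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stderr


flask_alone = importtime_stderr("import flask")
flask_after_ddtrace = importtime_stderr("import ddtrace; import flask")

# The last line of each report is the top-level `flask` entry; its cumulative
# time should be lower in the second run if ddtrace already paid for shared deps.
print(flask_alone.splitlines()[-1])
print(flask_after_ddtrace.splitlines()[-1])
```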
Next steps

Checklist
Reviewer Checklist