ci(benchmarks): add/enable startup benchmark to run in ci #12563

Merged
merged 6 commits into main from brettlangdon/benchmark.startup on Mar 4, 2025

Conversation

brettlangdon
Member

@brettlangdon brettlangdon commented Feb 28, 2025

We had a previous half-finished startup benchmark. This PR cleans it up, adds a config, and enables it in CI.

This test uses subprocess.check_call() to measure the time it takes to start and tear down a Python process with the given options/commands.
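
For context, here is a minimal sketch of the measurement approach, assuming a plain pyperf harness (pyperf is what produces the "Mean +- std dev" output shown below). The variant list and structure here are illustrative only; the real scenario is driven by the repo's benchmark framework and its config.yaml.

```python
import subprocess
import sys

import pyperf


def run_variant(args):
    # Start a fresh process and wait for it to exit; the child's full
    # start + teardown time is what gets measured.
    subprocess.check_call(args)


if __name__ == "__main__":
    runner = pyperf.Runner()
    # Illustrative variants only; the real list lives in the scenario config.
    runner.bench_func("startup-baseline", run_variant, [sys.executable, "-c", ""])
    runner.bench_func(
        "startup-import_ddtrace_auto",
        run_variant,
        [sys.executable, "-c", "import ddtrace.auto"],
    )
```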

Benchmark summary

tl;dr: "How long does it take to get the user's original program started when dd-trace-py is added?"

This benchmark is designed to measure the overhead added to process start time when loading/initializing dd-trace-py.

There are variants which also include importing a popular library/integration (flask) to help evaluate the impact on import/startup overhead when loading that library.

This benchmark can/should be updated to include other variants to validate certain features/products being enabled/disabled and the impact on startup overhead.

Notes:

  • We are also measuring process teardown overhead, since the total timing covers starting a process and waiting for it to exit. This means that if a variant generates any data to flush, it may block shutdown waiting on that flush and distort the measured startup overhead.
  • We are, sort of, able to use PROFILE_BENCHMARKS=1 here. viztracer supports tracing into subprocesses, but it will only inject into processes whose command starts with python, so the ddtrace-run variants won't get traced. I've also had mixed results depending on the variant, but it is technically possible and worth playing with.
    • Generating profiles locally is probably much easier and more consistent, e.g. viztracer python -c "import ddtrace.auto" is a faster/easier way to get profiling data for a given variant.
    • The profiles can get MASSIVE and then difficult to load, because we are starting a ton of subprocesses, and generating profiles for each of them. Another reason why running locally might be better/faster/etc.
  • We have added an env config option for future use; we are not currently using it. I imagine it could be used for product/feature enablement/disablement (see the sketch after this list).
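
As a hypothetical illustration of that env option, a variant could merge extra environment variables into the child process before timing it; the variable below is just an example toggle, not something this PR wires up.

```python
import os
import subprocess
import sys

# Hypothetical per-variant environment override to enable/disable a product,
# merged over the parent environment before timing the child process.
variant_env = {"DD_PROFILING_ENABLED": "true"}  # example only

subprocess.check_call(
    [sys.executable, "-c", "import ddtrace.auto"],
    env={**os.environ, **variant_env},
)
```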

Variants

  • baseline: python -c ""
  • baseline_flask: python -c "import flask"
  • ddtrace_run: ddtrace-run python -c ""
  • ddtrace_run_flask: ddtrace-run python -c "import flask"
  • import_ddtrace: python -c "import ddtrace"
  • import_ddtrace_flask: python -c "import ddtrace; import flask"
  • import_ddtrace_auto: python -c "import ddtrace.auto"
  • import_ddtrace_auto_flask: python -c "import ddtrace.auto; import flask"
  • ddtrace_run_import_ddtrace_auto: ddtrace-run python -c "import ddtrace.auto"
  • ddtrace_run_import_ddtrace_auto_flask: ddtrace-run python -c "import ddtrace.auto; import flask"

Results

Comparing 2.21.3 against 3.1.0.

The following results were run locally from my laptop with ./scripts/perf-run-scenario startup "ddtrace~=2.21" "ddtrace~=3.0".

2.21.3 raw results
startup-baseline: Mean +- std dev: 5.07 ms +- 0.10 ms
startup-baseline_flask: Mean +- std dev: 65.2 ms +- 1.2 ms
startup-ddtrace_run: Mean +- std dev: 183 ms +- 3 ms
startup-ddtrace_run_flask: Mean +- std dev: 233 ms +- 3 ms
startup-import_ddtrace: Mean +- std dev: 66.0 ms +- 0.8 ms
startup-import_ddtrace_flask: Mean +- std dev: 111 ms +- 2 ms
startup-import_ddtrace_auto: Mean +- std dev: 127 ms +- 2 ms
startup-import_ddtrace_auto_flask: Mean +- std dev: 175 ms +- 4 ms
startup-ddtrace_run_import_ddtrace_auto: Mean +- std dev: 187 ms +- 2 ms
startup-ddtrace_run_import_ddtrace_auto_flask: Mean +- std dev: 236 ms +- 3 ms
3.1.0 raw results
startup-baseline: Mean +- std dev: 5.10 ms +- 0.09 ms
startup-baseline_flask: Mean +- std dev: 68.2 ms +- 1.7 ms
startup-ddtrace_run: Mean +- std dev: 194 ms +- 4 ms
startup-ddtrace_run_flask: Mean +- std dev: 244 ms +- 4 ms
startup-import_ddtrace: Mean +- std dev: 72.0 ms +- 1.2 ms
startup-import_ddtrace_flask: Mean +- std dev: 119 ms +- 4 ms
startup-import_ddtrace_auto: Mean +- std dev: 128 ms +- 2 ms
startup-import_ddtrace_auto_flask: Mean +- std dev: 178 ms +- 4 ms
startup-ddtrace_run_import_ddtrace_auto: Mean +- std dev: 193 ms +- 3 ms
startup-ddtrace_run_import_ddtrace_auto_flask: Mean +- std dev: 245 ms +- 6 ms
Comparison results

| Benchmark | startup/2.21.3/results.json | startup/3.1.0/results.json |
|---|---|---|
| startup-baseline_flask | 65.2 ms | 68.2 ms: 1.05x slower |
| startup-ddtrace_run | 183 ms | 194 ms: 1.06x slower |
| startup-ddtrace_run_flask | 233 ms | 244 ms: 1.05x slower |
| startup-import_ddtrace | 66.0 ms | 72.0 ms: 1.09x slower |
| startup-import_ddtrace_flask | 111 ms | 119 ms: 1.07x slower |
| startup-import_ddtrace_auto | 127 ms | 128 ms: 1.01x slower |
| startup-import_ddtrace_auto_flask | 175 ms | 178 ms: 1.02x slower |
| startup-ddtrace_run_import_ddtrace_auto | 187 ms | 193 ms: 1.03x slower |
| startup-ddtrace_run_import_ddtrace_auto_flask | 236 ms | 245 ms: 1.04x slower |
| Geometric mean | (ref) | 1.04x slower |

Benchmark hidden because not significant (1): startup-baseline

Analysis

  1. The overhead added in 3.x is fairly consistent across all variants, including baseline_flask (which does not load ddtrace at all). This suggests the benchmark may have a high level of variance, rather than 3.x actually being slower.
    1. Note: If this benchmark does have a high level of variance, it might be too noisy to run on every PR, or we might need a different threshold for alerting on a regression.
    2. Running the benchmark a few times locally, I see the same slowdown consistently, but baseline_flask also being consistently slower is confusing.
  2. I was expecting import flask to add a consistent overhead when comparing baseline_flask against import_ddtrace_flask, but it is actually faster to import flask once ddtrace has already been imported (with no patching). My hypothesis is that we share some dependencies (importlib.metadata, for example), and ddtrace pays the cost of importing those deps up front, so a later import flask benefits (see the -X importtime sketch after this list).
    1. Goes from about 63 ms to import flask down to a consistent 50 ms.
    2. baseline_flask (68.2 ms) - baseline (5.1 ms) = 63 ms to import flask
    3. import_ddtrace_auto_flask (178 ms) - import_ddtrace_auto (128 ms) = 50 ms to import flask
    4. ddtrace_run_flask (244 ms) - ddtrace_run (194 ms) = 50 ms to import flask
  3. I was expecting ddtrace_run_import_ddtrace_auto to have the same overhead as the import_ddtrace_auto variant; however, the plain import ddtrace.auto variant is faster. This is likely because ddtrace-run does some work of its own and then starts a subprocess that essentially runs import ddtrace.auto. Comparing ddtrace_run against ddtrace_run_import_ddtrace_auto shows the same overhead, which makes sense.
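
One way to check the shared-dependency hypothesis in point 2 (outside of this benchmark) is CPython's -X importtime output, which reports cumulative import time per module. A rough sketch, using a hypothetical helper for local investigation only:

```python
import subprocess
import sys


def flask_import_us(pre_imports: str = "") -> int:
    """Cumulative microseconds -X importtime attributes to importing flask in a
    fresh interpreter, optionally after some other imports (hypothetical helper)."""
    code = f"{pre_imports}; import flask" if pre_imports else "import flask"
    proc = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", code],
        capture_output=True,
        text=True,
        check=True,
    )
    # -X importtime writes "import time: self | cumulative | module" lines to stderr.
    for line in proc.stderr.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3 and parts[2] == "flask":
            return int(parts[1])
    raise RuntimeError("flask not found in -X importtime output")


print("flask alone:        ", flask_import_us(), "us")
print("flask after ddtrace:", flask_import_us("import ddtrace"), "us")
```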

Next steps

  1. It will be interesting to see results from this benchmark on new PRs; we need to keep a close eye on whether it is "flaky".
  2. It would be interesting to see product teams add variants that explicitly enable/disable their products/features to measure the startup impact of each.

Checklist

  • PR author has checked that all the criteria below are met
  • The PR description includes an overview of the change
  • The PR description articulates the motivation for the change
  • The change includes tests OR the PR description describes a testing strategy
  • The PR description notes risks associated with the change, if any
  • Newly-added code is easy to change
  • The change follows the library release note guidelines
  • The change includes or references documentation updates if necessary
  • Backport labels are set (if applicable)

Reviewer Checklist

  • Reviewer has checked that all the criteria below are met
  • Title is accurate
  • All changes are related to the pull request's stated goal
  • Avoids breaking API changes
  • Testing strategy adequately addresses listed risks
  • Newly-added code is easy to change
  • Release note makes sense to a user of the library
  • If necessary, author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment
  • Backport labels are set in a manner that is consistent with the release branch maintenance policy

@brettlangdon brettlangdon requested review from a team as code owners February 28, 2025 14:16
Contributor

github-actions bot commented Feb 28, 2025

CODEOWNERS have been resolved as:

benchmarks/startup/config.yaml                                          @DataDog/apm-core-python
benchmarks/startup/requirements_scenario.txt                            @DataDog/apm-core-python
benchmarks/startup/scenario.py                                          @DataDog/apm-core-python

@christophe-papazian
Contributor

It would be nice to simulate a real prod env. Looking at metabase, I see that many flask users also installed:
flask_cors
flask_sqlalchemy
flask_caching
flask_jwt_extended
flask_babel
flask_migrate
flask_appbuilder
flask_limiter
flask_compress
flask_session

For Django users:
django-cors-headers
django-oauth-toolkit
django-storages
django-appconf
django-filter
django_extensions

For FastAPI:
fastapi_cli
fastapi.utils
fastapi_pagination
fastapi.cli
fastapi_utils

for all users, most used:
certifi
pytz
cryptography
deprecated
botocore
boto3
aiohappyeyeballs
aiohttp
starlette
propcache
sqlalchemy

@pr-commenter

pr-commenter bot commented Feb 28, 2025

Benchmarks

Benchmark execution time: 2025-03-04 14:08:14

Comparing candidate commit 09ee968 in PR branch brettlangdon/benchmark.startup with baseline commit 0924eae in branch main.

Found 1 performance improvements and 0 performance regressions! Performance is the same for 429 metrics, 2 unstable metrics.

scenario:iast_aspects-ospathbasename_aspect

  • 🟩 execution_time [-318.218ns; -269.757ns] or [-8.852%; -7.504%]

@emmettbutler emmettbutler added the changelog/no-changelog A changelog entry is not required for this PR. label Feb 28, 2025
@brettlangdon
Member Author

@christophe-papazian thinking more about this, we might just want to create dedicated tests for more realistic apps. I'd like to keep this benchmark as "micro" as we can, especially since we do have the flask_simple and django_simple benchmarks which do try to replicate an actual app (ok, not a lot of extensions/extras, but at least spin up django with an endpoint and hit it).

Maybe a better follow-up is adding flask_startup and django_startup type benchmarks instead, to focus on the added startup overhead for various framework setups.

I think that and this benchmark could work together. e.g. if we see an increase in this benchmark, we'd expect to see an increase in the flask and django scenarios... or if we see an increase in the django startup benchmark, but not this one, then it helps narrow the problem.

Not 100% sold in either direction, cool with iterating as a follow-up?

@brettlangdon
Member Author

Note: the microbenchmark run for this new scenario is failing in CI because the scenario technically exists on the baseline, but it is missing its config.yaml, so it fails with that file missing.

FileNotFoundError: [Errno 2] No such file or directory: 'config.yaml'

I'll investigate; not enabling the benchmark in CI in this PR and doing that as a follow-up might be the better option. I'll keep it enabled in CI for now so that, as we iterate, we can validate that it executes properly on the candidate branch.

@gaogaotiantian

viztracer supports tracing into subprocesses, but it will only inject into processes which start with python, so the ddtrace-run variants won't get traced.

This is added in gaogaotiantian/viztracer#562. Not sure when the next release will be but the feature will be there.

@brettlangdon
Member Author

hey @gaogaotiantian ! thanks for responding, and thanks for viztracer !

@gaogaotiantian

No problem! Let me know if you need anything from viztracer :)

@datadog-dd-trace-py-rkomorn

Datadog Report

Branch report: brettlangdon/benchmark.startup
Commit report: a02e37f
Test service: dd-trace-py

✅ 0 Failed, 43 Passed, 215 Skipped, 55.93s Total duration (4m 51.98s time saved)

@brettlangdon brettlangdon enabled auto-merge (squash) March 4, 2025 13:24
@brettlangdon brettlangdon merged commit bcd4bf6 into main Mar 4, 2025
313 checks passed
@brettlangdon brettlangdon deleted the brettlangdon/benchmark.startup branch March 4, 2025 14:09
brettlangdon added a commit that referenced this pull request Mar 4, 2025
This finishes up the work from #12563 and enables the benchmarks to run.

They were failing before because the scenario technically existed on the
`main` branch, but not in a state where it could run successfully, so
the benchmark scenario would always fail.
