[PROF-12763] Tweak profiler signal handler tests to avoid flakiness #4992

ivoanjo · 2025-10-20T09:57:20Z

What does this PR do?

This PR tweaks the "sample from the signal handler" profiler specs to run with GC disabled.

To avoid accidentally running out of memory in loop_until, we additionally changed two things:

Use Process.clock_gettime (that returns a zero-allocation float) instead of Time.now (that returns a new Time instance)
Add a check_condition_every_seconds to avoid calling the condition checking code too often, as it can do allocations

Motivation:

This change is an attempt at avoiding the flakiness seen in https://github.com/DataDog/dd-trace-rb/actions/runs/18560146732/job/52906922299:

Failures:

  1) Datadog::Profiling::Collectors::CpuAndWallTimeWorker#start sampling from signal handler when signal handler sampling is disabled does not prepare samples in the signal handler
     Failure/Error: expect(sample_count).to be_within(20).percent_of(signal_handler_enqueued_sample)
       expected 15 to be within 20% of 20
     # ./spec/datadog/profiling/collectors/cpu_and_wall_time_worker_spec.rb:1053:in 'block (4 levels) in <top (required)>'
     # ./spec/spec_helper.rb:272:in 'block (2 levels) in <top (required)>'
     # ./spec/spec_helper.rb:154:in 'block (2 levels) in <top (required)>'
     # /usr/local/bundle/gems/webmock-3.24.0/lib/webmock/rspec.rb:39:in 'block (2 levels) in <top (required)>'
     # /usr/local/bundle/gems/rspec-wait-0.0.10/lib/rspec/wait.rb:47:in 'block (2 levels) in <top (required)>'
     # ./spec/support/execute_in_fork.rb:32:in 'ForkableExample#run'

Finished in 27.13 seconds (files took 1.14 seconds to load)
787 examples, 1 failure, 19 pending

Failed examples:

rspec ./spec/datadog/profiling/collectors/cpu_and_wall_time_worker_spec.rb:1068 # Datadog::Profiling::Collectors::CpuAndWallTimeWorker#start sampling from signal handler when signal handler sampling is disabled does not prepare samples in the signal handler

The assertion that failed checks that the number of times the profiler actually got to sample is close to the number of times it tried to sample.

Unfortunately, I could not reproduce the issue locally but I suspect that what made this condition flaky is GC: the profiler is not able to sample during GC and thus if a long-enough-running GC happens at the wrong time, this condition could fail.

I am suspicious that GC is the source also because loop_until caused lots of allocations previously.

Hopefully this was the source of the flakiness; if not, I'll come up with more ideas next time ;)

Change log entry

None.

Additional Notes:

Fixes https://github.com/DataDog/ruby-guild/issues/272

How to test the change?

If CI is green we're good! The logic on the test itself hasn't changed.

pr-commenter · 2025-10-20T10:26:24Z

Benchmarks

Benchmark execution time: 2025-10-21 09:02:10

Comparing candidate commit fa5db6f in PR branch ivoanjo/prof-12763-flaky-sighandler-spec with baseline commit 01568d4 in branch master.

Found 0 performance improvements and 1 performance regressions! Performance is the same for 43 metrics, 2 unstable metrics.

scenario:profiling - Allocations ()

🟥 throughput [-267993.421op/s; -260665.612op/s] or [-8.020%; -7.800%]

spec/datadog/profiling/collectors/cpu_and_wall_time_worker_spec.rb

datadog-datadog-prod-us1 · 2025-10-21T07:46:40Z

✅ Tests

🎉 All green!

❄️ No new flaky tests detected
🧪 All tests passed

🎯 Code Coverage
• Patch Coverage: 100.00%
• Total Coverage: 98.58% (+0.03%)

View detailed report

_{This comment will be updated automatically if new data arrives.

🔗 Commit SHA: fa5db6f | Docs | Was this helpful? Give us feedback!}

**What does this PR do?** This PR tweaks the "sample from the signal handler" profiler specs to run with GC disabled. To avoid accidentally running out of memory in `loop_until`, we additionally changed two things: * Use `Process.clock_gettime` (that returns a zero-allocation float) instead of `Time.now` (that returns a new `Time` instance) * Add a `check_condition_every_seconds` to avoid calling the condition checking code too often, as it can do allocations **Motivation:** This change is an attempt at avoiding the flakiness seen in <https://github.com/DataDog/dd-trace-rb/actions/runs/18560146732/job/52906922299>: ``` Failures: 1) Datadog::Profiling::Collectors::CpuAndWallTimeWorker#start sampling from signal handler when signal handler sampling is disabled does not prepare samples in the signal handler Failure/Error: expect(sample_count).to be_within(20).percent_of(signal_handler_enqueued_sample) expected 15 to be within 20% of 20 # ./spec/datadog/profiling/collectors/cpu_and_wall_time_worker_spec.rb:1053:in 'block (4 levels) in <top (required)>' # ./spec/spec_helper.rb:272:in 'block (2 levels) in <top (required)>' # ./spec/spec_helper.rb:154:in 'block (2 levels) in <top (required)>' # /usr/local/bundle/gems/webmock-3.24.0/lib/webmock/rspec.rb:39:in 'block (2 levels) in <top (required)>' # /usr/local/bundle/gems/rspec-wait-0.0.10/lib/rspec/wait.rb:47:in 'block (2 levels) in <top (required)>' # ./spec/support/execute_in_fork.rb:32:in 'ForkableExample#run' Finished in 27.13 seconds (files took 1.14 seconds to load) 787 examples, 1 failure, 19 pending Failed examples: rspec ./spec/datadog/profiling/collectors/cpu_and_wall_time_worker_spec.rb:1068 # Datadog::Profiling::Collectors::CpuAndWallTimeWorker#start sampling from signal handler when signal handler sampling is disabled does not prepare samples in the signal handler ``` The assertion that failed checks that the number of times the profiler actually got to sample is close to the number of times it tried to sample. Unfortunately, I could not reproduce the issue locally but I suspect that what made this condition flaky is GC: the profiler is not able to sample during GC and thus if a long-enough-running GC happens at the wrong time, this condition could fail. I am suspicious that GC is the source also because `loop_until` caused lots of allocations previously. Hopefully this was the source of the flakiness; if not, I'll come up with more ideas next time ;) **Additional Notes:** Fixes DataDog/ruby-guild#272 **How to test the change?** If CI is green we're good! The logic on the test itself hasn't changed.

ivoanjo requested review from a team as code owners October 20, 2025 09:57

github-actions bot added the dev/testing Involves testing processes (e.g. RSpec) label Oct 20, 2025

AlexJF approved these changes Oct 20, 2025

View reviewed changes

marcotc approved these changes Oct 20, 2025

View reviewed changes

spec/datadog/profiling/collectors/cpu_and_wall_time_worker_spec.rb Show resolved Hide resolved

spec/datadog/profiling/collectors/cpu_and_wall_time_worker_spec.rb Show resolved Hide resolved

ivoanjo force-pushed the ivoanjo/prof-12763-flaky-sighandler-spec branch from b9ecb06 to fa5db6f Compare October 21, 2025 08:32

ivoanjo merged commit f8dfc57 into master Oct 21, 2025
425 of 426 checks passed

ivoanjo deleted the ivoanjo/prof-12763-flaky-sighandler-spec branch October 21, 2025 09:30

github-actions bot added this to the 2.23.0 milestone Oct 21, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[PROF-12763] Tweak profiler signal handler tests to avoid flakiness #4992

[PROF-12763] Tweak profiler signal handler tests to avoid flakiness #4992

Uh oh!

ivoanjo commented Oct 20, 2025

Uh oh!

pr-commenter bot commented Oct 20, 2025 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

datadog-datadog-prod-us1 bot commented Oct 21, 2025 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

[PROF-12763] Tweak profiler signal handler tests to avoid flakiness #4992

[PROF-12763] Tweak profiler signal handler tests to avoid flakiness #4992

Uh oh!

Conversation

ivoanjo commented Oct 20, 2025

Uh oh!

pr-commenter bot commented Oct 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Benchmarks

scenario:profiling - Allocations ()

Uh oh!

Uh oh!

Uh oh!

datadog-datadog-prod-us1 bot commented Oct 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

pr-commenter bot commented Oct 20, 2025 •

edited

Loading

datadog-datadog-prod-us1 bot commented Oct 21, 2025 •

edited

Loading