Skip to content

fix(tsan): suppress false-positive race from VA reuse in LinearAllocator#593

Merged
jbachorik merged 4 commits into
mainfrom
fix/tsan-linear-alloc-va-reuse
Jun 12, 2026
Merged

fix(tsan): suppress false-positive race from VA reuse in LinearAllocator#593
jbachorik merged 4 commits into
mainfrom
fix/tsan-linear-alloc-va-reuse

Conversation

@jbachorik

@jbachorik jbachorik commented Jun 11, 2026

Copy link
Copy Markdown
Collaborator

What does this PR do?:
Re-map each freshly-allocated LinearAllocator chunk's virtual address through the libc mmap() wrapper with MAP_FIXED (TSan builds only) in LinearAllocator::allocateChunk, so TSan's interceptor clears stale shadow state left over from virtual-address reuse after mmap/munmap.

Motivation:
The stress test ConcurrentExpansionAndCollectStressTest surfaced a TSan data race report at linearAllocator.cpp:237 (chunk->prev = current). The race is a false positive caused by OS VA reuse: when OS::safeFree (munmap) frees a chunk and OS::safeAlloc (mmap) later returns the same VA to a different allocator, TSan's shadow memory for that VA still holds the old access history.

OS::safeAlloc uses a raw mmap syscall that bypasses TSan's interceptors by design. TSan's interceptors normally call MemoryRangeImitateWrite on mapped ranges, setting all 4 shadow slots for the range to the current thread's write and erasing stale entries; without interception, stale shadow persists across VA reuse → false-positive races.

An earlier attempt used __tsan_acquire(chunk), but that only establishes happens-before at the header address (VA+0), not user-data addresses (VA+n), and __tsan_reset_range does not exist in the TSan API. The working fix re-maps the same VA through the libc mmap() wrapper with MAP_FIXED after the raw safeAlloc; TSan intercepts that call and clears the stale shadow. OS::safeAlloc itself is unchanged, so the production allocation path does not diverge.

The __tsan_release(chunk) after writing the header is retained so that freeChunks/detachChunks' __tsan_acquire calls can establish happens-before when reading chunk->prev.

Additional Notes:
The mmap(MAP_FIXED) re-map and TSan interceptor are not async-signal-safe, so the code is gated on TSAN_ENABLED and relies on the build applying -fsanitize=thread only to the isolated gtest binaries (never the JVM-loaded shared library). A failed MAP_FIXED aborts unconditionally (not via assert) so it cannot silently leave a hole under NDEBUG. Observed in the reliability/chaos CI run on PR #589; not related to that PR's changes.

How to test the change?:
Run the TSan build stress test: ./gradlew :ddprof-lib:testTsan --tests "*ConcurrentExpansionAndCollectStressTest*" — the reported race at linearAllocator.cpp:237 should no longer appear.

For Datadog employees:

  • This PR doesn't touch any of that.
  • JIRA: [JIRA-XXXX]

@jbachorik jbachorik marked this pull request as ready for review June 11, 2026 16:07
@jbachorik jbachorik requested a review from a team as a code owner June 11, 2026 16:07

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8eda1b6039

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread ddprof-lib/src/main/cpp/linearAllocator.cpp Outdated
@dd-octo-sts

dd-octo-sts Bot commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

CI Test Results

Run: #27410334436 | Commit: 8cd666c | Duration: 12m 43s (longest job)

All 32 test jobs passed

Status Overview

JDK glibc-aarch64/debug glibc-amd64/debug musl-aarch64/debug musl-amd64/debug
8 - - -
8-ibm - - -
8-j9 - -
8-librca - -
8-orcl - - -
11 - - -
11-j9 - -
11-librca - -
17 - -
17-graal - -
17-j9 - -
17-librca - -
21 - -
21-graal - -
21-librca - -
25 - -
25-graal - -
25-librca - -

Legend: ✅ passed | ❌ failed | ⚪ skipped | 🚫 cancelled

Summary: Total: 32 | Passed: 32 | Failed: 0


Updated: 2026-06-12 10:49:17 UTC

@jbachorik jbachorik force-pushed the fix/tsan-linear-alloc-va-reuse branch from 4edbf2b to 402ea86 Compare June 12, 2026 08:11
@datadog-official

This comment has been minimized.

@jbachorik jbachorik force-pushed the fix/tsan-linear-alloc-va-reuse branch from 402ea86 to 9062753 Compare June 12, 2026 08:30
@jbachorik jbachorik force-pushed the fix/tsan-linear-alloc-va-reuse branch from 9062753 to a31cc4a Compare June 12, 2026 08:55
…AD__

The clang CI sanitizer build does not define __SANITIZE_THREAD__, so every
TSan annotation guarded on it was dead code. Add a TSAN_ENABLED macro in
common.h (clang __has_feature + gcc __SANITIZE_THREAD__) and migrate all
sites. Also assert the mmap MAP_FIXED re-map, fix stale comments, and drop
the stale write_range spec.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR aims to eliminate ThreadSanitizer (TSan) false-positive data race reports caused by virtual address reuse after munmap/mmap in LinearAllocator, and to standardize sanitizer detection across toolchains via common.h.

Changes:

  • Introduces toolchain-agnostic sanitizer detection macros (ASAN_ENABLED / TSAN_ENABLED) in common.h and updates call sites to use them.
  • Updates LinearAllocator’s TSan-specific handling for freshly safeAlloc’d chunks to prevent stale-shadow false positives on reused virtual addresses.
  • Adjusts test and gtest crash-handler conditional compilation to use TSAN_ENABLED.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
ddprof-lib/src/main/cpp/common.h Adds uniform sanitizer detection macros for clang/gcc.
ddprof-lib/src/main/cpp/linearAllocator.cpp Switches TSan gating to TSAN_ENABLED and changes the TSan VA-reuse mitigation in allocateChunk().
ddprof-lib/src/main/cpp/gtest_crash_handler.h Uses TSAN_ENABLED to disable custom handlers under TSan.
ddprof-lib/src/test/cpp/ddprof_ut.cpp Uses TSAN_ENABLED to skip the fork-based test under TSan.
ddprof-lib/src/test/cpp/stress_callTraceStorage.cpp Includes common.h and uses TSAN_ENABLED to gate a TSan-only regression test.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread ddprof-lib/src/main/cpp/linearAllocator.cpp
Comment thread ddprof-lib/src/main/cpp/linearAllocator.cpp
Comment thread ddprof-lib/src/main/cpp/linearAllocator.cpp Outdated
Replace assert() with an unconditional abort on mmap(MAP_FIXED) failure
so the failure is loud even under NDEBUG, and reword the comment so it no
longer implies TSAN_ENABLED is gtest-scoped (it tracks compiler sanitizer
detection; the test-only property comes from the build).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@jbachorik jbachorik changed the title fix(tsan): use __tsan_reset_range for mmap VA reuse in LinearAllocator fix(tsan): re-map chunk VA via mmap(MAP_FIXED) to clear stale shadow in LinearAllocator Jun 12, 2026
@jbachorik jbachorik changed the title fix(tsan): re-map chunk VA via mmap(MAP_FIXED) to clear stale shadow in LinearAllocator fix(tsan): suppress false-positive race from VA reuse in LinearAllocator Jun 12, 2026
@jbachorik jbachorik merged commit 5b5df49 into main Jun 12, 2026
103 checks passed
@jbachorik jbachorik deleted the fix/tsan-linear-alloc-va-reuse branch June 12, 2026 11:02
@github-actions github-actions Bot added this to the 1.45.0 milestone Jun 12, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants