[PROF-14883] Publish thread context attribute keys at process start #576
ivoanjo wants to merge 7 commits into
**What does this PR do?**

This PR unifies the `registerAttributeKeys` mechanism, used to support the "OTel thread context", with `setProcessContext`, so that both are published together at process start.

**Motivation:**

Beyond the code simplification (most of this PR is deletions), the big advantage is timing. Previously `registerAttributeKeys` ran at profiler start; now everything happens at process start. The old behaviour was a problem because profiler start is delayed by default (afaik up to 70 seconds in practice), so the "thread context" information was missing for that whole period. That was confusing, and an outside reader would be missing this data until the profiler started.

**Additional Notes:**

This PR pairs with one on the dd-trace-java side that provides the needed info when setting the process context.

**How to test the change?**

This change includes test coverage, and we'll add a few tests on the dd-trace-java side too.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5c28a04d41
CI Test Results
Run: #27134891685 | Commit:
Status Overview
Legend: ✅ passed | ❌ failed | ⚪ skipped | 🚫 cancelled
Summary: Total: 32 | Passed: 32 | Failed: 0
Updated: 2026-06-08 12:05:17 UTC
@@ -1355,9 +1354,6 @@ Error Profiler::start(Arguments &args, bool reset) {
    JfrMetadata::reset();
    JfrMetadata::initialize(args._context_attributes);
    _num_context_attributes = args._context_attributes.size();
🟡 MEDIUM · completeness [CONSENSUS]
Removing ContextApi::registerAttributeKeys(args._context_attributes) from this location means starting the profiler with attributes=... no longer publishes the OTEP attribute_key_map to the process context. The auto-publish path (added in commit 5c28a04d4) is gone; only setProcessContext(..., attributeKeys) now publishes the key map. External consumers that discover the OTEL mapping but find no thread_ctx_config cannot map key_index bytes back to attribute names.
Suggestion: Confirm the consolidation is intentional and ensure the production caller always invokes setProcessContext with attributeKeys matching args._context_attributes (same order). If the start-only path must remain decodable without an explicit setProcessContext call, retain a publish step here.
Also identified by the specialist reviewer (lens: otel-context-registration-missing).
This is by design -- DataDog/dd-trace-java#11558 has the matching change, and the assumption is that the two changes go hand in hand.
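The ordering contract discussed in this thread can be illustrated with a small model. The sketch below is not code from this PR: the class and method names are hypothetical, and it assumes only what the review states, namely that the published attribute_key_map starts with the reserved datadog.local_root_span_id slot, followed by the user keys in the order given, and that a reader maps a key_index back to a name by position.

```java
import java.util.Arrays;

// Hypothetical model of the attribute_key_map layout described in the review.
// Nothing here calls the real OTelContext API.
final class AttributeKeyMapSketch {
    static final String RESERVED_KEY = "datadog.local_root_span_id";

    // Reserved slot first, then the user keys in the order they were passed.
    static String[] buildKeyMap(String[] userKeys) {
        String[] map = new String[userKeys.length + 1];
        map[0] = RESERVED_KEY;
        System.arraycopy(userKeys, 0, map, 1, userKeys.length);
        return map;
    }

    // What an external reader would do with a key_index taken from a thread record.
    static String decode(String[] keyMap, int keyIndex) {
        if (keyIndex < 0 || keyIndex >= keyMap.length) {
            return null; // unknown index: the reader cannot name this attribute
        }
        return keyMap[keyIndex];
    }

    public static void main(String[] args) {
        String[] published = buildKeyMap(new String[] {"http.route", "db.system"});
        System.out.println(Arrays.toString(published));
        System.out.println(decode(published, 2));
    }
}
```

If the caller passed the same keys in a different order than the profiler was started with, `decode` would still return a name for every index, just the wrong one, which is why the suggestion insists on matching order.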
@@ -93,91 +92,24 @@ public void testNativeReadBackFunctionality() {
    String tracerVersion = "3.5.0";
🟡 MEDIUM · completeness [SPECIALIST]
testStartAttributesAutoRegistersKeys — the only test asserting that start,attributes=... auto-publishes the OTEP attribute_key_map without a separate setProcessContext call — is deleted with no replacement. The behaviour change in profiler.cpp is now untested; a regression would go undetected by the surviving test suite.
Suggestion: Add a test asserting the intended new behaviour of the attributes=... start path — either explicitly verifying the key map is absent (if the two-step contract is now required), or providing an equivalent end-to-end test that covers the same scenario.
Matched lens: Tests missing or dropped for new code.
I don't agree with the AI -- it's suggesting a negative test (Y doesn't happen when X) and I don't see the value of such a negative test. (E.g. I also don't assert that "free nachos every friday" doesn't happen when X happens -- negative tests are weird)
This is related to #576 (comment): if the context attributes are no longer auto-published and one needs to call setProcessContext to actually publish them, then the test, as described here, is not needed.
But since setProcessContext now sets the OTEL process context and also publishes the custom attribute names, I wonder whether it should be renamed to something that better reflects everything it does.
Hmmm, that's a good point. Perhaps setProcessAndThreadContext? Any suggestions?
 * values are recorded in the OTEP thread-local record (e.g.
 * "http.route", "db.system"). Published in the process context's
 * thread_ctx_config as the attribute_key_map, preceded by the
 * reserved datadog.local_root_span_id slot. Must not be null
🔵 LOW · consistency [CONSENSUS]
The Javadoc states attributeKeys "Must not be null" but neither the Java wrapper nor the JNI layer enforces this contract. The JNI code silently treats null as an empty array (publishes only the reserved datadog.local_root_span_id slot with no warning or exception).
Suggestion: Either relax the Javadoc to document that null is treated as an empty key list (i.e. it is permitted), or add Objects.requireNonNull(attributeKeys, "attributeKeys") before the native call to enforce the stated contract.
Also identified by the specialist reviewer.
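The "enforce the contract" option from the suggestion could look roughly like the sketch below. The class and method are hypothetical stand-ins, not the real OTelContext wrapper; the only real API used is `java.util.Objects.requireNonNull`, and the returned slot count simply models "reserved slot plus user keys".

```java
import java.util.Objects;

// Hypothetical wrapper illustrating the null check proposed in the review.
// The real native call is replaced by a simple slot count.
final class NonNullKeysSketch {
    // Returns how many slots would be published: the reserved one plus each user key.
    static int publishedSlotCount(String[] attributeKeys) {
        // Fails fast in Java instead of letting the native layer treat null as empty.
        Objects.requireNonNull(attributeKeys, "attributeKeys");
        return attributeKeys.length + 1;
    }

    public static void main(String[] args) {
        System.out.println(publishedSlotCount(new String[0]));
        try {
            publishedSlotCount(null);
        } catch (NullPointerException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With this check in place the Javadoc's "Must not be null" becomes an enforced precondition; the alternative in the suggestion is to drop the check and document null as equivalent to an empty array.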
key_ptrs[0] = "datadog.local_root_span_id";
for (int i = 0; i < n; i++) {
  jstring jstr = (jstring)env->GetObjectArrayElement(attribute_keys, i);
  if (jstr == nullptr) {
🔵 LOW · robustness [GENERALIST]
A single null element in attributeKeys aborts the entire setProcessContext publish — the deployment environment, service name, runtime ID, and all other process-context fields are dropped, not just the bad key. Only a Log::warn is emitted. The Javadoc covers the case where the array itself is null but does not document that a null element aborts the whole publish.
Suggestion: Either skip null elements rather than aborting the entire publish (e.g. treat them as a no-op slot), or document the abort-on-null-element contract explicitly in OTelContext.setProcessContext and add a test asserting the behaviour.
In 7cafcf5 the "abort-on-null" behaviour was documented and a test was added for it.
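For callers who would rather detect a bad key before the whole publish is dropped, a pre-check on the Java side is one possibility. This is a sketch under assumptions, not part of the PR: the class and both methods are hypothetical, and `wouldPublish` only models the all-or-nothing behaviour described in this thread.

```java
// Hypothetical caller-side pre-check for null elements in attributeKeys.
final class NullElementCheckSketch {
    // Returns the index of the first null element, or -1 if there is none.
    static int firstNullIndex(String[] keys) {
        for (int i = 0; i < keys.length; i++) {
            if (keys[i] == null) {
                return i;
            }
        }
        return -1;
    }

    // Models the documented contract: one null element means nothing is published.
    static boolean wouldPublish(String[] keys) {
        return firstNullIndex(keys) < 0;
    }

    public static void main(String[] args) {
        String[] keys = {"http.route", null, "db.system"};
        System.out.println(firstNullIndex(keys));
        System.out.println(wouldPublish(keys));
    }
}
```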
if (jstr == nullptr) {
  // A null key would corrupt the index mapping; abort the publish.
  for (int j = 0; j < built; j++) delete jni_keys[j];
  Log::warn("setProcessContext: null attribute key at index %d; skipping publish", i);
🔵 LOW · consistency [SPECIALIST]
Two different logging APIs are used for WARN-level messages within the same function: LOG_WARN macro (line 459, capacity clip) vs Log::warn direct call (here, null key). The inconsistency suggests a copy-paste from two different source conventions.
Suggestion: Use one convention throughout setProcessCtx0 — check what the rest of javaApi.cpp uses and align to it.
  Log::warn("setProcessContext: null attribute key at index %d; skipping publish", i);
  return;
}
jni_keys[built] = new JniString(env, jstr);
🔵 LOW · robustness [SPECIALIST]
If GetStringUTFChars fails on OOM inside the JniString constructor, c_str() returns NULL. That NULL is then written into key_ptrs[i+1] mid-array. The consumer loop in otel_process_ctx_publish iterates attribute_key_map until it finds a NULL sentinel and stops early, silently truncating the map and corrupting the index → key mapping for all keys after the failed slot. This is consistent with the existing codebase convention, but the array-mapping semantics make the impact more significant here than in the older single-string JNI sites.
Suggestion: After new JniString(env, jstr), check jni_keys[built]->c_str() != nullptr; on failure, clean up and abort the publish (mirroring the null-element path at lines 469–473).
    String tracerVersion = "3.5.0";

-   OTelContext.getInstance().setProcessContext(env, hostname, runtimeId, service, version, tracerVersion);
+   OTelContext.getInstance().setProcessContext(env, hostname, runtimeId, service, version, tracerVersion, new String[0]);
🔵 LOW · completeness [SPECIALIST]
testProcessContextMappingCreation calls setProcessContext with new String[0] (empty keys) but only asserts an OTEL mapping exists. It never reads back attributeKeyMap to verify it contains exactly ["datadog.local_root_span_id"] — the expected published map when no user keys are supplied.
Suggestion: Add a readProcessContext() call and assert assertArrayEquals(new String[]{"datadog.local_root_span_id"}, readContext.attributeKeyMap) to pin the empty-array behaviour.
    OTelContext context = OTelContext.getInstance();
-   context.setProcessContext(env, hostname, runtimeId, service, version, tracerVersion);
+   context.setProcessContext(env, hostname, runtimeId, service, version, tracerVersion,
🔵 LOW · completeness [GENERALIST]
No test exercises the capacity-clip path (count > DD_TAGS_CAPACITY = 10). The LOG_WARN branch and truncation logic are covered with at most 2 keys in the current suite, so an off-by-one at the clip boundary would go undetected.
Suggestion: Add a test that passes ≥ 11 keys to setProcessContext and asserts the published attributeKeyMap is {"datadog.local_root_span_id", key0, …, key9} (exactly DD_TAGS_CAPACITY user keys, DD_TAGS_CAPACITY + 1 total).
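The expected result of the clip can be modelled in a few lines, which may help when writing the boundary test. This is only a sketch: the class and method are hypothetical, the capacity value of 10 is taken from the review comment above, and the real truncation happens in native code that is not reproduced here.

```java
import java.util.Arrays;

// Hypothetical model of the capacity clip described in the review comment.
final class CapacityClipSketch {
    static final int DD_TAGS_CAPACITY = 10; // value quoted in the review, assumed here
    static final String RESERVED_KEY = "datadog.local_root_span_id";

    // Keeps at most DD_TAGS_CAPACITY user keys, after the reserved slot.
    static String[] publishedKeyMap(String[] userKeys) {
        int kept = Math.min(userKeys.length, DD_TAGS_CAPACITY);
        String[] map = new String[kept + 1];
        map[0] = RESERVED_KEY;
        System.arraycopy(userKeys, 0, map, 1, kept);
        return map;
    }

    public static void main(String[] args) {
        String[] keys = new String[DD_TAGS_CAPACITY + 1];
        for (int i = 0; i < keys.length; i++) {
            keys[i] = "key" + i;
        }
        // One key over capacity: the last one (key10) should be dropped.
        System.out.println(Arrays.toString(publishedKeyMap(keys)));
    }
}
```

Exercising exactly 10, 11, and 12 keys covers both sides of the boundary, which is where an off-by-one would show up.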