feat: add process tags to traces #5033

wantsui · 2025-11-07T22:46:53Z

What does this PR do?

The goal of AIDM-253 is to add process tags to the trace payloads.

I'm creating a draft for now, but I still need to research some things:
[x] add memoization or some kind of cache to avoid looking up these values over and over when they are fixed per process
[ ] figure out the best place to add more tests to assert on the overall behavior (right now I have a test in process_spec to assert on the values)
[x] research if Windows is part of this testing strategy -> Not supported
[x] determine how to approach server type since it's not at the process level -> Will remove in this PR and add later
[x] normalization of tags
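
The memoization item in the checklist can be sketched roughly as below. This is an illustration only, not the PR's implementation: the module name `ProcessTagsSketch`, the tag keys, and the comma-joined `key:value` layout are assumptions. Only the idea of computing per-process values once and reusing them comes from the description.

```ruby
# Hypothetical sketch: cache values that cannot change for the life of the process.
module ProcessTagsSketch
  module_function

  # Name of the current working directory (assumed tag, illustrative only).
  def entrypoint_workdir
    File.basename(Dir.pwd)
  end

  # Name of the script that started the process, without its extension.
  def entrypoint_name
    File.basename($PROGRAM_NAME, '.*')
  end

  # Builds the serialized string on the first call and reuses it afterwards.
  # The "key:value,key:value" layout is an assumption for this sketch.
  def serialized
    @serialized ||= [
      "entrypoint.workdir:#{entrypoint_workdir}",
      "entrypoint.name:#{entrypoint_name}"
    ].join(',').freeze
  end
end
```

Freezing the cached string guards against a caller mutating the shared value by accident.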

After this gets merged, the next step is to add it for the other products.

To run the tests in docker

docker compose run --rm tracer-3.3 /bin/bash
bundle exec rake compile
bundle exec rake test:core_with_rails

Main tests:

BUNDLE_GEMFILE=/app/gemfiles/ruby_3.3_rails8.gemfile bundle exec rspec spec/datadog/core/environment/process_spec.rb
bundle exec rspec spec/datadog/tracing/transport/trace_formatter_spec.rb
bundle exec rspec spec/datadog/core/normalizer_spec.rb
bundle exec rspec spec/datadog/core/configuration/settings_spec.rb

Motivation:

Change log entry

Add process tags to trace payloads.

Additional Notes:

How to test the change?


github-actions · 2025-11-07T22:47:06Z

👋 Hey @DataDog/ruby-guild, please fill "Change log entry" section in the pull request description.

If changes need to be present in CHANGELOG.md you can state it this way

**Change log entry**

Yes. A brief summary to be placed into the CHANGELOG.md

(possible answers Yes/Yep/Yeah)

Or you can opt out like that

**Change log entry**

None.

(possible answers No/Nope/None)

Visited at: 2025-11-13 22:28:19 UTC

datadog-official · 2025-11-07T22:51:11Z

✅ Tests

🎉 All green!

❄️ No new flaky tests detected
🧪 All tests passed

This comment will be updated automatically if new data arrives.

Commit SHA: 4073ab5

lib/datadog/core/environment/process.rb

lib/datadog/tracing/configuration/settings.rb

marcotc · 2025-11-10T21:12:26Z

lib/datadog/tracing/transport/trace_formatter.rb

+        def tag_process_tags!
+          return unless trace.experimental_propagate_process_tags_enabled
+          process_tags = Core::Environment::Process.formatted_process_tags_k1_v1
+          return if process_tags.empty?


This is impossible, right? If so, we can remove it, as it would give us a false sense of uncertainty here.

I think I fixed it in 8dae705 by just removing the check in process tags, but let me know if you spot issues with it!
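
A rough sketch of the flow once the emptiness check is gone, assuming stand-in objects: `SketchSpan`, `SketchTrace`, and the two-argument `tag_process_tags!` are hypothetical and not the library's classes. The `_dd.tags.process` key is the one asserted in this PR's trace formatter spec.

```ruby
# Stand-ins for the real trace and span objects (hypothetical, for illustration).
SketchSpan = Struct.new(:meta)
SketchTrace = Struct.new(:spans, :process_tags_enabled)

PROCESS_TAGS_KEY = '_dd.tags.process'

# Tags only the first span, and only when the feature flag is on.
# There is no emptiness check: the serialized value is assumed to always exist.
def tag_process_tags!(trace, serialized_tags)
  return unless trace.process_tags_enabled

  first_span = trace.spans.first
  return if first_span.nil?

  first_span.meta[PROCESS_TAGS_KEY] = serialized_tags
end
```

With the flag off, the method returns early and no span is touched, which is the behavior the spec is meant to pin down.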


pr-commenter · 2025-11-10T22:17:05Z

Benchmarks

Benchmark execution time: 2025-11-13 22:20:01

Comparing candidate commit 4073ab5 in PR branch add-process-tags-to-tracing with baseline commit 49cee89 in branch master.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 44 metrics, 2 unstable metrics.

wantsui · 2025-11-11T16:29:31Z

spec/datadog/tracing/transport/trace_formatter_spec.rb

+          format!
+          expect(first_span.meta).to include('_dd.tags.process')
+          expect(first_span.meta['_dd.tags.process']).to eq(Datadog::Core::Environment::Process.serialized)
+          # TODO figure out if we need an assertion for the value, ie


@marcotc - do you think there's value in asserting for the values of the tag? Or is the test in process_spec enough?

What you are doing with expect(first_span.meta['_dd.tags.process']).to eq(Datadog::Core::Environment::Process.serialized) seems good to me.

I wouldn't test realistic values.

The main thing to test here is that it's respecting the configuration option, which you did.

github-actions · 2025-11-11T18:43:19Z

Typing analysis

Note: Ignored files are excluded from the next sections.

Untyped methods

This PR introduces 5 untyped methods and 2 partially typed methods. It decreases the percentage of typed methods from 54.67% to 54.44% (-0.23%).

Untyped methods (+5-0)

❌ Introduced:

sig/datadog/core/environment/process.rbs:7
└── def self?.entrypoint_workdir: () -> untyped
sig/datadog/core/environment/process.rbs:9
└── def self?.entrypoint_type: () -> untyped
sig/datadog/core/environment/process.rbs:11
└── def self?.entrypoint_name: () -> untyped
sig/datadog/core/environment/process.rbs:13
└── def self?.entrypoint_basedir: () -> untyped
sig/datadog/core/environment/process.rbs:15
└── def self?.serialized: () -> untyped

Partially typed methods (+2-0)

❌ Introduced:

sig/datadog/core/environment/process.rbs:14
└── def self?.serialized_kv_helper: (untyped key, untyped value) -> ::String
sig/datadog/core/normalizer.rbs:5
└── def self.normalize: (untyped original_value) -> ("" | untyped)

Untyped other declarations

This PR introduces 1 untyped other declaration. It increases the percentage of typed other declarations from 68.16% to 68.27% (+0.11%).

Untyped other declarations (+1-0)

❌ Introduced:

sig/datadog/core/environment/process.rbs:5
└── @serialized: untyped

If you believe a method or an attribute is rightfully untyped or partially typed, you can add # untyped:accept to the end of the line to remove it from the stats.


marcotc · 2025-11-12T19:00:13Z

lib/datadog/core/environment/ext.rb

        LANG_INTERPRETER = "#{RUBY_ENGINE}-#{RUBY_PLATFORM}"
        LANG_PLATFORM = RUBY_PLATFORM
        LANG_VERSION = RUBY_VERSION
+        PROCESS_TYPE = 'script'


We both know why this is script, but let's add a comment here about what script means, so we remember in the future why Ruby is always a script process type today.

marcotc · 2025-11-12T19:22:09Z

lib/datadog/core/normalizer.rb

+      # Datadog::Tracing::Metadata::Ext::HTTP::Headers.to_tag method with some additional items
+      # TODO: Swap out the logic in the Datadog Tracing Metadata headers logic
+      def self.normalize(original_value)
+        return "" if original_value.nil? || original_value.to_s.strip.empty?


.to_s.strip is called here and again in the following line.

Both to_s and strip will create new string objects when called. Let's do that operation only once.

(We can't call strip! because the return of to_s does not guarantee that we receive a copy of a string: it could be immutable or even a reused internal string)
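
A minimal sketch of the single-conversion version. The method name `normalize_sketch` is hypothetical, and only the nil/blank handling and downcasing are shown; the rest of the normalizer is omitted.

```ruby
# Sketch only: convert and strip once, then reuse the result.
def normalize_sketch(original_value)
  return '' if original_value.nil?

  # to_s may return the receiver itself (possibly frozen), so use the
  # non-mutating strip, which returns a new String.
  value = original_value.to_s.strip
  return '' if value.empty?

  value.downcase
end
```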

marcotc · 2025-11-12T19:31:16Z

lib/datadog/core/normalizer.rb

+        # Invalid characters are replaced with an underscore
+        normalized_value.gsub!(INVALID_TAG_CHARACTERS, '_')
+        # Merge consecutive underscores with a single underscore
+        normalized_value.squeeze!('_')


Let's merge squeeze with the gsub above by changing the regex in the gsub to match one or more characters, instead of just one (regex +).
This saves us a string operation and string copy.
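
One way to fold the squeeze into the substitution, sketched under an assumed character class. The real `INVALID_TAG_CHARACTERS` pattern is not shown in this thread, so the valid set used here (lowercase letters, digits, and `:./-`) and both method names are assumptions.

```ruby
# Assumed valid set for this sketch: lowercase letters, digits, and ":./-".
# Underscore is deliberately left out of the valid set so that runs mixing
# underscores and invalid characters collapse in a single pass.
INVALID_OR_UNDERSCORE_RUN = %r{[^a-z0-9:./-]+}

# Single pass: each maximal run of invalid characters and/or underscores
# becomes one underscore.
def collapse_invalid(value)
  value.gsub(INVALID_OR_UNDERSCORE_RUN, '_')
end

# Two-pass shape from the diff, kept for comparison.
def collapse_invalid_two_pass(value)
  value.gsub(/[^a-z0-9_:.\/-]/, '_').squeeze('_')
end
```

Note the design point: if the `+` pattern matched only invalid characters, runs of underscores already present in the input (for example `a__b`) would no longer be merged, so the two versions would diverge.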

marcotc · 2025-11-12T19:52:57Z

lib/datadog/core/normalizer.rb

+        # Remove leading non-letter characters
+        normalized_value.sub!(/\A[^a-z]+/, "")
+        # Maximum length is 200 characters
+        normalized_value = normalized_value[0...200] if normalized_value.length > 200


The range 0...200 will be created on every method invocation, but we reuse the same range every time. Let's move it to a constant instead.
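
A small sketch of the constant-based version. The names `MAX_TAG_LENGTH`, `TAG_LENGTH_RANGE`, and `truncate_tag` are hypothetical; only the 200-character limit comes from the diff.

```ruby
MAX_TAG_LENGTH = 200
# Built once at load time and shared by every call.
TAG_LENGTH_RANGE = (0...MAX_TAG_LENGTH).freeze

# Returns the value unchanged when it fits, otherwise its first 200 characters.
def truncate_tag(value)
  value.length > MAX_TAG_LENGTH ? value[TAG_LENGTH_RANGE] : value
end
```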

marcotc · 2025-11-12T20:06:23Z

spec/datadog/core/environment/process_spec.rb

+        Dir.mktmpdir do |tmp_dir|
+          Dir.chdir(tmp_dir) do
+            Bundler.with_unbundled_env do
+              skip('rails gem could not be installed') unless system('gem install rails')


We shouldn't install gems during test runs for two reasons:

It makes test runs non-deterministic: we can't guarantee what versions will be installed. Even if we provide a fixed Rails version, the transitive dependencies can change (technically, we can provide a complete Gemfile.lock to make it deterministic, but we'd just be emulating our CI facilities). It can also flake due to network issues while fetching gems from RubyGems.

It's slow. This one test can take more than a minute in CI just to install all the gems needed.

Instead, we have to run this test only when the gems we need are already installed. We install Rails in a few of our gem files, so any of them is fine.
But that means that this test cannot run as a simple test as part of spec:main tests (since that won't have Rails installed).
Instead, we need to run this in a Rake Task that runs in a bundler context with Rails installed.
I recommend adding a new Rake Task, called something like 'core with rails', which will need a matching entry in https://github.com/DataDog/dd-trace-rb/blob/master/Matrixfile, and add any rails combination there (just one is enough).
This way, the gemset with rails will be cached and consistent.
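
As a complement to the dedicated Rake task, a spec can also check that the gem is already present instead of installing it. The helper below is a hypothetical sketch (the name `gem_installed?` is not from the PR); it relies only on RubyGems' `Gem::Specification.find_all_by_name`, which returns an empty array when nothing matches.

```ruby
require 'rubygems'

# Returns true when the named gem is already available in the current
# environment. It never touches the network or installs anything.
def gem_installed?(name)
  Gem::Specification.find_all_by_name(name).any?
end

# Possible use inside a spec (shown as a comment, since RSpec is not loaded here):
#   skip('rails is not part of this gemset') unless gem_installed?('rails')
```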

marcotc · 2025-11-12T20:32:02Z

lib/datadog/core/configuration/settings.rb

          end
        end

+        # Enable experimental process tags propagation.


Let's add some very high level language about what this is used for.

marcotc · 2025-11-12T20:32:29Z

lib/datadog/core/environment/process.rb

+module Datadog
+  module Core
+    module Environment
+      # Retrieves process level information


Add some very high level language regarding when the information in this module is useful.

marcotc · 2025-11-12T20:33:06Z

lib/datadog/core/environment/process.rb

+      module Process
+        module_function
+
+        def entrypoint_workdir


The methods in this module should have documentation, even if simple.

marcotc · 2025-11-12T20:34:26Z

lib/datadog/core/normalizer.rb

+      def self.normalize(original_value)
+        return "" if original_value.nil? || original_value.to_s.strip.empty?
+
+        # Removes whitespaces


You can remove this comment, and the one for downcase, since they only capture what the code already does (and the code is not ambiguous).

marcotc · 2025-11-12T20:35:25Z

lib/datadog/core/configuration/settings.rb

+        #
+        # @default `DD_EXPERIMENTAL_PROPAGATE_PROCESS_TAGS_ENABLED` environment variable, otherwise `false`
+        # @return [Boolean]
+        option :experimental_propagate_process_tags_enabled do |o|


We need to add unit tests to the respective file for this new option.
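
The cases such tests would likely cover (unset, truthy, falsy) can be illustrated with a plain-Ruby stand-in. This is not the library's option DSL: `env_flag` is a hypothetical helper, and the accepted truthy spellings are an assumption. Only the variable name and the `false` default come from the diff.

```ruby
ENV_PROCESS_TAGS = 'DD_EXPERIMENTAL_PROPAGATE_PROCESS_TAGS_ENABLED'

# Reads a boolean flag from an env-like hash and falls back to `default`
# when the variable is unset. Accepted spellings are an assumption.
def env_flag(env, name, default: false)
  raw = env[name]
  return default if raw.nil?

  %w[1 true].include?(raw.strip.downcase)
end
```

Passing the environment in as an argument keeps each case easy to exercise without mutating the real `ENV`.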

tlhunter · 2025-11-12T22:51:43Z

lib/datadog/core/normalizer.rb

+        # Merge consecutive underscores with a single underscore
+        normalized_value.squeeze!('_')


Suggested change

# Merge consecutive underscores with a single underscore

normalized_value.squeeze!('_')

Actually I don't believe we're supposed to do the squeeze (reduce multiple _ characters to a single character). Or at least the spec doesn't say to do so. Here the Python tracer leaves repeated underscores:

https://github.com/DataDog/dd-trace-py/pull/15146/files#diff-734f80f7c77b609471c1ca40c131a2604c2f50647ad9cf264863e2016f07b209R23

We should remove this to be consistent.

I was following the Trace Agent, which has a test case seen here: https://github.com/DataDog/datadog-agent/blob/45799c842bbd216bcda208737f9f11cade6fdd95/pkg/trace/traceutil/normalize_test.go#L33

{in: "contiguous_____underscores", out: "contiguous_underscores"},

It looks like the Trace Agent merges them. I'll start a separate thread since I think we want this all to be consistent.


Add initial attempt at adding process related tags on trace payloads.…

1d8bab2

… This is still missing memoization and additional tests.

github-actions bot added the core and tracing labels Nov 7, 2025

wantsui added the AI Generated label Nov 7, 2025

Add test for multiple calls to the formatter tags

58592a3

marcotc reviewed Nov 10, 2025

lib/datadog/core/environment/process.rb (outdated, resolved)

marcotc reviewed Nov 10, 2025

lib/datadog/core/environment/process.rb (outdated, resolved)

marcotc reviewed Nov 10, 2025

lib/datadog/core/environment/process.rb (resolved)

marcotc reviewed Nov 10, 2025

lib/datadog/core/environment/process.rb (outdated, resolved)

marcotc reviewed Nov 10, 2025

lib/datadog/tracing/configuration/settings.rb (outdated, resolved)

marcotc reviewed Nov 10, 2025

wantsui and others added 6 commits November 10, 2025 16:29

Add tests for trace formatter spec to assert that the first span of t…

7dc9184

…he payload has the process tag only when the feature is enabled.

it turns out you cannot just pin things to rails 7 due to newer ruby …

cad26a6

…versions so this fixes that.

Update lib/datadog/core/environment/process.rb

f31440a

Co-authored-by: Marco Costa <[email protected]>

fix string and rename formatted_process_tags_k1_v1 to serialized

cfec602

remove unneeded line

8dae705

remove server type for now until more research is done

055586f

Add new tag normalizer logic following the trace agent.

cacb500

wantsui commented Nov 11, 2025

wantsui added 2 commits November 11, 2025 13:38

lint fix

7661a3f

add missing files from prototype command

7825940

wantsui added 3 commits November 11, 2025 13:47

Add missing constants to ext rbs file

5de6efd

jruby fix for the process spec

f5ca84a

remove the active record during rails creation because it caused a jr…

9ad5be5

…uby conflict with sqlite and it is not needed for this test

wantsui mentioned this pull request Nov 11, 2025

swap out the existing headers normalization logic with the tag normalizer #5041

Draft

wantsui requested a review from vandonr November 12, 2025 15:27

wantsui marked this pull request as ready for review November 12, 2025 15:31

wantsui requested review from a team as code owners November 12, 2025 15:31

wantsui requested a review from mabdinur November 12, 2025 15:31

marcotc reviewed Nov 12, 2025

tlhunter reviewed Nov 12, 2025

lib/datadog/core/normalizer.rb (outdated, resolved)

tlhunter reviewed Nov 12, 2025

wantsui and others added 3 commits November 13, 2025 16:14

Bring tag normalization to 1:1 parity with the Trace Agent

a66e635

Add changes from code review around comments and add test for the new…

ec1e930

… environment variable.

Merge branch 'master' into add-process-tags-to-tracing

4073ab5
