Skip to content

Conversation

@wantsui
Copy link
Collaborator

@wantsui wantsui commented Nov 7, 2025

What does this PR do?

The goal of AIDM-253 is to add process tags to the trace payloads.

I'm creating a draft for now but I still need to research some things like:
[x] add memoization or some kind of cache to avoid looking for these values over and over when are fixed by process
[] figure out the best place to add more tests to assert on the overall behavior (right now I have a test in process_spec to assert on the values)
[x] research if Windows is part of this testing strategy -> Not supported
[x] determine how to approach server type since it's not at the process level -> Will remove in this PR and add later
[x] normalization of tags

After this gets merged, the next step is to add it for the other products.

To run the tests in docker

docker compose run --rm tracer-3.3 /bin/bash
bundle exec rake compile
bundle exec rake test:core_with_rails

Main tests:

BUNDLE_GEMFILE=/app/gemfiles/ruby_3.3_rails8.gemfile bundle exec rspec spec/datadog/core/environment/process_spec.rb
bundle exec rspec spec/datadog/tracing/transport/trace_formatter_spec.rb
bundle exec rspec spec/datadog/core/normalizer_spec.rb
bundle exec rspec spec/datadog/core/configuration/settings_spec.rb

Motivation:

Change log entry

Add process tags to trace payloads.

Additional Notes:

How to test the change?

… This is still missing memoization and additional tests.
@github-actions github-actions bot added core Involves Datadog core libraries tracing labels Nov 7, 2025
@github-actions
Copy link

github-actions bot commented Nov 7, 2025

👋 Hey @DataDog/ruby-guild, please fill "Change log entry" section in the pull request description.

If changes need to be present in CHANGELOG.md you can state it this way

**Change log entry**

Yes. A brief summary to be placed into the CHANGELOG.md

(possible answers Yes/Yep/Yeah)

Or you can opt out like that

**Change log entry**

None.

(possible answers No/Nope/None)

Visited at: 2025-11-13 22:28:19 UTC

@datadog-official
Copy link

datadog-official bot commented Nov 7, 2025

✅ Tests

🎉 All green!

❄️ No new flaky tests detected
🧪 All tests passed

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: 4073ab5 | Docs | Datadog PR Page | Was this helpful? Give us feedback!

@wantsui wantsui added the AI Generated Largely based on code generated by an AI or LLM. This label is the same across all dd-trace-* repos label Nov 7, 2025
def tag_process_tags!
return unless trace.experimental_propagate_process_tags_enabled
process_tags = Core::Environment::Process.formatted_process_tags_k1_v1
return if process_tags.empty?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is impossible right? If so, we can remove it, as it would give us a false sense of uncertainty here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I fixed it in 8dae705 by just removing the check in process tags, but let me know if you spot issues with it!

@pr-commenter
Copy link

pr-commenter bot commented Nov 10, 2025

Benchmarks

Benchmark execution time: 2025-11-13 22:20:01

Comparing candidate commit 4073ab5 in PR branch add-process-tags-to-tracing with baseline commit 49cee89 in branch master.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 44 metrics, 2 unstable metrics.

format!
expect(first_span.meta).to include('_dd.tags.process')
expect(first_span.meta['_dd.tags.process']).to eq(Datadog::Core::Environment::Process.serialized)
# TODO figure out if we need an assertion for the value, ie
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marcotc - do you think there's value in asserting for the values of the tag? Or is the test in process_spec enough?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What you are doing with expect(first_span.meta['_dd.tags.process']).to eq(Datadog::Core::Environment::Process.serialized) seems good to me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't test realistic values.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main thing to test here is that it's respecting the configuring option, which you did.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main thing to test here is that it's respecting the configuring option.

@github-actions
Copy link

github-actions bot commented Nov 11, 2025

Typing analysis

Note: Ignored files are excluded from the next sections.

Untyped methods

This PR introduces 5 untyped methods and 2 partially typed methods. It decreases the percentage of typed methods from 54.67% to 54.44% (-0.23%).

Untyped methods (+5-0)Introduced:
sig/datadog/core/environment/process.rbs:7
└── def self?.entrypoint_workdir: () -> untyped
sig/datadog/core/environment/process.rbs:9
└── def self?.entrypoint_type: () -> untyped
sig/datadog/core/environment/process.rbs:11
└── def self?.entrypoint_name: () -> untyped
sig/datadog/core/environment/process.rbs:13
└── def self?.entrypoint_basedir: () -> untyped
sig/datadog/core/environment/process.rbs:15
└── def self?.serialized: () -> untyped
Partially typed methods (+2-0)Introduced:
sig/datadog/core/environment/process.rbs:14
└── def self?.serialized_kv_helper: (untyped key, untyped value) -> ::String
sig/datadog/core/normalizer.rbs:5
└── def self.normalize: (untyped original_value) -> ("" | untyped)

Untyped other declarations

This PR introduces 1 untyped other declaration. It increases the percentage of typed other declarations from 68.16% to 68.27% (+0.11%).

Untyped other declarations (+1-0)Introduced:
sig/datadog/core/environment/process.rbs:5
└── @serialized: untyped

If you believe a method or an attribute is rightfully untyped or partially typed, you can add # untyped:accept to the end of the line to remove it from the stats.

@wantsui wantsui marked this pull request as ready for review November 12, 2025 15:31
@wantsui wantsui requested review from a team as code owners November 12, 2025 15:31
@wantsui wantsui requested a review from mabdinur November 12, 2025 15:31
LANG_INTERPRETER = "#{RUBY_ENGINE}-#{RUBY_PLATFORM}"
LANG_PLATFORM = RUBY_PLATFORM
LANG_VERSION = RUBY_VERSION
PROCESS_TYPE = 'script'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We both know why this is script, but can let's add a comment here about what script means, so we remember in the future why Ruby is always a script process type today.

# Datadog::Tracing::Metadata::Ext::HTTP::Headers.to_tag method with some additional items
# TODO: Swap out the logic in the Datadog Tracing Metadata headers logic
def self.normalize(original_value)
return "" if original_value.nil? || original_value.to_s.strip.empty?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.to_s.strip is called here and again in the following line.

Both to_s and strip will create new string objects when called. Let's do that operation only once.

(We can't call strip! because the return of to_s does not guarantee that we receive a copy of a string: it could be immutable or even a reused internal string)

# Invalid characters are replaced with an underscore
normalized_value.gsub!(INVALID_TAG_CHARACTERS, '_')
# Merge consecutive underscores with a single underscore
normalized_value.squeeze!('_')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's merge squeeze with the gsub above by changing the regex in the gsub to match one or more characters, instead of just one (regex +).
This saves us a string operation and string copy.

# Remove leading non-letter characters
normalized_value.sub!(/\A[^a-z]+/, "")
# Maximum length is 200 characters
normalized_value = normalized_value[0...200] if normalized_value.length > 200
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The range 0...200 will be created on every method invocation, but we reuse the same range every time. Let's move it to a constant instead.

Dir.mktmpdir do |tmp_dir|
Dir.chdir(tmp_dir) do
Bundler.with_unbundled_env do
skip('rails gem could not be installed') unless system('gem install rails')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't install gems during test runs for two reasons:

  • It makes test runs non-deterministic: we can't guarantee what versions will be installed (even if we provide a fixed rails version, the transitive dependencies can change [technically, we can provide a complete Gemfile.lock to make it deterministic, but we'd just be emulating our CI facilites). It can also flake due to network issues while fetching gems from Rubygems.
  • It's slow. This one test can take more than a minute in CI just to install all the gems needed.

Instead, we have to run this test only when the gems we need are already installed. We install Rails in a few of our gem files, so any of them is fine.
But that means that this test cannot run as a simple test as part of spec:main tests (since that won't have Rails installed).
Instead, we need to run this in a Rake Task that runs in a bundler context with Rails installed.
I recommend adding a new Rake Task, called something like 'core with rails', which will need an entry to match in https://github.com/DataDog/dd-trace-rb/blob/master/Matrixfile, and add any rails combination there (just one is enough).
This way, the gemset with rails will be cached and consistent.

end
end

# Enable experimental process tags propagation.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add some very high level language about what this is used for.

module Datadog
module Core
module Environment
# Retrieves process level information
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add some very high level language regarding when the information in this module is useful.

module Process
module_function

def entrypoint_workdir
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The methods in this module should have documentation, even if simple.

def self.normalize(original_value)
return "" if original_value.nil? || original_value.to_s.strip.empty?

# Removes whitespaces
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can remove this comment, and the one for downcase, since they only capture what the code already does (and the code is not ambiguous).

#
# @default `DD_EXPERIMENTAL_PROPAGATE_PROCESS_TAGS_ENABLED` environment variable, otherwise `false`
# @return [Boolean]
option :experimental_propagate_process_tags_enabled do |o|
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to add unit tests to the respective file for this new option.

Comment on lines 23 to 24
# Merge consecutive underscores with a single underscore
normalized_value.squeeze!('_')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Merge consecutive underscores with a single underscore
normalized_value.squeeze!('_')

Actually I don't believe we're supposed to do the squeeze (reduce multiple _ characters to a single character). Or at least the spec doesn't say to do so. Here the Python tracer leaves repeated underscores:

https://github.com/DataDog/dd-trace-py/pull/15146/files#diff-734f80f7c77b609471c1ca40c131a2604c2f50647ad9cf264863e2016f07b209R23

We should remove this to be consistent.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was following the Trace Agent, which has a test case seen here: https://github.com/DataDog/datadog-agent/blob/45799c842bbd216bcda208737f9f11cade6fdd95/pkg/trace/traceutil/normalize_test.go#L33

{in: "contiguous_____underscores", out: "contiguous_underscores"},

It looks like the Trace Agent merges them. I'll start a separate thread since I think we want this all to be consistent.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

AI Generated Largely based on code generated by an AI or LLM. This label is the same across all dd-trace-* repos core Involves Datadog core libraries tracing

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants