Skip to content

Conversation

@mabdinur
Copy link
Contributor

@mabdinur mabdinur commented Nov 4, 2025

What does this PR do?

Adds OpenTelemetry metrics SDK integration with OTLP exporter support for gRPC and HTTP/protobuf protocols.

Motivation:
Enables customers to use the OpenTelemetry Metrics API and export metrics via OTLP to any compatible backend, including the Datadog Agent.

Change log entry
Adds OpenTelemetry metrics SDK integration with OTLP exporter support. Configure via DD_METRICS_OTEL_ENABLED and standard OpenTelemetry environment variables.

Key Features:

  • OTLP exporter support (gRPC and HTTP/protobuf)
  • Automatic endpoint resolution based on protocol and agent hostname
  • Datadog service metadata mapped to OpenTelemetry resource attributes
  • General OTLP settings (OTEL_EXPORTER_OTLP_*) as defaults for metrics-specific settings

Required Gems for metrics support:

  • opentelemetry-metrics-sdk (~> 0.11)
  • opentelemetry-exporter-otlp-metrics (~> 0.6)

Testing:

BUNDLE_GEMFILE=gemfiles/ruby_3.2_opentelemetry.gemfile bundle exec rspec spec/datadog/opentelemetry/metrics_spec.rb

@github-actions github-actions bot added core Involves Datadog core libraries otel OpenTelemetry-related changes labels Nov 4, 2025
@pr-commenter
Copy link

pr-commenter bot commented Nov 4, 2025

Benchmarks

Benchmark execution time: 2025-11-16 02:13:41

Comparing candidate commit fc91825 in PR branch munir/add-otel-metrics-configs with baseline commit 49cee89 in branch master.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 44 metrics, 2 unstable metrics.

@mabdinur mabdinur changed the title chore(otel): add support for otel metrics [Part 1] chore(otel): add support for otel metrics Nov 4, 2025
@mabdinur mabdinur force-pushed the munir/add-otel-metrics-configs branch from a611845 to 9eb0a24 Compare November 5, 2025 00:29
Comment on lines 170 to 175
context_eval(&definition.default)
elsif definition.default_proc
context_eval(&definition.default_proc)
else
definition.default_proc || Core::Utils::SafeDup.frozen_or_dup(definition.default)
Core::Utils::SafeDup.frozen_or_dup(definition.default)
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can I ask for this to be extracted to its own PR? The current settings system is already extremely complex so I'd like to know more about why this is needed and make sure there's test coverage for the change.

Copy link
Contributor Author

@mabdinur mabdinur Nov 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly, it's not needed. I misunderstood the purpose of default_proc. Its value is evaluated lazily and this change would defeat this purpose.

I can update the Opentelemetry configurations to use Options.default instead

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack! If you can avoid adding more stuff to the settings system, that would be amazing because it's already quite big and complex and scary as it is :(

| ------------- | ---------------------------------------------------- | --------------- | ------------------- |
| OpenTelemetry | https://github.com/open-telemetry/opentelemetry-ruby | 1.9.0+ | >= 1.1.0 |
| Tracing | https://github.com/open-telemetry/opentelemetry-ruby | 1.9.0+ | >= 1.1.0 |
| Metrics | https://github.com/open-telemetry/opentelemetry-ruby | 2.24.0+ | Metrics SDK >= 0.8, Exporter >= 0.4 |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
| Metrics | https://github.com/open-telemetry/opentelemetry-ruby | 2.24.0+ | Metrics SDK >= 0.8, Exporter >= 0.4 |
| Metrics | https://github.com/open-telemetry/opentelemetry-ruby | 2.24.0+ | [Metrics SDK](https://rubygems.org/gems/opentelemetry-metrics-sdk) >= 0.8, [OTLP Metrics Exporter](https://rubygems.org/gems/opentelemetry-exporter-otlp-metrics) >= 0.4 |

I think we should make it clear which libraries with a panel, specially for the exporter, since there are multiple OTel gems with exporter in the name: https://rubygems.org/search?query=opentelemetry-exporter

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, it pretty confusing. I added these links

...
```

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does Datadog.configure or OpenTelemetry::SDK.configure have to be called before OpenTelemetry metrics start being emitted, or does it just work if I follow these instructions?

    ENV['DD_METRICS_OTEL_ENABLED'] = 'true'
    require 'opentelemetry/sdk'
    require 'opentelemetry-metrics-sdk'
    require 'opentelemetry/exporter/otlp_metrics'
    require 'datadog/opentelemetry'

    require 'opentelemetry/metrics'
    provider = OpenTelemetry.meter_provider
    meter = provider.meter('my-app')
    counter = meter.create_counter('requests')
    counter.add(1)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OpenTelemetry::SDK.configure is required here. Otherwise we will get the NoopMeterProvider. Good catch. I will fix these docs

...
```

1. Use the OpenTelemetry Metrics API:
Copy link
Member

@marcotc marcotc Nov 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
1. Use the OpenTelemetry Metrics API:
1. Use the OpenTelemetry Metrics API: <!-- Link to the official docs once Metrics is stable in Ruby: https://opentelemetry.io/docs/languages/ruby/instrumentation/#metrics -->

I thought it was unnecessary for us to directly provide an example here, for API usage, thinking that there should be plenty of good examples in the OpenTelemetry documentation. But it looks to me that, because Metrics is not yet stable, there are virtually no docs.

We should change this once it becomes stable.
I recommend adding a comment here, just so we understand why we chose to "copy" OTel documentation here.

def self.resolve_agent_hostname
require_relative '../../core/configuration/agent_settings_resolver'
agent_settings = Datadog::Core::Configuration::AgentSettingsResolver.call(Datadog.configuration)
agent_settings.hostname || Datadog::Core::Configuration::Ext::Agent::HTTP::DEFAULT_HOST
Copy link
Member

@marcotc marcotc Nov 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mabdinur do you know why this agent_settings.hostname || Datadog::Core::Configuration::Ext::Agent::HTTP::DEFAULT_HOST is required?

At at high-level, it looks like some abstraction from inside AgentSettingsResolver is leaking out, requiring you to manually override the hostname resolved from Datadog.configuration.

Copy link
Contributor Author

@mabdinur mabdinur Nov 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe agent_settings.hostname is nil by default. In that scenario we default to DEFAULT_HOST. Is this something we can change? It would be nice if agent_settings.hostname always referenced a valid hostname

Comment on lines 53 to 65
settings.tags&.each do |key, value|
otel_key = case key
when 'service' then 'service.name'
when 'env' then 'deployment.environment'
when 'version' then 'service.version'
else key
end
resource_attributes[otel_key] = value
end

resource_attributes['service.name'] = settings.service if settings.service
resource_attributes['deployment.environment'] = settings.env if settings.env
resource_attributes['service.version'] = settings.version if settings.version
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

settings.tags will override settings.service ,.env, and .version when applicable:

# Cross-populate tag values with other settings
if env_value.nil? && string_tags.key?(Core::Environment::Ext::TAG_ENV)
self.env = string_tags[Core::Environment::Ext::TAG_ENV]
end

This means that when 'service' then 'service.name' will always be overwritten by resource_attributes['service.name'] = settings.service if settings.service.

I see that we still want to process other keys (else key), but maybe a better way to write this could be:

          resource_attributes.merge(settings.tags.reject do |key, value|
            key == 'service' || key == 'env' || key == 'version'
          end) if settings.tags

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah then we need to rewrite this. The USTs reported by otlp logs should be the same as tracing

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In python we keep all the values in settings.tags and and env/version/service in sync

Comment on lines 51 to 53
resource_attributes = {'host.name' => Datadog::Core::Environment::Socket.hostname}

settings.tags&.each do |key, value|
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should Datadog::Core::Environment::Socket.hostname override the user-provided tag with key host.name? Today the user-provided tag overrides Datadog::Core::Environment::Socket.hostname. I just want to make sure that this is intentional.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup user provided tags take precedence. Also I need to update this logic we should only set. host.name if reporthost name is true. Ideally we should let the agent resolve the hostname

Comment on lines 100 to 106
metrics_protocol = DATADOG_ENV['OTEL_EXPORTER_OTLP_METRICS_PROTOCOL']
next nil unless metrics_protocol

host = Datadog::OpenTelemetry::Configuration::Settings.resolve_agent_hostname
port = metrics_protocol == 'http/protobuf' ? 4318 : 4317
path = metrics_protocol == 'http/protobuf' ? '/v1/metrics' : ''
"http://#{host}:#{port}#{path}"
Copy link
Member

@marcotc marcotc Nov 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shoot! This is a mini AgentSettingsResolver, but for OTel metrics 😢

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, do you think it's possible to move the endpoint resolution logic to Metrics::Initializer.resolve_metrics_endpoint, leaving the settings in this file mostly just reflecting the real configuration provide by the user?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup. I did this

@mabdinur mabdinur marked this pull request as ready for review November 14, 2025 18:29
@mabdinur mabdinur requested review from a team as code owners November 14, 2025 18:29
@mabdinur mabdinur requested review from marcotc and vpellan November 14, 2025 18:29
@datadog-datadog-prod-us1
Copy link
Contributor

datadog-datadog-prod-us1 bot commented Nov 16, 2025

✅ Tests

🎉 All green!

❄️ No new flaky tests detected
🧪 All tests passed

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: fc91825 | Docs | Datadog PR Page | Was this helpful? Give us feedback!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Involves Datadog core libraries otel OpenTelemetry-related changes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants