-
Notifications
You must be signed in to change notification settings - Fork 398
chore(otel): add support for otel metrics #5021
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
BenchmarksBenchmark execution time: 2025-11-16 02:13:41 Comparing candidate commit fc91825 in PR branch Found 0 performance improvements and 0 performance regressions! Performance is the same for 44 metrics, 2 unstable metrics. |
a611845 to
9eb0a24
Compare
| context_eval(&definition.default) | ||
| elsif definition.default_proc | ||
| context_eval(&definition.default_proc) | ||
| else | ||
| definition.default_proc || Core::Utils::SafeDup.frozen_or_dup(definition.default) | ||
| Core::Utils::SafeDup.frozen_or_dup(definition.default) | ||
| end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can I ask for this to be extracted to its own PR? The current settings system is already extremely complex so I'd like to know more about why this is needed and make sure there's test coverage for the change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly, it's not needed. I misunderstood the purpose of default_proc. Its value is evaluated lazily and this change would defeat this purpose.
I can update the Opentelemetry configurations to use Options.default instead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack! If you can avoid adding more stuff to the settings system, that would be amazing because it's already quite big and complex and scary as it is :(
docs/OpenTelemetry.md
Outdated
| | ------------- | ---------------------------------------------------- | --------------- | ------------------- | | ||
| | OpenTelemetry | https://github.com/open-telemetry/opentelemetry-ruby | 1.9.0+ | >= 1.1.0 | | ||
| | Tracing | https://github.com/open-telemetry/opentelemetry-ruby | 1.9.0+ | >= 1.1.0 | | ||
| | Metrics | https://github.com/open-telemetry/opentelemetry-ruby | 2.24.0+ | Metrics SDK >= 0.8, Exporter >= 0.4 | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| | Metrics | https://github.com/open-telemetry/opentelemetry-ruby | 2.24.0+ | Metrics SDK >= 0.8, Exporter >= 0.4 | | |
| | Metrics | https://github.com/open-telemetry/opentelemetry-ruby | 2.24.0+ | [Metrics SDK](https://rubygems.org/gems/opentelemetry-metrics-sdk) >= 0.8, [OTLP Metrics Exporter](https://rubygems.org/gems/opentelemetry-exporter-otlp-metrics) >= 0.4 | |
I think we should make it clear which libraries with a panel, specially for the exporter, since there are multiple OTel gems with exporter in the name: https://rubygems.org/search?query=opentelemetry-exporter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, it pretty confusing. I added these links
docs/OpenTelemetry.md
Outdated
| ... | ||
| ``` | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does Datadog.configure or OpenTelemetry::SDK.configure have to be called before OpenTelemetry metrics start being emitted, or does it just work if I follow these instructions?
ENV['DD_METRICS_OTEL_ENABLED'] = 'true'
require 'opentelemetry/sdk'
require 'opentelemetry-metrics-sdk'
require 'opentelemetry/exporter/otlp_metrics'
require 'datadog/opentelemetry'
require 'opentelemetry/metrics'
provider = OpenTelemetry.meter_provider
meter = provider.meter('my-app')
counter = meter.create_counter('requests')
counter.add(1)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OpenTelemetry::SDK.configure is required here. Otherwise we will get the NoopMeterProvider. Good catch. I will fix these docs
docs/OpenTelemetry.md
Outdated
| ... | ||
| ``` | ||
|
|
||
| 1. Use the OpenTelemetry Metrics API: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| 1. Use the OpenTelemetry Metrics API: | |
| 1. Use the OpenTelemetry Metrics API: <!-- Link to the official docs once Metrics is stable in Ruby: https://opentelemetry.io/docs/languages/ruby/instrumentation/#metrics --> |
I thought it was unnecessary for us to directly provide an example here, for API usage, thinking that there should be plenty of good examples in the OpenTelemetry documentation. But it looks to me that, because Metrics is not yet stable, there are virtually no docs.
We should change this once it becomes stable.
I recommend adding a comment here, just so we understand why we chose to "copy" OTel documentation here.
| def self.resolve_agent_hostname | ||
| require_relative '../../core/configuration/agent_settings_resolver' | ||
| agent_settings = Datadog::Core::Configuration::AgentSettingsResolver.call(Datadog.configuration) | ||
| agent_settings.hostname || Datadog::Core::Configuration::Ext::Agent::HTTP::DEFAULT_HOST |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mabdinur do you know why this agent_settings.hostname || Datadog::Core::Configuration::Ext::Agent::HTTP::DEFAULT_HOST is required?
At at high-level, it looks like some abstraction from inside AgentSettingsResolver is leaking out, requiring you to manually override the hostname resolved from Datadog.configuration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe agent_settings.hostname is nil by default. In that scenario we default to DEFAULT_HOST. Is this something we can change? It would be nice if agent_settings.hostname always referenced a valid hostname
lib/datadog/opentelemetry/metrics.rb
Outdated
| settings.tags&.each do |key, value| | ||
| otel_key = case key | ||
| when 'service' then 'service.name' | ||
| when 'env' then 'deployment.environment' | ||
| when 'version' then 'service.version' | ||
| else key | ||
| end | ||
| resource_attributes[otel_key] = value | ||
| end | ||
|
|
||
| resource_attributes['service.name'] = settings.service if settings.service | ||
| resource_attributes['deployment.environment'] = settings.env if settings.env | ||
| resource_attributes['service.version'] = settings.version if settings.version |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
settings.tags will override settings.service ,.env, and .version when applicable:
dd-trace-rb/lib/datadog/core/configuration/settings.rb
Lines 718 to 721 in 0bf5781
| # Cross-populate tag values with other settings | |
| if env_value.nil? && string_tags.key?(Core::Environment::Ext::TAG_ENV) | |
| self.env = string_tags[Core::Environment::Ext::TAG_ENV] | |
| end |
This means that when 'service' then 'service.name' will always be overwritten by resource_attributes['service.name'] = settings.service if settings.service.
I see that we still want to process other keys (else key), but maybe a better way to write this could be:
resource_attributes.merge(settings.tags.reject do |key, value|
key == 'service' || key == 'env' || key == 'version'
end) if settings.tagsThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah then we need to rewrite this. The USTs reported by otlp logs should be the same as tracing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In python we keep all the values in settings.tags and and env/version/service in sync
lib/datadog/opentelemetry/metrics.rb
Outdated
| resource_attributes = {'host.name' => Datadog::Core::Environment::Socket.hostname} | ||
|
|
||
| settings.tags&.each do |key, value| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should Datadog::Core::Environment::Socket.hostname override the user-provided tag with key host.name? Today the user-provided tag overrides Datadog::Core::Environment::Socket.hostname. I just want to make sure that this is intentional.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup user provided tags take precedence. Also I need to update this logic we should only set. host.name if reporthost name is true. Ideally we should let the agent resolve the hostname
| metrics_protocol = DATADOG_ENV['OTEL_EXPORTER_OTLP_METRICS_PROTOCOL'] | ||
| next nil unless metrics_protocol | ||
|
|
||
| host = Datadog::OpenTelemetry::Configuration::Settings.resolve_agent_hostname | ||
| port = metrics_protocol == 'http/protobuf' ? 4318 : 4317 | ||
| path = metrics_protocol == 'http/protobuf' ? '/v1/metrics' : '' | ||
| "http://#{host}:#{port}#{path}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shoot! This is a mini AgentSettingsResolver, but for OTel metrics 😢
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, do you think it's possible to move the endpoint resolution logic to Metrics::Initializer.resolve_metrics_endpoint, leaving the settings in this file mostly just reflecting the real configuration provide by the user?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup. I did this
|
✅ Tests 🎉 All green!❄️ No new flaky tests detected 🔗 Commit SHA: fc91825 | Docs | Datadog PR Page | Was this helpful? Give us feedback! |
What does this PR do?
Adds OpenTelemetry metrics SDK integration with OTLP exporter support for gRPC and HTTP/protobuf protocols.
Motivation:
Enables customers to use the OpenTelemetry Metrics API and export metrics via OTLP to any compatible backend, including the Datadog Agent.
Change log entry
Adds OpenTelemetry metrics SDK integration with OTLP exporter support. Configure via
DD_METRICS_OTEL_ENABLEDand standard OpenTelemetry environment variables.Key Features:
OTEL_EXPORTER_OTLP_*) as defaults for metrics-specific settingsRequired Gems for metrics support:
opentelemetry-metrics-sdk(~> 0.11)opentelemetry-exporter-otlp-metrics(~> 0.6)Testing:
BUNDLE_GEMFILE=gemfiles/ruby_3.2_opentelemetry.gemfile bundle exec rspec spec/datadog/opentelemetry/metrics_spec.rb