Skip to content

Conversation

@Strech
Copy link
Member

@Strech Strech commented Nov 5, 2025

What does this PR do?

Adds new component under lib/datadog/open_feature/ folder that is providing customer an ability to configure Datadog feature flags provider. This provider is going to relay on Remote Configuration to deliver feature flags configurations (aka UFC Universal Flag Configuration).

Said provider is going into customer code as a part of configuration for OpenFeature

require 'open_feature/sdk'
require 'datadog/open_feature/provider'

Datadog.configure do |config|
  config.open_feature.enabled = true
end

OpenFeature::SDK.configure do |config|
  config.set_provider(Datadog::OpenFeature::Provider.new)
end

client = OpenFeature::SDK.build_client
client.fetch_string_value(flag_key: 'banner', default_value: 'default')
# => 'something-for-that-context'

Motivation:

This is a part of upcoming work across all libs.

Important

This code doesn't contain actual evaluation logic, but rather establish everything for it to be placed in the next PR.

Change log entry

Yes. OpenFeature: Add experimental OpenFeature component.

Additional Notes:

This PR was already reviewed as a part of #5022, but accidentally was marked as ready for review. This code is fresh and already applied some suggestions from the previous review, but not everything is possible due to the existing established way or the time boundaries.

Structure:

lib/datadog/open_feature
├── binding <------- A pre-made place for upcoming binding (written in Ruby)
├── exposures <----- Every time we fetch the flag value we are sending exposure event
├── provider.rb <--- A public API class we expose (see example above)
└── transport <----- Implementation that goes along with existing components

I would like to ask some pair of eyes on thread-safety and potential issues of new component. I've run the code over certain checks already, but never enough.

How to test the change?

CI and we will have new set of ST in #5022 enabled to prove that everything works.

@github-actions
Copy link

github-actions bot commented Nov 5, 2025

Thank you for updating Change log entry section 👏

Visited at: 2025-11-11 09:56:41 UTC

@github-actions github-actions bot added core Involves Datadog core libraries appsec Application Security monitoring product labels Nov 5, 2025
@pr-commenter
Copy link

pr-commenter bot commented Nov 5, 2025

Benchmarks

Benchmark execution time: 2025-11-11 15:59:53

Comparing candidate commit 760f2c5 in PR branch ffl-1319-add-agent-communication-for-openfeature with baseline commit c560834 in branch master.

Found 1 performance improvements and 0 performance regressions! Performance is the same for 43 metrics, 2 unstable metrics.

scenario:tracing - Tracing.log_correlation

  • 🟩 throughput [+9417.989op/s; +9696.611op/s] or [+9.509%; +9.791%]

@Strech Strech force-pushed the ffl-1319-add-agent-communication-for-openfeature branch from dbc899c to 0a25c37 Compare November 6, 2025 11:44
@Strech Strech force-pushed the ffl-1319-add-agent-communication-for-openfeature branch 3 times, most recently from b1a978f to 9e7efb0 Compare November 10, 2025 10:02
@github-actions github-actions bot added integrations Involves tracing integrations profiling Involves Datadog profiling tracing labels Nov 10, 2025
@Strech Strech changed the base branch from add-openfeature-component to master November 10, 2025 12:42
@github-actions
Copy link

github-actions bot commented Nov 10, 2025

Typing analysis

Note: Ignored files are excluded from the next sections.

Untyped methods

This PR introduces 2 untyped methods and 10 partially typed methods. It increases the percentage of typed methods from 53.12% to 54.68% (+1.56%).

Untyped methods (+2-0)Introduced:
sig/datadog/open_feature/provider.rbs:8
└── def initialize: () -> void
sig/datadog/open_feature/transport/http/api.rbs:8
└── def self?.defaults: () -> untyped
Partially typed methods (+10-0)Introduced:
sig/datadog/open_feature/binding/evaluator.rbs:17
└── def generate: (::Symbol expected_type) -> untyped
sig/datadog/open_feature/configuration/settings.rbs:12
└── def self.add_settings!: (untyped base) -> void
sig/datadog/open_feature/exposures/batch_builder.rbs:9
└── def payload_for: (::Array[Models::Event] events) -> ::Hash[::Symbol, untyped]
sig/datadog/open_feature/exposures/models/event.rbs:18
└── def initialize: (::Hash[::Symbol, untyped]) -> void
sig/datadog/open_feature/exposures/models/event.rbs:28
└── def to_h: () -> ::Hash[::Symbol, untyped]
sig/datadog/open_feature/exposures/models/event.rbs:32
└── def self.extract_attributes: (::OpenFeature::SDK::EvaluationContext) -> ::Hash[::String, untyped]
sig/datadog/open_feature/exposures/worker.rbs:41
└── def perform: (*untyped) -> void
sig/datadog/open_feature/exposures/worker.rbs:45
└── def send_payload: (::Hash[::Symbol, untyped]) -> Core::Transport::Response?
sig/datadog/open_feature/transport/exposures.rbs:14
└── def initialize: (OpenFeature::Transport::Exposures::EncodedParcel, ?::Hash[::Symbol, untyped]?) -> void
sig/datadog/open_feature/transport/exposures.rbs:28
└── def send_exposures: (::Hash[::Symbol, untyped], ?headers: ::Hash[::Symbol, untyped]?) -> Core::Transport::Response

Untyped other declarations

This PR introduces 1 untyped other declaration and 4 partially typed other declarations. It increases the percentage of typed other declarations from 66.92% to 67.95% (+1.03%).

Untyped other declarations (+1-0)Introduced:
sig/datadog/open_feature/binding/resolution_details.rbs:5
└── attr_accessor value: untyped
Partially typed other declarations (+4-0)Introduced:
sig/datadog/open_feature/binding/resolution_details.rbs:15
└── attr_accessor flag_metadata: ::Hash[::String, untyped]?
sig/datadog/open_feature/binding/resolution_details.rbs:21
└── attr_accessor extra_logging: ::Hash[::String, untyped]?
sig/datadog/open_feature/exposures/models/event.rbs:10
└── @payload: ::Hash[::Symbol, untyped]
sig/datadog/open_feature/transport/exposures.rbs:12
└── attr_reader headers: ::Hash[::Symbol, untyped]

If you believe a method or an attribute is rightfully untyped or partially typed, you can add # untyped:accept to the end of the line to remove it from the stats.

@datadog-datadog-prod-us1

This comment has been minimized.

@Strech Strech force-pushed the ffl-1319-add-agent-communication-for-openfeature branch from d7424f9 to 2221c34 Compare November 10, 2025 15:04
evaluation_context: evaluation_context
)

if result.error_code

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I didn't have a quick answer for this earlier if all results with an error code should return the default_value. I'm still not sure if this works with libdatadog but I wrote the code in #5022 so that it works with the Ruby evaluator by setting error_code: nil when there is a successful match in the UFC.

def create_success_result(value, variant, allocation_key, variation_type, do_log, reason)
ResolutionDetails.new(
value: value,
variant: variant,
error_code: nil, # nil for successful cases (so "if result.error_code" works correctly)
error_message: '', # Empty string for Ok cases (matches libdatadog FFI)
reason: reason,
allocation_key: allocation_key,
do_log: do_log,
flag_metadata: {
"allocation_key" => allocation_key,
"variation_type" => variation_type,
"do_log" => do_log
},
extra_logging: {}
)
end

I don't think this is blocking our ability to merge #5022 though because this interface is internal and we can adjust it as needed with no external impact.

I see two cases in libdatadog where there would be an error code Ok. For these two cases, this check works as intended. Since the error code is not null, we will return default_value.

            EvaluationError::FlagDisabled => ErrorCode::Ok,
            EvaluationError::DefaultAllocationNull => ErrorCode::Ok,

https://github.com/DataDog/libdatadog/blob/3d9e641016f3a6c05bfdb1786c4b1a84342d4bbf/datadog-ffe-ffi/src/assignment.rs#L417-L418

The part where I'm confused --> is the error code also "Ok" if the evaluation is successful? Specifically, I'm looking at

            Ok(_) => ErrorCode::Ok,      // Always returns ErrorCode::Ok for success?
            Err(err) => ErrorCode::from(err),

https://github.com/DataDog/libdatadog/blob/3d9e641016f3a6c05bfdb1786c4b1a84342d4bbf/datadog-ffe-ffi/src/assignment.rs#L402-L409

If the error code is never nil, then this if result.error_code check won't work with libdatadog. I'd imagine we need to check if result.value is nil or if result.flag_metadata is an empty hash or check some other attribute to differentiate a disabled flag from a successful match

Maybe this is a question for @dd-oleksii: how do we differentiate a successful evaluation (with an allocation match) from the FlagDisabled/DefaultAllocationNull cases if both have a "Ok" error code?

Copy link
Member Author

@Strech Strech Nov 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The best option is to create interface we want, the best option is not to guess on my side and have it done via

if result.error?
...

Hence the simple error? method would solve the issue and could be replicated later with C-extension


Based on the Rust code I've looked at, yes, it returns in few places Ok, when it actually must return Err. Not sure what was the intent, but it should not shallow errors and return things as it and it's our job to represent it to the customer properly.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah a method or an attribute that gives a simple answer to that question could be part of the binding code eventually

@Strech Strech marked this pull request as ready for review November 11, 2025 10:09
@Strech Strech requested a review from a team as a code owner November 11, 2025 10:09
@Strech Strech added the openfeature A new component that provider an ability to configure feature flags label Nov 11, 2025
:error_message,
:flag_metadata,
:allocation_key,
:do_log,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a method named log? would be more usual for Ruby

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but that's an interface I would get from the binding 🤷🏼
To be honest I'm a bit desperate with the binding.

But I think that makes sense to do @sameerank


@worker.enqueue(event)
rescue => e
@logger.debug { "OpenFeature: Reporter failed to enqueue exposure: #{e.class}: #{e.message}" }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this log message could be a bit more user-friendly. How about: "OpenFeature: Failed to report evaluation result: ..."?

end

def enqueue(event)
return false if forked?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this correct? It would prevent event reporting for all forks

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one - not sure, I think I will allow it in forks and then see if we need to restrict it. So far I did it to limit things I have to care.


def flush
events, dropped = dequeue
send_events(events || [], dropped.to_i)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can use Array(events) instead of events || []

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and could dropped be anything other than an integer?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, both could come as nil because of the call in start. I would wrap events with array 👍🏼

def send_payload(payload)
@flush_mutex.synchronize do
response = @transport.send_exposures(payload)
logger.debug { "OpenFeature: Send exposures response was not OK: #{response.inspect}" } unless response&.ok?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This log is visible for our customers - is "exposures" a term that we are using a lot in our public documentation and UI?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so, that's an internal terminology (not publicly mentioned, so no alternative). We probably could rephrase it as "unable to send evaluation details metric ..." (that's what we use in the interface)


yield(api, env)
rescue => e
message = "Internal error during #{self.class.name} request. Cause: #{e.class.name} #{e.message} " \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to include self.class.name here in the message, as we are already reporting the partial backtrace?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a copy-paste, I will try to rework the message 👍🏼

end
end

require_relative 'binding/evaluator'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we move requires to the top of the file please?

return unless settings.open_feature.enabled

unless settings.remote.enabled
logger.warn('OpenFeature: Could not be enabled without Remote Configuration Management available. To enable Remote Configuration, see https://docs.datadoghq.com/agent/remote_config')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about:

"OpenFeature could not be enabled without Remote Configuration. The enable Remote Configuration, see https://docs.datadoghq.com/agent/remote_config"

# In the example from the OpenFeature there is zero trust to the result of the evaluation
# do we want to go that way?

result = @evaluator.get_assignment(flag_key, evaluation_context, expected_type, Time.now.utc.to_i)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to call ufc on Time.now, considering that we are converting it to int anyway?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a mock call and it will change in the other PR, so, we don't need it.


result
rescue => e
@telemetry.report(e, description: 'OpenFeature: Failed to fetch value for flag')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add the flag key to this message?

Copy link
Member Author

@Strech Strech Nov 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, because it's private and we are not allowed to report it

P.S Flag name might uncover certain business-value and hence considered private

Comment on lines 11 to 14
require_relative 'exposures/buffer'
require_relative 'exposures/worker'
require_relative 'exposures/deduplicator'
require_relative 'exposures/reporter'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we place those at the beginning of the file for consistency with our code base?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually it's different in our codebase, but will do. What I don't like - logically they will define module, when the definition is actually here. But don't mind to change

data = content.data.read
content.data.rewind

raise ReadError, 'EOF reached' if data.nil?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we want to raise here, or log a warning?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a part that I have no answer, but that's what all the components have in their remote 🤷🏼

register_receivers(Datadog::DI::Remote.receivers(@telemetry))
end

if settings.respond_to?(:open_feature) && settings.open_feature.enabled
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, I have to keep this check because of some tests are impossible to rewrite

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are they mocking the API? :( Thanks for trying anyway.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was able to remove one check tho, so 50% success rate. The test is called loading_spec.rb and it is checking the base minimum configuration without all components loaded. And then load them 1-by-1

@ivoanjo ivoanjo removed the profiling Involves Datadog profiling label Nov 11, 2025
@y9v y9v removed the appsec Application Security monitoring product label Nov 11, 2025
Copy link
Member

@ivoanjo ivoanjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few notes!

register_receivers(Datadog::DI::Remote.receivers(@telemetry))
end

if settings.respond_to?(:open_feature) && settings.open_feature.enabled
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are they mocking the API? :( Thanks for trying anyway.

Comment on lines 41 to 44
def shutdown!
@worker&.flush
@worker&.stop(true)
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that calling flush on shutdown will hold off stopping the customer's application. So... we should probably have a good timeout here (it's not clear to me that we do?)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: Consider maybe moving this one folder up? At least to me it seems a lot of boilerplate to have an empty configuration.rb + a whole extra folder just to support "enabled".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, I will reduce the unnecessary files 👍🏼

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: This is actually not needed? Consider moving the require slightly elsewhere and removing this; we already have so much complexity, I think if we avoid a few dummy files, that's a win! (See: This PR having 72 files!)

Same applies to other similar files -- I think we should avoid them as much as possible.

Comment on lines +21 to +53
def build_context(settings)
env = extract_env(settings)
service = extract_service(settings)
version = extract_version(settings)

context = {}
context[:env] = env if env
context[:service] = service if service
context[:version] = version if version

context
end

def extract_env(settings)
return settings.env if settings.respond_to?(:env)
return settings.tags['env'] if settings.respond_to?(:tags)

nil
end

def extract_service(settings)
return settings.service if settings.respond_to?(:service)
return settings.tags['service'] if settings.respond_to?(:tags)

nil
end

def extract_version(settings)
return settings.version if settings.respond_to?(:version)
return settings.tags['version'] if settings.respond_to?(:tags)

nil
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So in core/settings.rb we have a bunch of code to match the service/env/version with the tags. This seems to somewhat repeat that logic... I'm curious why it's needed?

Comment on lines 6 to 12
module Datadog
module OpenFeature
module Exposures
BufferBaseClass =
(Core::Environment::Ext::RUBY_ENGINE == 'ruby') ? Core::Buffer::CRuby : Core::Buffer::ThreadSafe

class Buffer < BufferBaseClass
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: From what I understood, this feature will require libdatadog and thus not work on JRuby. So... should we raise if we're not on CRuby instead?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't know that libdatadog is not working on JRuby, noted 👍🏼

Copy link
Member Author

@Strech Strech Nov 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ivoanjo Do you think I just can use CRuby as we will not announce the JRuby support on that feature.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! Maybe keep the branch and set it to nil on JRuby? Or :unsupported? Just so that if we ever want to support JRuby, the code "reminds" us to update itself.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Also on libdatadog + JRuby it indeed doesn't work at the moment and I have not seen any plans to change that)

Comment on lines 54 to 57
def enqueue(event)
buffer.push(event)

flush if buffer.length >= @buffer_limit
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This flush is slightly... suspicious to me. Won't it run on the same thread as the caller, not on the background worker? Is that the intended semantics here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going to double check it, good point 👍🏼

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Involves Datadog core libraries integrations Involves tracing integrations openfeature A new component that provider an ability to configure feature flags tracing

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants