-
Notifications
You must be signed in to change notification settings - Fork 397
[FFL-1319] Add feature flags events exposure #5024
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
Thank you for updating Change log entry section 👏 Visited at: 2025-11-11 09:56:41 UTC |
BenchmarksBenchmark execution time: 2025-11-11 15:59:53 Comparing candidate commit 760f2c5 in PR branch Found 1 performance improvements and 0 performance regressions! Performance is the same for 43 metrics, 2 unstable metrics. scenario:tracing - Tracing.log_correlation
|
dbc899c to
0a25c37
Compare
b1a978f to
9e7efb0
Compare
Typing analysisNote: Ignored files are excluded from the next sections. Untyped methodsThis PR introduces 2 untyped methods and 10 partially typed methods. It increases the percentage of typed methods from 53.12% to 54.68% (+1.56%). Untyped methods (+2-0)❌ Introduced:Partially typed methods (+10-0)❌ Introduced:Untyped other declarationsThis PR introduces 1 untyped other declaration and 4 partially typed other declarations. It increases the percentage of typed other declarations from 66.92% to 67.95% (+1.03%). Untyped other declarations (+1-0)❌ Introduced:Partially typed other declarations (+4-0)❌ Introduced:If you believe a method or an attribute is rightfully untyped or partially typed, you can add |
This comment has been minimized.
This comment has been minimized.
d7424f9 to
2221c34
Compare
| evaluation_context: evaluation_context | ||
| ) | ||
|
|
||
| if result.error_code |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I didn't have a quick answer for this earlier if all results with an error code should return the default_value. I'm still not sure if this works with libdatadog but I wrote the code in #5022 so that it works with the Ruby evaluator by setting error_code: nil when there is a successful match in the UFC.
dd-trace-rb/lib/datadog/open_feature/binding/internal_evaluator.rb
Lines 147 to 163 in 8352517
| def create_success_result(value, variant, allocation_key, variation_type, do_log, reason) | |
| ResolutionDetails.new( | |
| value: value, | |
| variant: variant, | |
| error_code: nil, # nil for successful cases (so "if result.error_code" works correctly) | |
| error_message: '', # Empty string for Ok cases (matches libdatadog FFI) | |
| reason: reason, | |
| allocation_key: allocation_key, | |
| do_log: do_log, | |
| flag_metadata: { | |
| "allocation_key" => allocation_key, | |
| "variation_type" => variation_type, | |
| "do_log" => do_log | |
| }, | |
| extra_logging: {} | |
| ) | |
| end |
I don't think this is blocking our ability to merge #5022 though because this interface is internal and we can adjust it as needed with no external impact.
I see two cases in libdatadog where there would be an error code Ok. For these two cases, this check works as intended. Since the error code is not null, we will return default_value.
EvaluationError::FlagDisabled => ErrorCode::Ok,
EvaluationError::DefaultAllocationNull => ErrorCode::Ok,
The part where I'm confused --> is the error code also "Ok" if the evaluation is successful? Specifically, I'm looking at
Ok(_) => ErrorCode::Ok, // Always returns ErrorCode::Ok for success?
Err(err) => ErrorCode::from(err),
If the error code is never nil, then this if result.error_code check won't work with libdatadog. I'd imagine we need to check if result.value is nil or if result.flag_metadata is an empty hash or check some other attribute to differentiate a disabled flag from a successful match
Maybe this is a question for @dd-oleksii: how do we differentiate a successful evaluation (with an allocation match) from the FlagDisabled/DefaultAllocationNull cases if both have a "Ok" error code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The best option is to create interface we want, the best option is not to guess on my side and have it done via
if result.error?
...Hence the simple error? method would solve the issue and could be replicated later with C-extension
Based on the Rust code I've looked at, yes, it returns in few places Ok, when it actually must return Err. Not sure what was the intent, but it should not shallow errors and return things as it and it's our job to represent it to the customer properly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah a method or an attribute that gives a simple answer to that question could be part of the binding code eventually
| :error_message, | ||
| :flag_metadata, | ||
| :allocation_key, | ||
| :do_log, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a method named log? would be more usual for Ruby
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but that's an interface I would get from the binding 🤷🏼
To be honest I'm a bit desperate with the binding.
But I think that makes sense to do @sameerank
|
|
||
| @worker.enqueue(event) | ||
| rescue => e | ||
| @logger.debug { "OpenFeature: Reporter failed to enqueue exposure: #{e.class}: #{e.message}" } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this log message could be a bit more user-friendly. How about: "OpenFeature: Failed to report evaluation result: ..."?
| end | ||
|
|
||
| def enqueue(event) | ||
| return false if forked? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this correct? It would prevent event reporting for all forks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one - not sure, I think I will allow it in forks and then see if we need to restrict it. So far I did it to limit things I have to care.
|
|
||
| def flush | ||
| events, dropped = dequeue | ||
| send_events(events || [], dropped.to_i) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can use Array(events) instead of events || []
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and could dropped be anything other than an integer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, both could come as nil because of the call in start. I would wrap events with array 👍🏼
| def send_payload(payload) | ||
| @flush_mutex.synchronize do | ||
| response = @transport.send_exposures(payload) | ||
| logger.debug { "OpenFeature: Send exposures response was not OK: #{response.inspect}" } unless response&.ok? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This log is visible for our customers - is "exposures" a term that we are using a lot in our public documentation and UI?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so, that's an internal terminology (not publicly mentioned, so no alternative). We probably could rephrase it as "unable to send evaluation details metric ..." (that's what we use in the interface)
|
|
||
| yield(api, env) | ||
| rescue => e | ||
| message = "Internal error during #{self.class.name} request. Cause: #{e.class.name} #{e.message} " \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to include self.class.name here in the message, as we are already reporting the partial backtrace?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a copy-paste, I will try to rework the message 👍🏼
lib/datadog/open_feature/binding.rb
Outdated
| end | ||
| end | ||
|
|
||
| require_relative 'binding/evaluator' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could we move requires to the top of the file please?
| return unless settings.open_feature.enabled | ||
|
|
||
| unless settings.remote.enabled | ||
| logger.warn('OpenFeature: Could not be enabled without Remote Configuration Management available. To enable Remote Configuration, see https://docs.datadoghq.com/agent/remote_config') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about:
"OpenFeature could not be enabled without Remote Configuration. The enable Remote Configuration, see https://docs.datadoghq.com/agent/remote_config"
| # In the example from the OpenFeature there is zero trust to the result of the evaluation | ||
| # do we want to go that way? | ||
|
|
||
| result = @evaluator.get_assignment(flag_key, evaluation_context, expected_type, Time.now.utc.to_i) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to call ufc on Time.now, considering that we are converting it to int anyway?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a mock call and it will change in the other PR, so, we don't need it.
|
|
||
| result | ||
| rescue => e | ||
| @telemetry.report(e, description: 'OpenFeature: Failed to fetch value for flag') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add the flag key to this message?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope, because it's private and we are not allowed to report it
P.S Flag name might uncover certain business-value and hence considered private
| require_relative 'exposures/buffer' | ||
| require_relative 'exposures/worker' | ||
| require_relative 'exposures/deduplicator' | ||
| require_relative 'exposures/reporter' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we place those at the beginning of the file for consistency with our code base?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually it's different in our codebase, but will do. What I don't like - logically they will define module, when the definition is actually here. But don't mind to change
| data = content.data.read | ||
| content.data.rewind | ||
|
|
||
| raise ReadError, 'EOF reached' if data.nil? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we want to raise here, or log a warning?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a part that I have no answer, but that's what all the components have in their remote 🤷🏼
| register_receivers(Datadog::DI::Remote.receivers(@telemetry)) | ||
| end | ||
|
|
||
| if settings.respond_to?(:open_feature) && settings.open_feature.enabled |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, I have to keep this check because of some tests are impossible to rewrite
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are they mocking the API? :( Thanks for trying anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was able to remove one check tho, so 50% success rate. The test is called loading_spec.rb and it is checking the base minimum configuration without all components loaded. And then load them 1-by-1
ivoanjo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a few notes!
| register_receivers(Datadog::DI::Remote.receivers(@telemetry)) | ||
| end | ||
|
|
||
| if settings.respond_to?(:open_feature) && settings.open_feature.enabled |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are they mocking the API? :( Thanks for trying anyway.
| def shutdown! | ||
| @worker&.flush | ||
| @worker&.stop(true) | ||
| end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that calling flush on shutdown will hold off stopping the customer's application. So... we should probably have a good timeout here (it's not clear to me that we do?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: Consider maybe moving this one folder up? At least to me it seems a lot of boilerplate to have an empty configuration.rb + a whole extra folder just to support "enabled".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, I will reduce the unnecessary files 👍🏼
lib/datadog/open_feature/binding.rb
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: This is actually not needed? Consider moving the require slightly elsewhere and removing this; we already have so much complexity, I think if we avoid a few dummy files, that's a win! (See: This PR having 72 files!)
Same applies to other similar files -- I think we should avoid them as much as possible.
| def build_context(settings) | ||
| env = extract_env(settings) | ||
| service = extract_service(settings) | ||
| version = extract_version(settings) | ||
|
|
||
| context = {} | ||
| context[:env] = env if env | ||
| context[:service] = service if service | ||
| context[:version] = version if version | ||
|
|
||
| context | ||
| end | ||
|
|
||
| def extract_env(settings) | ||
| return settings.env if settings.respond_to?(:env) | ||
| return settings.tags['env'] if settings.respond_to?(:tags) | ||
|
|
||
| nil | ||
| end | ||
|
|
||
| def extract_service(settings) | ||
| return settings.service if settings.respond_to?(:service) | ||
| return settings.tags['service'] if settings.respond_to?(:tags) | ||
|
|
||
| nil | ||
| end | ||
|
|
||
| def extract_version(settings) | ||
| return settings.version if settings.respond_to?(:version) | ||
| return settings.tags['version'] if settings.respond_to?(:tags) | ||
|
|
||
| nil | ||
| end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So in core/settings.rb we have a bunch of code to match the service/env/version with the tags. This seems to somewhat repeat that logic... I'm curious why it's needed?
| module Datadog | ||
| module OpenFeature | ||
| module Exposures | ||
| BufferBaseClass = | ||
| (Core::Environment::Ext::RUBY_ENGINE == 'ruby') ? Core::Buffer::CRuby : Core::Buffer::ThreadSafe | ||
|
|
||
| class Buffer < BufferBaseClass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: From what I understood, this feature will require libdatadog and thus not work on JRuby. So... should we raise if we're not on CRuby instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't know that libdatadog is not working on JRuby, noted 👍🏼
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ivoanjo Do you think I just can use CRuby as we will not announce the JRuby support on that feature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes! Maybe keep the branch and set it to nil on JRuby? Or :unsupported? Just so that if we ever want to support JRuby, the code "reminds" us to update itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Also on libdatadog + JRuby it indeed doesn't work at the moment and I have not seen any plans to change that)
| def enqueue(event) | ||
| buffer.push(event) | ||
|
|
||
| flush if buffer.length >= @buffer_limit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This flush is slightly... suspicious to me. Won't it run on the same thread as the caller, not on the background worker? Is that the intended semantics here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Going to double check it, good point 👍🏼
What does this PR do?
Adds new component under
lib/datadog/open_feature/folder that is providing customer an ability to configure Datadog feature flags provider. This provider is going to relay on Remote Configuration to deliver feature flags configurations (aka UFC Universal Flag Configuration).Said provider is going into customer code as a part of configuration for OpenFeature
Motivation:
This is a part of upcoming work across all libs.
Important
This code doesn't contain actual evaluation logic, but rather establish everything for it to be placed in the next PR.
Change log entry
Yes. OpenFeature: Add experimental OpenFeature component.
Additional Notes:
This PR was already reviewed as a part of #5022, but accidentally was marked as ready for review. This code is fresh and already applied some suggestions from the previous review, but not everything is possible due to the existing established way or the time boundaries.
Structure:
I would like to ask some pair of eyes on thread-safety and potential issues of new component. I've run the code over certain checks already, but never enough.
How to test the change?
CI and we will have new set of ST in #5022 enabled to prove that everything works.