-
Notifications
You must be signed in to change notification settings - Fork 862
feat(histograms): static config for controlling timer/histogram emission #7298
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Steven L <[email protected]>
bd3307d
to
273f53c
Compare
…nstances. Signed-off-by: Steven L <[email protected]>
Signed-off-by: Steven L <[email protected]>
Signed-off-by: Steven L <[email protected]>
|
||
// Module provides metrics client for fx application. | ||
var Module = fx.Module("metricsfx", | ||
fx.Provide(func(c config.Config) metrics.HistogramMigration { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
config.Config
is provided by config.New
during service startup, already exists
return fmt.Errorf("cannot read histogram migration mode as a string: %w", err) | ||
} | ||
switch value { | ||
case "timer", "histogram", "both": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
surprisingly, as tests show, empty values don't call this unmarshal func at all. so they don't "need" to be handled.
that seems rather Bad™, but I don't think it'll change while we're on yaml/v2. and tests should catch if/when it changes in later major versions / different libraries.
return NewStopwatch(timer) | ||
} | ||
} | ||
return Stopwatch{} // noop |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
an explicitly-empty value seems a bit clearer about intent than NewTestStopwatch()
or NewStopwatch()
, though they're essentially identical (just with time.Now()
instead of a zero-valued Time). happy to change if anyone disagrees.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mind updating the diff description to reference something about the overall intent of the project? I added https://github.com/orgs/cadence-workflow/projects/1?pane=issue&itemId=132424271 as a draft, but maybe ther's something better already?
Signed-off-by: Steven L <[email protected]>
Signed-off-by: Steven L <[email protected]>
Signed-off-by: Steven L <[email protected]>
Signed-off-by: Steven L <[email protected]>
…names Signed-off-by: Steven L <[email protected]>
import ( | ||
"testing" | ||
|
||
"golang.org/x/exp/maps" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I think this is out of x
now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, it's in 1.23. I thought it was 1.24 for some reason (and we're still in a bit of a struggle to upgrade to 1.24)
I may make a minor followup to replace these, in that case.
…7310) Followup on #7298 to be a bit easier to set up. This is pretty minor, but helps us internally (and possibly others) by not requiring the "main" config object (which we do not use internally, at least in our main service binary). Anyone using the standard fx startup should be unaffected, as it now also provides the finer-grained histogram config. Otherwise, if you get an error about `metrics.HistogramMigration` not being provided: just `fx.Provide(func(...) metrics.HistogramMigration { ... })` it however you like. And, last but not least: apparently automerge used an old commit message for #7298. Apologies about that, check the PR for an updated and much more descriptive description. Signed-off-by: Steven L <[email protected]>
Now that the basics are working: time to allow turning off timers.
Expected uses are:
Wait for us to finish, then get everything all at once by default
The default config is currently set to "timers", which will not emit the new histograms.
This is primarily to:
When we're done creating all the new histograms and are happy with them, we'll decide on a rollout strategy and will change the default behavior to match that.
(this may just be "dual-emit everything all at once", or something more gradual)
Gradually migrate once things seem ready
Some time after we introduce a new histogram version of a timer metric, you can opt into it with config:
Then you can change your alerts/dashboards/whatever to use the new data, and optionally disable the timer to stop emitting it:
This may also be useful for handling the final "we're changing the default!" migration more gradually, as you can change the default and opt in piece by piece:
Join us on the bumpy road
Internally, to migrate quicker and to get a feel for if our changes are correct (and change things if it seems necessary), we're going to be dual-emitting ASAP and removing ASAP:
We will probably also change the sample development yamls to use this, to serve as a sample and an early warning for people who use them.
If you have custom timers to migrate in wrapping code
No problem! Just add to the static list of metrics being migrated, and you can use this config too:
We will eventually remove timer APIs from our
metrics.Scope
though, so you should definitely tackle this migration "soon" after we switch the default mode (or earlier).I'd originally considered doing this in dynamic config, but I doubt we'll need to make quick changes, and static config is significantly less of a pain, both internally for Reasons™ and in code due to the key-ID verbosity.
Unrelatedly: this PR is a near-ideal example of why dependency injection frameworks are nice 😢