Skip to content

Conversation

Groxx
Copy link
Member

@Groxx Groxx commented Oct 3, 2025

Now that the basics are working: time to allow turning off timers.

Expected uses are:

Wait for us to finish, then get everything all at once by default

The default config is currently set to "timers", which will not emit the new histograms.

This is primarily to:

  • prevent you from being surprised by a large jump in your metrics cardinality
  • prevent us from breaking your already-migrated alerts/dashboards/whatever due to us changing our mind on how to emit that data
    • for the most part though we only expect this for very-new histograms

When we're done creating all the new histograms and are happy with them, we'll decide on a rollout strategy and will change the default behavior to match that.
(this may just be "dual-emit everything all at once", or something more gradual)

Gradually migrate once things seem ready

Some time after we introduce a new histogram version of a timer metric, you can opt into it with config:

histograms:
  names:
    the_new_histogram_ns: true
    oops_wrong_name: true  # this will error, so you can't accidentally mis-spell things

Then you can change your alerts/dashboards/whatever to use the new data, and optionally disable the timer to stop emitting it:

histograms:
  names:
    the_old_timer: false
    the_new_histogram_ns: true

This may also be useful for handling the final "we're changing the default!" migration more gradually, as you can change the default and opt in piece by piece:

histograms:
  default: timers
  names:
    the_old_timer: false        # stop emitting just this timer
    the_new_histogram_ns: true  # opt into just this histogram

Join us on the bumpy road

Internally, to migrate quicker and to get a feel for if our changes are correct (and change things if it seems necessary), we're going to be dual-emitting ASAP and removing ASAP:

histograms:
  default: both
  names:
    the_old_timer: false  # already migrated this one, so turn it off

We will probably also change the sample development yamls to use this, to serve as a sample and an early warning for people who use them.


If you have custom timers to migrate in wrapping code

No problem! Just add to the static list of metrics being migrated, and you can use this config too:

package main

import "github.com/uber/cadence/common/metrics"

func init() {
  metrics.HistogramMigrationMetrics["your_custom_timer"] = struct{}{}
  metrics.HistogramMigrationMetrics["your_custom_histogram"] = struct{}{}
}
func main() {
  // ... service startup stuff
}
histograms:
  names:
    your_custom_timer: false     # this will error unless you
    your_custom_histogram: true  # change the metrics map keys

We will eventually remove timer APIs from our metrics.Scope though, so you should definitely tackle this migration "soon" after we switch the default mode (or earlier).


I'd originally considered doing this in dynamic config, but I doubt we'll need to make quick changes, and static config is significantly less of a pain, both internally for Reasons™ and in code due to the key-ID verbosity.

Unrelatedly: this PR is a near-ideal example of why dependency injection frameworks are nice 😢

@Groxx Groxx force-pushed the histogram-config branch from bd3307d to 273f53c Compare October 3, 2025 00:43

// Module provides metrics client for fx application.
var Module = fx.Module("metricsfx",
fx.Provide(func(c config.Config) metrics.HistogramMigration {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

config.Config is provided by config.New during service startup, already exists

return fmt.Errorf("cannot read histogram migration mode as a string: %w", err)
}
switch value {
case "timer", "histogram", "both":
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

surprisingly, as tests show, empty values don't call this unmarshal func at all. so they don't "need" to be handled.

that seems rather Bad™, but I don't think it'll change while we're on yaml/v2. and tests should catch if/when it changes in later major versions / different libraries.

return NewStopwatch(timer)
}
}
return Stopwatch{} // noop
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

an explicitly-empty value seems a bit clearer about intent than NewTestStopwatch() or NewStopwatch(), though they're essentially identical (just with time.Now() instead of a zero-valued Time). happy to change if anyone disagrees.

Copy link
Member

@davidporter-id-au davidporter-id-au left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mind updating the diff description to reference something about the overall intent of the project? I added https://github.com/orgs/cadence-workflow/projects/1?pane=issue&itemId=132424271 as a draft, but maybe ther's something better already?

import (
"testing"

"golang.org/x/exp/maps"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think this is out of x now

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, it's in 1.23. I thought it was 1.24 for some reason (and we're still in a bit of a struggle to upgrade to 1.24)

I may make a minor followup to replace these, in that case.

@Groxx Groxx merged commit 1f49d66 into cadence-workflow:master Oct 8, 2025
43 checks passed
Groxx added a commit that referenced this pull request Oct 8, 2025
…7310)

Followup on #7298 to be a bit easier to set up.
This is pretty minor, but helps us internally (and possibly others) by
not requiring the "main" config object (which we do not use internally,
at least in our main service binary).

Anyone using the standard fx startup should be unaffected, as it now
also provides the finer-grained histogram config.
Otherwise, if you get an error about `metrics.HistogramMigration` not
being provided: just `fx.Provide(func(...) metrics.HistogramMigration {
... })` it however you like.

And, last but not least: apparently automerge used an old commit message
for #7298. Apologies about that, check the PR for an updated and much
more descriptive description.

Signed-off-by: Steven L <[email protected]>
@Groxx Groxx deleted the histogram-config branch October 8, 2025 19:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants