Add task_latency_processing_no_persistence metric and context counter#9367
Add task_latency_processing_no_persistence metric and context counter#9367awln-temporal merged 4 commits intomainfrom
Conversation
0c551c8 to
b4999cb
Compare
| // TaskPersistenceLatency is used only as a context key for accumulating persistence duration (ContextCounterAdd/Get); not emitted as a metric. | ||
| TaskPersistenceLatency = NewTimerDef( | ||
| "task_persistence_latency", | ||
| WithDescription("Context key for persistence duration; not emitted."), | ||
| ) | ||
| TaskProcessingLatencyNoPersistence = NewTimerDef( | ||
| "task_latency_processing_no_persistence", | ||
| WithDescription("Latency for processing a history task one time excluding persistence."), | ||
| ) |
There was a problem hiding this comment.
why do we need both? typically no_xxx_latency is good enough for monitoring and alerting purpose.
and xxx_latency can be estimated from latency - no_xxx_latency.
There was a problem hiding this comment.
the first is not an actual metric, just a context key to track the persistence latency timer. In the actual metric calculation, we only publish TaskProcessingLatencyNoPersistence. Similar to how we calculate Task processing without UserAttemptLatency, we just track a timer specific for UserAttemptLatency in order to calculate task_processing_no_user_latency etc.
There was a problem hiding this comment.
I can rename the metric if we would like to no_persistency_latency as well
There was a problem hiding this comment.
ah I didn't realize we are already using this approach in other places. Thanks for the context!
I don't really have an opinion on the naming, either no_xxx_latency, or latency_no_xxx works for me. Maybe follow what we have for other metrics.
| // emit total attempt latency so that we know how much time a task will occpy a worker goroutine | ||
| metrics.TaskProcessingLatency.With(e.metricsHandler).Record(attemptLatency) | ||
|
|
||
| if persistenceDuration, ok := metrics.ContextCounterGet(ctx, metrics.TaskPersistenceLatency.Name()); ok { |
There was a problem hiding this comment.
ContextCounterGet will return 0,true even if the key is not present:
temporal/common/metrics/grpc.go
Line 178 in 6875191
I think this is an issue in ContextCounterGet(). We should change that to return true only if that key is present.
There was a problem hiding this comment.
Good catch! yeah with the issue we will have this metric for all other task types as well.
There was a problem hiding this comment.
fixed, thanks for checking this
…#9367) ## What changed? Add `task_processing_no_persistence_latency` metric and context counter. Allows any task to emit this metric as long as TaskPersistenceLatency Counter is incremented. This metric is emitted by Executable MetricsHandler, which adds `namespace`, `operation`, `taskType`, `serviceName` as Metric Tags. In this change, only Visibility tasks publish `task_processing_no_persistence_latency`. ## Why? Separate Visibility task latency from persistence, and allow any other task to emit this metric in the future. ## How did you test it? - [X] built - [X] run locally and tested manually - [X] covered by existing tests - [X] added new unit test(s) - [ ] added new functional test(s)
…temporalio#9367) ## What changed? Add `task_processing_no_persistence_latency` metric and context counter. Allows any task to emit this metric as long as TaskPersistenceLatency Counter is incremented. This metric is emitted by Executable MetricsHandler, which adds `namespace`, `operation`, `taskType`, `serviceName` as Metric Tags. In this change, only Visibility tasks publish `task_processing_no_persistence_latency`. ## Why? Separate Visibility task latency from persistence, and allow any other task to emit this metric in the future. ## How did you test it? - [X] built - [X] run locally and tested manually - [X] covered by existing tests - [X] added new unit test(s) - [ ] added new functional test(s)
What changed?
Add
task_processing_no_persistence_latencymetric and context counter. Allows any task to emit this metric as long as TaskPersistenceLatency Counter is incremented.This metric is emitted by Executable MetricsHandler, which adds
namespace,operation,taskType,serviceNameas Metric Tags.In this change, only Visibility tasks publish
task_processing_no_persistence_latency.Why?
Separate Visibility task latency from persistence, and allow any other task to emit this metric in the future.
How did you test it?