fix: remove redundant MetricsCapture from trace_call #1522
waiho-gumloop wants to merge 1 commit into googleapis:main
trace_call() wraps every Spanner operation with a bare MetricsCapture() that creates a MetricsTracer without project_id or instance_id. Since every caller of trace_call already provides its own MetricsCapture with resource_info, this inner one is redundant. The redundant tracer records operation metrics with incomplete resource labels on every operation. Because OpenTelemetry uses cumulative aggregation, these orphan data points persist for the process lifetime and get re-exported every 60 seconds. Cloud Monitoring rejects them with INVALID_ARGUMENT (missing instance_id), producing repeated error logs. Removing the bare MetricsCapture from trace_call eliminates the orphan metric data points entirely. Callers continue to provide their own MetricsCapture(resource_info) with correct labels. Fixes: googleapis#1319
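To illustrate why a single mislabeled data point keeps producing errors, here is a toy model of cumulative aggregation. It is not the OpenTelemetry SDK: `CumulativeCounter`, `backend_accepts`, and `rejected_per_cycle` are hypothetical names, and the rejection rule is an assumption based on the INVALID_ARGUMENT error described above.

```python
from collections import defaultdict


class CumulativeCounter:
    """Toy cumulative aggregator: keeps one running total per label set."""

    def __init__(self):
        self._totals = defaultdict(int)

    def add(self, value, labels):
        # Label sets are stored as sorted tuples so they can be dict keys.
        self._totals[tuple(sorted(labels.items()))] += value

    def collect(self):
        # Cumulative behaviour: every label set seen so far is reported
        # on every collection, even if it received no new measurements.
        return dict(self._totals)


def backend_accepts(label_key):
    # Assumption: the backend rejects points that lack instance_id.
    return "instance_id" in dict(label_key)


def rejected_per_cycle(counter, cycles):
    """Count the rejected data points in each of `cycles` export cycles."""
    return [
        sum(1 for key in counter.collect() if not backend_accepts(key))
        for _ in range(cycles)
    ]


if __name__ == "__main__":
    counter = CumulativeCounter()
    counter.add(1, {})  # recorded once by the bare tracer
    counter.add(1, {"project_id": "p", "instance_id": "i"})
    print(rejected_per_cycle(counter, 3))  # one rejection in every cycle
```

In this model the unlabeled point is written only once, yet it is rejected on every later export cycle, which matches the repeated error logs described above.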
Code Review
This pull request removes the MetricsCapture context manager from the trace_call function in _opentelemetry_tracing.py. This change discontinues the use of MetricsCapture within the OpenTelemetry tracing context for Spanner client calls. There is no feedback to provide on the changes.
Summary
trace_call() wraps every Spanner operation with a bare MetricsCapture() that creates a MetricsTracer without project_id or instance_id. Since every caller of trace_call already provides its own MetricsCapture(resource_info) with correct labels, the one inside trace_call is redundant and harmful.
The redundant tracer records operation metrics (via record_operation_completion()) with incomplete resource labels on every operation. Because OpenTelemetry uses cumulative aggregation, these orphan data points persist for the process lifetime and get re-exported every 60 seconds by the PeriodicExportingMetricReader. Cloud Monitoring rejects them with INVALID_ARGUMENT (missing instance_id).
Root cause
When Python evaluates "with trace_call(...) as span, MetricsCapture(resource_info):", the execution order is:
1. trace_call.__enter__() → creates internal bare MetricsCapture() → tracer_A (no project_id/instance_id)
2. MetricsCapture(resource_info).__enter__() → tracer_B (has correct labels, overwrites tracer_A in context var)
3. MetricsInterceptor (runs during the gRPC call)
4. MetricsCapture.__exit__() → records correct metrics on tracer_B, resets context to tracer_A
5. trace_call.__exit__() → records metrics on tracer_A with incomplete labels
This creates persistent orphan aggregation buckets in the OpenTelemetry SDK that are re-exported every 60s.
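The ordering above can be reproduced with a small stand-alone sketch. `FakeMetricsCapture` and `fake_trace_call` are hypothetical stand-ins, not the real google-cloud-spanner classes; the sketch only assumes that the capture stores its tracer in a context variable on entry and records metrics on exit, as described above.

```python
import contextvars
from contextlib import contextmanager

# Hypothetical stand-in for the context variable holding the active tracer.
_current = contextvars.ContextVar("current_tracer_labels", default=None)
events = []  # log of what happened, in order


class FakeMetricsCapture:
    def __init__(self, resource_info=None):
        # A bare capture has no labels, like tracer_A in the list above.
        self.labels = dict(resource_info or {})

    def __enter__(self):
        self._token = _current.set(self.labels)
        events.append(("enter", self.labels))
        return self

    def __exit__(self, exc_type, exc, tb):
        _current.reset(self._token)
        # Record on exit and note which tracer is active again afterwards.
        events.append(("record", self.labels, _current.get()))
        return False


@contextmanager
def fake_trace_call():
    # Mirrors the pre-fix behaviour: a bare capture wraps the span.
    with FakeMetricsCapture():
        yield "span"


def run_operation():
    info = {"project_id": "my-project", "instance_id": "my-instance"}
    with fake_trace_call() as span, FakeMetricsCapture(info):
        # Inside the body the labeled tracer (tracer_B) is the active one.
        events.append(("body", _current.get()))
    return events


if __name__ == "__main__":
    for event in run_operation():
        print(event)
```

The last logged event is a "record" with empty labels, which corresponds to step 5: the bare tracer records its metrics after the correctly labeled one has already finished.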
History
Initially, MetricsCapture() instances were bare, both the one inside trace_call and the ones at caller sites. The design relied on MetricsInterceptor to populate labels during gRPC calls. At this point, the trace_call MetricsCapture was not redundant.
A later refactor added a _resource_info property and changed all caller sites from MetricsCapture() to MetricsCapture(self._resource_info) for eager label propagation. However, the bare MetricsCapture() inside trace_call was not removed during this large refactor (225 files changed), making it redundant and harmful.
Fix
Remove the bare MetricsCapture() from trace_call. All ~27 call sites already provide their own MetricsCapture with correct resource labels.
Testing
All existing unit tests pass (46/46). The change only removes the redundant context manager; span/trace behavior is unchanged.
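As a hedged illustration of what the change does to recorded metrics, the sketch below compares a pre-fix and a post-fix shape of trace_call using hypothetical stand-ins (`FakeMetricsCapture`, `trace_call_before`, `trace_call_after`, `labels_recorded`). It is not taken from the repository's test suite and does not exercise the real OpenTelemetry SDK.

```python
import contextvars
from contextlib import contextmanager

# Hypothetical stand-ins for illustration only.
_active_labels = contextvars.ContextVar("active_labels", default=None)


class FakeMetricsCapture:
    def __init__(self, sink, resource_info=None):
        self._sink = sink
        self._labels = dict(resource_info or {})

    def __enter__(self):
        self._token = _active_labels.set(self._labels)
        return self

    def __exit__(self, exc_type, exc, tb):
        _active_labels.reset(self._token)
        self._sink.append(self._labels)  # "record" metrics on exit
        return False


@contextmanager
def trace_call_before(sink):
    with FakeMetricsCapture(sink):  # the bare capture this PR removes
        yield "span"


@contextmanager
def trace_call_after(sink):
    yield "span"  # span handling only; the caller supplies the capture


def labels_recorded(trace_call):
    """Return every label set recorded during one simulated operation."""
    sink = []
    info = {"project_id": "p", "instance_id": "i"}
    with trace_call(sink) as span, FakeMetricsCapture(sink, info):
        assert span == "span"  # the span is yielded the same way in both
    return sink


if __name__ == "__main__":
    print(labels_recorded(trace_call_before))  # labeled set, then an empty one
    print(labels_recorded(trace_call_after))   # labeled set only
```

In this sketch the post-fix variant records exactly one label set per operation, and the span is yielded identically in both variants.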
Fixes #1523