-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
stats/opentelemetry: Introduce Tracing API #7852
base: master
Are you sure you want to change the base?
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #7852 +/- ##
==========================================
- Coverage 82.09% 82.09% -0.01%
==========================================
Files 379 380 +1
Lines 38261 38405 +144
==========================================
+ Hits 31409 31527 +118
- Misses 5549 5568 +19
- Partials 1303 1310 +7
|
@@ -119,22 +134,46 @@ func (h *clientStatsHandler) streamInterceptor(ctx context.Context, desc *grpc.S | |||
} | |||
|
|||
startTime := time.Now() | |||
ctx, span := h.createCallTraceSpan(ctx, method) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think we should just not call createCallTraceSpan if tracing is disabled
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack
@aranjans please update this with latest main branch to get rid of all go.mod and go.sum changes |
378ddd3
to
8cb8222
Compare
@purnesh42H Thanks for your review, I have addressed all your comments and this PR is ready for another pass. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some more non-test comments. Will review tests in next pass
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some more non-test comments. Will review tests in next pass
@aranjans you should link the gRFC and concise your description |
@purnesh42H I have addressed all the comments, and updated the description to link the grfc proposal. |
@purnesh42H I have addressed all your comments, and updated the PR. Kindly review the PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. 3 more documentation comments
stats/opentelemetry/e2e_test.go
Outdated
// TestMetricsAndTracesOptionEnabled verifies the integration of metrics and traces | ||
// emitted by the OpenTelemetry instrumentation in a gRPC environment. It sets up a | ||
// stub server with both metrics and traces enabled, and tests the correct emission | ||
// of metrics during a Unary RPC and a Streaming RPC. The test ensures that the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: correct emission of metrics and traces
stats/opentelemetry/e2e_test.go
Outdated
defer cancel() | ||
|
||
// Make two RPC's, a unary RPC and a streaming RPC. These should cause | ||
// certain metrics to be emitted, which should be observed through the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is verifying traces. Please mention traces everywhere in tests which are testing traces and they are observed through span exporter
stats/opentelemetry/e2e_test.go
Outdated
defer cancel() | ||
|
||
// Make two RPC's, a unary RPC and a streaming RPC. These should cause | ||
// certain metrics to be emitted, which should be observed through the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here. We are verifying the traces and verifying using span exporter
// of metrics during a Unary RPC and a Streaming RPC. The test ensures that the | ||
// emitted metrics reflect the operations performed, including the size of the | ||
// compressed message, and verifies that tracing information is correctly recorded. | ||
func (s) TestMetricsAndTracesOptionEnabled(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aranjans may be one more tests to verify metrics and traces are not emitted if they are not enabled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@purnesh42H I have added the test. Please note: I have added the test which verifies the current setup works as expected when traces and metrics are disabled.
@aranjans looks like there is some segmentation fault in stats package https://github.com/grpc/grpc-go/actions/runs/12372209143/job/34529917692?pr=7852. It looks like some test is written wrong |
…y_test to opentelemetry
stream.go
Outdated
@@ -213,13 +213,13 @@ func newClientStream(ctx context.Context, desc *StreamDesc, cc *ClientConn, meth | |||
// Provide an opportunity for the first RPC to see the first service config | |||
// provided by the resolver. | |||
if err := cc.waitForResolvedAddrs(ctx); err != nil { | |||
cc.nameResolutionDelayed = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this trying to capture? I don't think this is what you want.
For example, if name resolution takes 30 seconds to occur, and there is no deadline on the RPC, then this will not ever be set.
Also, making this a boolean field on the ClientConn
can't be correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its for capturing the delay in name resolution https://github.com/grpc/proposal/blob/master/A72-open-telemetry-tracing.md#tracing-information
I think we need to specifically look for context timeout error to set this? waitForResolvedAddrs doesn't directly calculate delay; it waits for resolved addresses, and if resolution takes too long, it will return a timeout error?
What is the expected behavior when there is no deadline?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The goal of the feature is to determine, if a name resolution delay occurred during the RPC's lifespan, how long that took.
So basically you want to add events to the lifecycle of the RPC when these things occur:
Lines 682 to 684 in e957825
select { | |
case <-cc.firstResolveEvent.Done(): | |
return nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @dfawley and @purnesh42H for your thoughts, I will create a separate PR for this as a bonus feature.
@dfawley As per our offline discussion to exclude changes of adding event for name resolution delay, and create a separate PR. I have updated the PR. |
@@ -215,7 +215,6 @@ func newClientStream(ctx context.Context, desc *StreamDesc, cc *ClientConn, meth | |||
if err := cc.waitForResolvedAddrs(ctx); err != nil { | |||
return nil, err | |||
} | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please just revert this file.
// TextMapPropagator propagates span context through text map carrier. | ||
TextMapPropagator propagation.TextMapPropagator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the user supposed to be able to override this?
We have a default for them, though, right? That should get documented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this go into experimental/opentelemetry
, or even experimental/stats/opentelemetry
? Either way there should be comments indicating that it will be moved (and where it will be moved to, and when).
func init() { | ||
otelinternal.SetPluginOption = func(o *Options, po otelinternal.PluginOption) { | ||
o.MetricsOptions.pluginOption = po | ||
} | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you move this back to where it was to avoid the code churn please?
@@ -66,6 +68,15 @@ func (h *serverStatsHandler) initializeMetrics() { | |||
rm.registerMetrics(metrics, meter) | |||
} | |||
|
|||
func (h *serverStatsHandler) initializeTracing() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this go into a new file? server_tracing.go
?
@@ -171,6 +176,14 @@ func removeLeadingSlash(mn string) string { | |||
return strings.TrimLeft(mn, "/") | |||
} | |||
|
|||
func isMetricsDisabled(mo MetricsOptions) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code now looks like:
if !isMetricsDisabled() {
}
It should instead be:
if isMetricsEnabled() {
}
to avoid a double negative.
otel.SetTextMapPropagator(h.options.TraceOptions.TextMapPropagator) | ||
otel.SetTracerProvider(h.options.TraceOptions.TracerProvider) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC, these are global settings. I don't think that's what we want... Only the user should ever be calling these kinds of function, not our library.
return mo.MeterProvider == nil | ||
} | ||
|
||
func isTracingDisabled(to experimental.TraceOptions) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should discuss options, but requiring these to be called multiple times from the stats handler feels problematic. Maybe there's a tracing stats handler and a metrics stats handler and we combine them based on which ones are enabled?
At the very least, these should become methods so we don't have to keep reaching into our data structures when calling them to find the options fields.
} | ||
ri := &rpcInfo{ | ||
ai.startTime = startTime |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you change how this is set?
Why are you setting a local 3 lines above to time.Now() and then assigning this field to that?
ai := &attemptInfo{} | ||
startTime := time.Now() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar comments as above.
This pull request adds the OpenTelemetry tracing API to the grpc-go opentelemetry plugin as outlined in proposal A72.
RELEASE NOTES:
stats/opentelemetry/experimental
. This includes the addition ofTraceOptions
in theOptions
struct to allow users to specify theTraceProvider
andTextMapPropagator
.