-
Notifications
You must be signed in to change notification settings - Fork 576
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(otelgin): add support for recording panics #5090
base: main
Are you sure you want to change the base?
feat(otelgin): add support for recording panics #5090
Conversation
@@ -86,7 +86,18 @@ func Middleware(service string, opts ...Option) gin.HandlerFunc { | |||
opts = append(opts, oteltrace.WithAttributes(rAttr)) | |||
} | |||
ctx, span := tracer.Start(ctx, spanName, opts...) | |||
defer span.End() | |||
defer func() { | |||
if cfg.RecordPanicConfig.enabled { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note: added this as an opt-in behavior to preserve existing behavior
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure we need this option. This is new behavior, not a breaking change.
CHANGELOG.md
Outdated
@@ -15,6 +15,7 @@ The next release will require at least [Go 1.21]. | |||
|
|||
- Add the new `go.opentelemetry.io/contrib/instrgen` package to provide auto-generated source code instrumentation. (#3068, #3108) | |||
- Support [Go 1.22]. (#5082) | |||
- Add support to record panics in `go.opentelemetry.io.contrib/instrumentation/github.com/gin-gonic/gin/otelgin` (#5090) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Add support to record panics in `go.opentelemetry.io.contrib/instrumentation/github.com/gin-gonic/gin/otelgin` (#5090) | |
- Add support to record panics in `go.opentelemetry.io/contrib/instrumentation/github.com/gin-gonic/gin/otelgin`. (#5090) |
@@ -86,7 +86,18 @@ func Middleware(service string, opts ...Option) gin.HandlerFunc { | |||
opts = append(opts, oteltrace.WithAttributes(rAttr)) | |||
} | |||
ctx, span := tracer.Start(ctx, spanName, opts...) | |||
defer span.End() | |||
defer func() { | |||
if cfg.RecordPanicConfig.enabled { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure we need this option. This is new behavior, not a breaking change.
if cfg.RecordPanicConfig.enabled { | ||
if r := recover(); r != nil { | ||
err := fmt.Errorf("error handling request: %s", r) | ||
span.RecordError(err, oteltrace.WithStackTrace(cfg.RecordPanicConfig.stackTrace)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly here, the config is probably superfluous.
defer func() { | ||
if cfg.RecordPanicConfig.enabled { | ||
if r := recover(); r != nil { | ||
err := fmt.Errorf("error handling request: %s", r) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the added value of creating a new error here over recording the original one?
If we do want to create a new error, we should wrap the original one:
err := fmt.Errorf("error handling request: %s", r) | |
err := fmt.Errorf("error handling request: %w", r) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'll update to record the original error
if r := recover(); r != nil { | ||
err := fmt.Errorf("error handling request: %s", r) | ||
span.RecordError(err, oteltrace.WithStackTrace(cfg.RecordPanicConfig.stackTrace)) | ||
span.SetStatus(codes.Error, "panic recovered") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
span.SetStatus(codes.Error, "panic recovered") | |
span.SetStatus(codes.Error, err.Error()) |
@@ -88,3 +129,10 @@ func WithSpanNameFormatter(f func(r *http.Request) string) Option { | |||
c.SpanNameFormatter = f | |||
}) | |||
} | |||
|
|||
// WithRecordPanicConfig specifies a configuration for panic recording. | |||
func WithRecordPanicConfig(cfg RecordPanicConfig) Option { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we do want to keep this config, how about:
func WithRecordPanicConfig(cfg RecordPanicConfig) Option { | |
func WithRecordPanicConfig(enabled, stacktrace bool) Option { |
Then, there's no need to have a config within a config.
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #5090 +/- ##
=======================================
- Coverage 61.2% 61.1% -0.2%
=======================================
Files 185 185
Lines 11207 13786 +2579
=======================================
+ Hits 6865 8428 +1563
- Misses 4142 5162 +1020
+ Partials 200 196 -4
|
@@ -86,7 +86,18 @@ func Middleware(service string, opts ...Option) gin.HandlerFunc { | |||
opts = append(opts, oteltrace.WithAttributes(rAttr)) | |||
} | |||
ctx, span := tracer.Start(ctx, spanName, opts...) | |||
defer span.End() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This behavior is already in the default SDK's End
method:
Why is this being added here? It seems like an SDK concern not an instrumentation one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's a really good question. the general pattern/setup i typically deal with have a middleware pattern that look something like this:
- panic recovery -pretty simple and attempts to recover any panics, log them, and sets the response code to 500. this is always at the top to catch any and everything that it can.
- otelgin (this middleware)
- auth and other middlewares
panics show up in logs via the panic recovery middleware described above and also show up in the spans via the mechanism you linked. thinking about this more, the gap i currently have is less that panics get added to this specific span somehow, and more that the spans generated by the otelgin middleware don't have a span status as error and don't set the http.status_code
to anything. when an alert fires for an increase in 5XX error codes, i usually first go to Jaeger and type something like http.status_code=500
or error=true
. in the cases of panics, those types of queries don't work.
given that, i think this would leave a few options:
- don't update this middleware. update the middleware ordering i currently use to be something like panic recover -> otelgin -> panic recover. this would result in the net result i want, but using the same middleware twice doesn't feel great.
- update this middleware roughly as it's currently written (still need to apply feedback with not having configs within configs, etc). this would result in the
http.status_code
field not being guaranteed, but that's not necessarily the responsibility of this middleware to set that. the span status will still be set to error, which is a strong signal. - update this middleware roughly as written, but also give the user the ability to set a status code if a panic is caught. this feels like adding too much responsibility to this middleware.
i'm leaning towards 2, but i'd love to get other opinions as well :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the latest semantic conventions, the error.type
attribute should be used here to capture the panic information.
It is a known thing that our HTTP instrumentation is quite behind on semantic conventions, but I think this should be our target. Instead of having configuration to enable this, we should add the error.type
attribute if a panic occurred always.
This will require this instrumentation to go through the process of upgrading semconv which will require a migration path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry for the late reply - getting back to this now. i think that makes sense. i'm not familiar with these migrations, but i'll happily give it a shot. @MrAlias i may poke you for help with the migrations if i run into issues or have questions. i'll apply these changes, as well as the suggestions from @dmathieu.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually it looks like #5092 will take care of the migration plan. i might wait until that PR is merged and then make the changes to this package on top of that. how does that sound?
1213805
to
0b7fca7
Compare
Hello, |
hey @mxpetit - from the discussion in this thread, my current plan is to wait for #5092 to complete, likely make a few changes on top of that, then look to get this merged. |
I think the handling of this PR should refer to the method of this PR: #6366 |
I added a comment here: #5088 (comment) @asecondo, I am changing this PR to draft as it is not in a mergeable state there are no approvals. |
Implements as described in #5088