-
Notifications
You must be signed in to change notification settings - Fork 121
Add tool error detection to telemetry middleware #2092
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Add tool error detection to telemetry middleware #2092
Conversation
Closes stacklok#2084 Signed-off-by: Deepam02 <[email protected]>
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #2092 +/- ##
==========================================
+ Coverage 48.31% 48.37% +0.05%
==========================================
Files 238 238
Lines 30034 30067 +33
==========================================
+ Hits 14512 14545 +33
Misses 14423 14423
Partials 1099 1099 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Signed-off-by: Deepam02 <[email protected]>
@claude please review this |
Claude encountered an error —— View job
I'll analyze this and get back to you. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds tool error detection to telemetry middleware to track MCP (Model Context Protocol) tool execution errors through observability signals. When MCP tools fail, the system now captures this information in both traces and metrics.
- Adds
mcp.tool.error
attribute to trace spans when tool execution errors are detected - Introduces
tool_error
status category in metrics for failed tool calls - Implements lightweight error detection by scanning response payloads for
"isError":true
patterns
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
File | Description |
---|---|
pkg/telemetry/middleware.go | Core implementation of tool error detection logic and telemetry integration |
pkg/telemetry/middleware_test.go | Unit tests for error detection function and response writer behavior |
pkg/telemetry/integration_test.go | Integration test verifying end-to-end tool error detection in telemetry pipeline |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
Signed-off-by: Deepam02 <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
// Buffer response data for tool calls to enable proper error detection | ||
if rw.isToolCall && !rw.hasToolError { | ||
rw.responseBuffer = append(rw.responseBuffer, data...) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unbounded memory usage: response buffer grows without limits for tool calls. Consider adding a maximum buffer size to prevent potential memory exhaustion on large responses.
Copilot uses AI. Check for mistakes.
if rw.isToolCall && !rw.hasToolError && len(rw.responseBuffer) > 0 { | ||
rw.hasToolError = detectMCPToolError(rw.responseBuffer) | ||
// Clear buffer to free memory | ||
rw.responseBuffer = nil | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inefficient error detection: detectMCPToolError
is called on the complete buffered response, which requires parsing the entire JSON even for large responses. Consider streaming JSON parsing or early detection when isError
field is encountered.
Copilot uses AI. Check for mistakes.
@claude try again and review this |
Claude encountered an error —— View job
I'll analyze this and get back to you. |
it is a simple solution
we added
mcp.tool.error
field to traces andtool_error
status to metrics when MCP tools fail.Closes #2084