Add tool error detection to telemetry middleware #2092

Deepam02 · 2025-10-04T05:00:24Z

This is a simple solution.
We added an mcp.tool.error field to traces and a tool_error status to metrics, both set when an MCP tool fails.
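
As a rough illustration of the two signals described above, the sketch below shows one way a middleware could pick a metric status and a span attribute for a tool call. Only the names mcp.tool.error and tool_error come from this PR; the helpers statusLabel and spanAttributes, and the "success" and "error" label values, are assumptions made for the example and are not the code in pkg/telemetry/middleware.go.

```go
package main

import "net/http"

// statusLabel is a hypothetical helper (not the PR's code). It chooses the
// value of a "status" metric label from the HTTP status code and a flag that
// says whether the MCP tool result reported an error.
func statusLabel(httpStatus int, toolErr bool) string {
	switch {
	case httpStatus >= http.StatusBadRequest:
		return "error" // assumed label: transport-level failure wins
	case toolErr:
		return "tool_error" // HTTP succeeded, but the tool itself failed
	default:
		return "success" // assumed label
	}
}

// spanAttributes returns the extra trace attributes for a tool call.
// The "mcp.tool.error" key is from the PR description; returning a plain map
// is only for illustration and stands in for a real tracing API.
func spanAttributes(toolErr bool) map[string]bool {
	attrs := map[string]bool{}
	if toolErr {
		attrs["mcp.tool.error"] = true
	}
	return attrs
}
```

With this shape, a tool call that returns HTTP 200 but a failed tool result is counted under tool_error rather than being folded into successful requests.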

Closes #2084

Closes stacklok#2084 Signed-off-by: Deepam02 <[email protected]>

codecov · 2025-10-04T05:05:24Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 48.37%. Comparing base (19a9f7b) to head (4fcc067).

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2092      +/-   ##
==========================================
+ Coverage   48.31%   48.37%   +0.05%     
==========================================
  Files         238      238              
  Lines       30034    30067      +33     
==========================================
+ Hits        14512    14545      +33     
  Misses      14423    14423              
  Partials     1099     1099



JAORMX · 2025-10-04T17:43:37Z

@claude please review this

claude · 2025-10-04T17:43:51Z

Claude encountered an error.

Failed with exit code 128

I'll analyze this and get back to you.

Copilot

Pull Request Overview

This PR adds tool error detection to telemetry middleware to track MCP (Model Context Protocol) tool execution errors through observability signals. When MCP tools fail, the system now captures this information in both traces and metrics.

- Adds mcp.tool.error attribute to trace spans when tool execution errors are detected
- Introduces tool_error status category in metrics for failed tool calls
- Implements lightweight error detection by scanning response payloads for "isError":true patterns
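
The pattern scan mentioned in the last point can be sketched as follows. This is a minimal illustration, assuming the payload is searched as raw bytes; the function name scanForToolError and the exact list of patterns are hypothetical and may differ from what the PR implements.

```go
package main

import "bytes"

// toolErrorPatterns lists byte patterns a compact or lightly formatted JSON
// encoder might emit. These two spellings are an assumption; other
// whitespace variants are possible and would not be matched.
var toolErrorPatterns = [][]byte{
	[]byte(`"isError":true`),
	[]byte(`"isError": true`),
}

// scanForToolError reports whether any known pattern occurs in the payload.
// It never parses JSON, so it is cheap, but it can miss unusual formatting
// and cannot tell where in the document the field appears.
func scanForToolError(payload []byte) bool {
	for _, p := range toolErrorPatterns {
		if bytes.Contains(payload, p) {
			return true
		}
	}
	return false
}
```

The trade-off is speed against precision: a scan like this misses a payload written as "isError" : true and matches an isError key at any nesting depth, which is one reason to prefer real JSON parsing when the cost is acceptable.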

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| pkg/telemetry/middleware.go | Core implementation of tool error detection logic and telemetry integration |
| pkg/telemetry/middleware_test.go | Unit tests for error detection function and response writer behavior |
| pkg/telemetry/integration_test.go | Integration test verifying end-to-end tool error detection in telemetry pipeline |


Copilot

Pull Request Overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.


Copilot · 2025-10-04T19:43:09Z

pkg/telemetry/middleware.go

+	// Buffer response data for tool calls to enable proper error detection
+	if rw.isToolCall && !rw.hasToolError {
+		rw.responseBuffer = append(rw.responseBuffer, data...)
+	}


Unbounded memory usage: response buffer grows without limits for tool calls. Consider adding a maximum buffer size to prevent potential memory exhaustion on large responses.
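
One possible way to address this comment is a buffer with a hard cap. The sketch below is an assumption-laden example, not the PR's code: the type boundedBuffer, the constructor newBoundedBuffer, and the 1 MiB limit in maxToolResponseBuffer are all hypothetical.

```go
package main

// maxToolResponseBuffer is an assumed limit (1 MiB); the PR does not state one.
const maxToolResponseBuffer = 1 << 20

// boundedBuffer accumulates bytes up to a fixed limit and records whether
// anything had to be dropped.
type boundedBuffer struct {
	data      []byte
	limit     int
	truncated bool
}

// newBoundedBuffer returns a buffer capped at maxToolResponseBuffer.
func newBoundedBuffer() *boundedBuffer {
	return &boundedBuffer{limit: maxToolResponseBuffer}
}

// Write keeps as much of p as still fits. It always reports len(p) as
// consumed, so a wrapping response writer can keep forwarding the full
// payload to the client even after the telemetry copy is full.
func (b *boundedBuffer) Write(p []byte) (int, error) {
	room := b.limit - len(b.data)
	if room <= 0 {
		b.truncated = b.truncated || len(p) > 0
		return len(p), nil
	}
	if len(p) > room {
		b.data = append(b.data, p[:room]...)
		b.truncated = true
		return len(p), nil
	}
	b.data = append(b.data, p...)
	return len(p), nil
}
```

If truncated is set, the buffered bytes are probably not valid JSON on their own, so the caller would likely want to report the tool error state as unknown instead of trying to parse a partial document.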

Copilot · 2025-10-04T19:43:09Z

pkg/telemetry/middleware.go

+	if rw.isToolCall && !rw.hasToolError && len(rw.responseBuffer) > 0 {
+		rw.hasToolError = detectMCPToolError(rw.responseBuffer)
+		// Clear buffer to free memory
+		rw.responseBuffer = nil
+	}


Inefficient error detection: detectMCPToolError is called on the complete buffered response, which requires parsing the entire JSON even for large responses. Consider streaming JSON parsing or early detection when isError field is encountered.
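
The streaming approach suggested here could look roughly like the sketch below, which walks the response token by token with encoding/json and stops as soon as result.isError is read. It assumes the body is a single plain JSON-RPC object (no event-stream framing); detectToolErrorStream, detectToolErrorBytes, and the other helper names are hypothetical and are not the PR's detectMCPToolError.

```go
package main

import (
	"bytes"
	"encoding/json"
	"io"
)

// skipValue consumes exactly one JSON value (scalar, object, or array).
func skipValue(dec *json.Decoder) error {
	depth := 0
	for {
		tok, err := dec.Token()
		if err != nil {
			return err
		}
		if d, ok := tok.(json.Delim); ok {
			if d == '{' || d == '[' {
				depth++
			} else {
				depth--
			}
		}
		if depth == 0 {
			return nil
		}
	}
}

// findInObject reads an object and positions the decoder just before the
// value of the first member named key. It reports false if the next value
// is not an object or the key is absent.
func findInObject(dec *json.Decoder, key string) (bool, error) {
	tok, err := dec.Token()
	if err != nil {
		return false, err
	}
	if d, ok := tok.(json.Delim); !ok || d != '{' {
		return false, nil
	}
	for dec.More() {
		k, err := dec.Token()
		if err != nil {
			return false, err
		}
		if name, _ := k.(string); name == key {
			return true, nil
		}
		if err := skipValue(dec); err != nil {
			return false, err
		}
	}
	return false, nil
}

// detectToolErrorStream returns the boolean at result.isError, reading no
// further than that field.
func detectToolErrorStream(r io.Reader) (bool, error) {
	dec := json.NewDecoder(r)
	for _, key := range []string{"result", "isError"} {
		found, err := findInObject(dec, key)
		if err != nil || !found {
			return false, err
		}
	}
	tok, err := dec.Token()
	if err != nil {
		return false, err
	}
	isErr, _ := tok.(bool)
	return isErr, nil
}

// detectToolErrorBytes is a convenience wrapper for an in-memory payload.
func detectToolErrorBytes(payload []byte) (bool, error) {
	return detectToolErrorStream(bytes.NewReader(payload))
}
```

Because only the result.isError path is inspected, text content that merely contains the characters "isError":true is not mistaken for a tool error. The early exit only helps when isError precedes large fields; if a server writes content first, the decoder still has to tokenize it, although it never builds the full value in memory.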

ChrisJBurns · 2025-10-04T20:07:29Z

@claude try again and review this

claude · 2025-10-04T20:07:41Z

Claude encountered an error.

Failed with exit code 128

I'll analyze this and get back to you.

Add tool error detection to telemetry middleware

4b7ac63

Closes stacklok#2084 Signed-off-by: Deepam02 <[email protected]>

telemetry: fix test parallelism and formatting

51ecc34

Signed-off-by: Deepam02 <[email protected]>

JAORMX requested a review from Copilot October 4, 2025 17:43

Copilot AI reviewed Oct 4, 2025


Improve error detection with JSON parsing and response buffering

4fcc067

Signed-off-by: Deepam02 <[email protected]>

Deepam02 requested a review from Copilot October 4, 2025 19:42

Copilot AI reviewed Oct 4, 2025
