Improvements to the MCP tools service and enhances LLM API parameter handling#291
Improvements to the MCP tools service and enhances LLM API parameter handling#291locnguyen1986 merged 2 commits intoreleasefrom
Conversation
locnguyen1986
commented
Dec 9, 2025
- Adds retry, circuit breaker, and validation infrastructure to handle external API failures gracefully
- Extends LLM completion request handling to support provider-specific parameters (TopK, RepetitionPenalty)
- Removes/disables MCP features (DomainAllowList parameter, file search tools)
Adds retry, circuit breaker, and validation infrastructure to handle external API failures gracefully Extends LLM completion request handling to support provider-specific parameters (TopK, RepetitionPenalty) Removes/disables MCP features (DomainAllowList parameter, file search tools)
There was a problem hiding this comment.
Pull request overview
This PR enhances the reliability of external API calls in the MCP tools service and extends LLM API parameter handling to support provider-specific options. The changes add production-ready resilience patterns (retry, circuit breaker, validation) to handle transient failures gracefully while enabling richer LLM inference configuration through TopK and RepetitionPenalty parameters.
Key Changes:
- Implements exponential backoff retry, circuit breaker, and response validation infrastructure for Serper and SearXNG search APIs
- Extends LLM completion requests to support TopK and RepetitionPenalty parameters with model catalog defaults
- Removes DomainAllowList parameter and disables file search tools (file_search_index, file_search_query) from MCP service
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 10 comments.
Show a summary per file
| File | Description |
|---|---|
| services/mcp-tools/internal/interfaces/httpserver/routes/mcp/serper_mcp.go | Removes DomainAllowList field and comments out file search tools (index/query); removes unused fmt import |
| services/mcp-tools/internal/infrastructure/search/validation.go | Adds validation logic for search/fetch responses with enrichment for empty results; includes helpful error context |
| services/mcp-tools/internal/infrastructure/search/retry.go | Implements exponential backoff retry with configurable retryable errors and jitter to prevent thundering herd |
| services/mcp-tools/internal/infrastructure/search/circuit_breaker.go | Implements circuit breaker pattern with closed/open/half-open states for external API fault tolerance |
| services/mcp-tools/internal/infrastructure/search/client.go | Integrates retry and circuit breaker into search client for Serper and SearXNG calls; adds response validation |
| services/llm-api/internal/utils/httpclients/chat/chat_completion_client.go | Extends request type with TopK/RepetitionPenalty; adds detailed logging for request parameters |
| services/llm-api/internal/interfaces/httpserver/requests/chat/chat.go | Adds TopK and RepetitionPenalty fields to incoming chat completion request |
| services/llm-api/internal/interfaces/httpserver/handlers/modelhandler/provider_handler.go | Exposes GetModelCatalogByID method to retrieve catalog details for default parameter application |
| services/llm-api/internal/interfaces/httpserver/handlers/chathandler/chat_handler.go | Applies model catalog defaults to LLM requests; converts decimal catalog values to appropriate types |
| services/llm-api/internal/domain/model/provider_model_service.go | Adds FindCatalogByID service method to support retrieving catalog for defaults |
| services/llm-api/internal/domain/model/model_catalog_service.go | Adds top_k and repetition_penalty to supported parameters list for model catalog |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| Float32("repetition_penalty", repetitionPenalty). | ||
| Msg("[ChatCompletion][Stream] Sending request to inference server") | ||
|
|
||
| log.Info().Interface("request", request).Msg("[ChatCompletion][Stream] Request body") |
There was a problem hiding this comment.
Logging the entire request object at Info level (line 557) could expose sensitive data like API keys, user messages, or PII in logs. The request may contain conversation history with personal information. Consider using Debug level or sanitizing/redacting sensitive fields before logging.
| log.Info().Interface("request", request).Msg("[ChatCompletion][Stream] Request body") | |
| log.Debug().Interface("request", request).Msg("[ChatCompletion][Stream] Request body") |
| req.MaxTokens = val | ||
| } | ||
| } | ||
| if req.TopK == nil || (req.TopK != nil && *req.TopK == 0) { |
There was a problem hiding this comment.
The condition req.TopK == nil || (req.TopK != nil && *req.TopK == 0) is redundant. If req.TopK == nil is true, the second part of the OR will never be evaluated. Simplify to req.TopK == nil || *req.TopK == 0.
| if req.TopK == nil || (req.TopK != nil && *req.TopK == 0) { | |
| if req.TopK == nil || *req.TopK == 0 { |
| resp.SearchParameters = map[string]any{} | ||
| } | ||
| resp.SearchParameters["empty_result_reason"] = reason | ||
| resp.SearchParameters["has_results"] = len(resp.Organic) > 0 |
There was a problem hiding this comment.
The logic is inconsistent: has_results is set based on len(resp.Organic) > 0 after potentially populating resp.Organic with a fallback entry. This means has_results will always be true even when no actual results were found (only the enriched fallback exists). Consider checking if results were enriched or setting this flag before adding the fallback entry.
| } | ||
|
|
||
| // Add 10% jitter to prevent thundering herd | ||
| jitter := backoff * 0.1 * (2.0*float64(time.Now().UnixNano()%100)/100.0 - 1.0) |
There was a problem hiding this comment.
The jitter calculation may produce biased results. Using time.Now().UnixNano() % 100 for randomness doesn't provide uniform distribution across retries and can exhibit patterns. Consider using a proper random number generator (e.g., math/rand or crypto/rand) instead.
| req.TopK = &val | ||
| } | ||
| } | ||
| if req.RepetitionPenalty == nil || (req.RepetitionPenalty != nil && *req.RepetitionPenalty == 0) { |
There was a problem hiding this comment.
The condition req.RepetitionPenalty == nil || (req.RepetitionPenalty != nil && *req.RepetitionPenalty == 0) is redundant. If req.RepetitionPenalty == nil is true, the second part of the OR will never be evaluated. Simplify to req.RepetitionPenalty == nil || *req.RepetitionPenalty == 0.
| if req.RepetitionPenalty == nil || (req.RepetitionPenalty != nil && *req.RepetitionPenalty == 0) { | |
| if req.RepetitionPenalty == nil || *req.RepetitionPenalty == 0 { |
| if val == nil { | ||
| return 0, false | ||
| } | ||
| f, _ := val.Float64() |
There was a problem hiding this comment.
The error returned from Float64() conversion is being silently discarded with _. If the decimal-to-float conversion fails, this function will return 0 without indication. Consider checking and handling the conversion error, or at least document why it's safe to ignore.
| if cb.halfOpenCalls < cb.cfg.MaxHalfOpenCalls { | ||
| cb.halfOpenCalls++ |
There was a problem hiding this comment.
The halfOpenCalls counter is incremented in allowRequest() but only decremented in recordResult() on state transitions. If a request is allowed in half-open state but the result is never recorded (e.g., panic, context cancellation), the counter will not be decremented, potentially leaving the circuit breaker stuck. Consider using a defer pattern or separate increment/decrement tracking to ensure the counter is properly managed.
| resp.Organic = []map[string]any{ | ||
| { | ||
| "title": fmt.Sprintf("No results found for: %s", query), | ||
| "link": fmt.Sprintf("https://google.com/search?q=%s", strings.ReplaceAll(query, " ", "+")), |
There was a problem hiding this comment.
The query parameter in the Google search fallback URL is not properly URL-encoded. Using strings.ReplaceAll(query, " ", "+") only handles spaces but doesn't escape special characters like &, =, ?, etc. Use url.QueryEscape() from the net/url package for proper URL encoding.
| resp.Metadata = map[string]any{} | ||
| } | ||
| resp.Metadata["empty_result_reason"] = reason | ||
| resp.Metadata["has_content"] = strings.TrimSpace(resp.Text) != "" |
There was a problem hiding this comment.
The logic is inconsistent: has_content is set based on strings.TrimSpace(resp.Text) != "" after potentially setting resp.Text to a failure message. This means has_content will always be true even when no actual content was fetched (only the enriched error message exists). Consider checking if content was enriched or setting this flag before adding the error message.
| Float32("repetition_penalty", repetitionPenalty). | ||
| Msg("[ChatCompletion] Sending request to inference server") | ||
|
|
||
| log.Info().Interface("request", request).Msg("[ChatCompletion] Request body") |
There was a problem hiding this comment.
Logging the entire request object at Info level (line 169) could expose sensitive data like API keys, user messages, or PII in logs. The request may contain conversation history with personal information. Consider using Debug level or sanitizing/redacting sensitive fields before logging.
| log.Info().Interface("request", request).Msg("[ChatCompletion] Request body") | |
| log.Debug().Interface("request", request).Msg("[ChatCompletion] Request body") |