Conversation

@jtmcginty

Summary

Adds a Tool RAG (Retrieval-Augmented Generation) feature that uses semantic search to send only the tools relevant to each query, dramatically improving performance for MCP clients with large tool sets (50+ tools).

Problem

When connecting to multiple MCP servers with 95+ tools, the client experiences severe performance degradation (3+ minutes per query) due to sending all tool schemas to the model.

Solution

Implements semantic search over tool schemas using sentence-transformers, so that only the tools relevant to the user's query are sent to the model. Uses adaptive threshold-based filtering instead of a fixed top_k, so the number of tools sent scales with the query rather than being a constant.
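The core idea can be sketched in a few lines. This is a minimal illustration, not the PR's actual code: `embed`dings here are toy 2-D vectors standing in for sentence-transformers output, and `filter_tools` is a hypothetical name; only the 0.65 threshold comes from the PR.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def filter_tools(query_vec, tool_vecs, threshold=0.65):
    """Keep only tools whose embedding is similar enough to the query."""
    return [name for name, vec in tool_vecs.items()
            if cosine(query_vec, vec) >= threshold]

# Toy 2-D "embeddings" purely for illustration:
tools = {"read_file": [1.0, 0.1], "send_email": [0.0, 1.0]}
print(filter_tools([1.0, 0.0], tools))  # only read_file clears the 0.65 bar
```

With real sentence-transformers embeddings the vectors are a few hundred dimensions, but the filtering step is the same comparison against the threshold.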

Performance Impact

  • Before: 203 seconds (3:23) per query with 95 tools
  • After: 30 seconds per query with Tool RAG enabled
  • Speedup: 6.8x improvement

Implementation Details

  • New ToolRAG class with embedding cache and semantic search
  • Threshold-based filtering (default: 0.65 similarity)
  • Configurable min/max tools (default: 0-20)
  • Feature flags: --enable-tool-rag, --tool-rag-threshold, etc.
  • Comprehensive unit tests (12 tests, all passing)
  • Zero impact when disabled (feature flag controlled)
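The flag names and defaults above suggest wiring along these lines; the argparse setup below is an illustrative sketch, not the PR's actual parser (only the flag names and default values are taken from the PR description).

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--enable-tool-rag", action="store_true",
                    help="Opt in to semantic tool filtering (off by default)")
parser.add_argument("--tool-rag-threshold", type=float, default=0.65,
                    help="Minimum cosine similarity for a tool to be sent")
parser.add_argument("--tool-rag-min-tools", type=int, default=0,
                    help="Fallback minimum number of tools")
parser.add_argument("--tool-rag-max-tools", type=int, default=20,
                    help="Performance cap on tools sent per query")

args = parser.parse_args([])      # no flags given: defaults apply
print(args.enable_tool_rag)       # False -> zero impact when disabled
```

Because `--enable-tool-rag` is a `store_true` flag defaulting to `False`, existing invocations are untouched, matching the "zero impact when disabled" claim.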

Testing

```bash
pytest tests/test_tool_rag.py -v
```

All 12 unit tests pass, covering:

  • Embedding generation and caching
  • Semantic search accuracy
  • Threshold-based filtering
  • Edge cases (empty queries, no tools, etc.)
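The edge-case tests likely look something like the sketch below. This is a hypothetical reconstruction in the spirit of `tests/test_tool_rag.py`, not its contents; `filter_tools` is an illustrative stand-in for the PR's filtering API.

```python
from math import sqrt

def filter_tools(query_vec, tool_vecs, threshold=0.65):
    """Illustrative stand-in for the PR's filtering API."""
    if not query_vec or not tool_vecs:
        return []  # empty query / no tools -> nothing to send
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = sqrt(sum(x * x for x in a))
        nb = sqrt(sum(x * x for x in b))
        return dot / (na * nb) if na and nb else 0.0
    return [n for n, v in tool_vecs.items() if cos(query_vec, v) >= threshold]

def test_empty_query_returns_no_tools():
    assert filter_tools([], {"read_file": [1.0]}) == []

def test_no_tools_returns_empty():
    assert filter_tools([1.0], {}) == []

test_empty_query_returns_no_tools()
test_no_tools_returns_empty()
print("edge-case tests passed")
```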

Dependencies

Adds sentence-transformers (~80MB model download on first use)

Breaking Changes

None. The feature is opt-in via CLI flags.

Commit Messages

Add sentence-transformers library to enable semantic search and
intelligent tool filtering. This will be used to implement Tool RAG
(Retrieval-Augmented Generation) for efficient tool selection from
large tool sets.

Implement semantic search over tool schemas using sentence-transformers.
Features:
- Lazy model loading for efficiency
- Embedding cache with automatic invalidation
- Configurable model selection (default: all-MiniLM-L6-v2)
- Tool text representation combining name, description, and parameters
- Top-k retrieval using cosine similarity
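Lazy loading and the embedding cache can be sketched as below. This is a hedged illustration, not the PR's actual `ToolRAG` class: the class and method names are assumptions, and a toy callable stands in for the real `SentenceTransformer` so the sketch runs without the ~80MB model download.

```python
class ToolRAGSketch:
    """Illustrative sketch of lazy model loading plus an embedding cache."""

    def __init__(self, model_name="all-MiniLM-L6-v2"):
        self.model_name = model_name
        self._model = None   # loaded on first use, not at startup
        self._cache = {}     # tool text -> embedding

    def _load(self):
        if self._model is None:
            # The real feature would do something like:
            #   from sentence_transformers import SentenceTransformer
            #   self._model = SentenceTransformer(self.model_name)
            self._model = lambda text: [float(len(text))]  # toy stand-in
        return self._model

    def embed(self, text):
        if text not in self._cache:        # cache hit avoids re-encoding
            self._cache[text] = self._load()(text)
        return self._cache[text]

rag = ToolRAGSketch()
rag.embed("read_file: read a file from disk")
rag.embed("read_file: read a file from disk")  # second call hits the cache
print(len(rag._cache))  # 1 -> the embedding was computed once and cached
```

Deferring the model load keeps startup cost at zero for users who never enable the feature, and caching per tool text means reconnecting to the same servers does not re-encode unchanged schemas.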
Test coverage includes:
- Initialization and lazy loading
- Tool text representation
- Embedding generation and caching
- Semantic search accuracy for different query types
- Top-k retrieval behavior
- Cache management
- Error handling

All 12 tests passing with semantic search validation.

Add intelligent tool filtering using semantic search:
- New CLI flags: --enable-tool-rag and --tool-rag-top-k
- Automatic embedding of tools after server connection
- Query-time filtering to retrieve only relevant tools
- Fallback to all tools if RAG fails
- Respects user's enabled/disabled tool preferences

This dramatically improves performance with large tool sets (50+)
by reducing context size sent to the model.
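The fallback and preference handling described above might look like this. The function and parameter names are illustrative assumptions, not the PR's API; only the behavior (respect enabled tools, fall back to all tools if RAG fails) comes from the commit message.

```python
def select_tools(query, all_tools, rag_filter, enabled=None):
    """Query-time selection with graceful fallback (sketch, names assumed)."""
    # Respect the user's enabled/disabled tool preferences first.
    candidates = [t for t in all_tools if enabled is None or t in enabled]
    try:
        return rag_filter(query, candidates)
    except Exception:
        return candidates  # RAG failure -> fall back to sending all tools

def broken_filter(query, tools):
    raise RuntimeError("model unavailable")

# Even if the filter blows up, the client keeps working with all tools:
print(select_tools("list files", ["read_file", "send_email"], broken_filter))
```

Catching the failure at this boundary means a missing model or a bad embedding never costs correctness, only the performance win.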

Replace fixed top_k parameter with adaptive threshold-based filtering:
- Add --tool-rag-threshold (default 0.65): minimum similarity score
- Add --tool-rag-min-tools (default 0): fallback minimum
- Add --tool-rag-max-tools (default 20): performance cap

Benefits:
- Adaptive: sends 2-3 tools for focused queries, 10-15 for complex ones
- Configurable: users can tune threshold for their needs
- Efficient: only sends truly relevant tools, not fixed count

This should reduce response time from 30s to 10-15s for simple queries.
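Putting the three knobs together, the adaptive selection described above can be sketched as follows; `adaptive_select` is a hypothetical name, while the 0.65 / 0 / 20 defaults are the ones the commit introduces.

```python
def adaptive_select(scores, threshold=0.65, min_tools=0, max_tools=20):
    """Threshold filter bounded by min/max tool counts (illustrative sketch).

    `scores` maps tool name -> cosine similarity to the query.
    """
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    picked = [name for name, s in ranked if s >= threshold]
    if len(picked) < min_tools:                    # fallback minimum
        picked = [name for name, _ in ranked[:min_tools]]
    return picked[:max_tools]                      # performance cap

scores = {"read_file": 0.9, "list_dir": 0.7, "send_email": 0.4}
print(adaptive_select(scores))  # ['read_file', 'list_dir'] clear the 0.65 bar
```

A focused query yields few scores above the threshold and therefore a small tool list; a broad query yields more, up to the `max_tools` cap, which is what makes the count adaptive rather than fixed.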
@jonigl self-assigned this Dec 10, 2025
@jonigl added the enhancement, feature request, and pending to review labels Dec 10, 2025