diff --git a/api_documentation_mcp_design.md b/api_documentation_mcp_design.md new file mode 100644 index 00000000..cd7f00b6 --- /dev/null +++ b/api_documentation_mcp_design.md @@ -0,0 +1,336 @@ +# API Documentation MCP Tool Design + +## Overview + +This design document outlines a new MCP tool for providing contextual API documentation through the Kodit server. The goal is to leverage the existing slicer infrastructure to extract public API methods and provide them to AI assistants with proper context and hierarchy information, enabling more accurate API usage guidance. + +## Problem Statement + +Currently, the Kodit MCP server provides a `search()` tool that returns code snippets based on user intent and keywords. However, for API documentation purposes, we need: + +1. **Structured API Discovery**: Extract public API methods with their class/package context +2. **Hierarchical Context**: Understand the relationship between classes, methods, and packages +3. **Related Method Resolution**: Find related methods based on API requests (e.g., "how to start a new model" should return initialization methods plus related configuration methods) +4. **Live/Dynamic Views**: Generate contextual API views based on specific requests rather than static documentation + +## Current Architecture Analysis + +### MCP Infrastructure +- **FastMCP Server**: Exposes tools via HTTP streaming, STDIO, or SSE +- **Current Tools**: `search()` and `get_version()` +- **Search Flow**: User intent → MultiSearchRequest → BM25/Embedding search → SnippetV2 results + +### Slicer Capabilities +- **Tree-sitter Based**: Parses multiple languages (Python, Java, Go, Rust, C/C++, JS, etc.) +- **Function Extraction**: Currently extracts functions/methods and builds call graphs +- **Dependency Analysis**: Tracks function calls and can build dependency trees +- **Snippet Generation**: Creates code snippets with dependencies and imports + +### Current Limitations for API Documentation +- **No Class Hierarchy Tracking**: Slicer focuses on functions, not class structures +- **No Visibility Analysis**: Doesn't distinguish between public/private methods +- **No Package/Module Context**: Limited understanding of namespace hierarchies +- **Function-Centric**: Not optimized for API surface area extraction + +## Proposed Solution + +### 1. Enhanced Slicer for API Extraction + +#### New API-Focused Analyzer +Create a new `APIAnalyzer` class that extends the current slicer capabilities: + +```python +@dataclass +class APIMethodInfo: + """Information about an API method.""" + name: str + qualified_name: str # e.g., "com.example.UserService.createUser" + visibility: Visibility # PUBLIC, PRIVATE, PROTECTED, PACKAGE + method_type: MethodType # INSTANCE, STATIC, CONSTRUCTOR + parameters: list[Parameter] + return_type: str | None + documentation: str | None + file_path: Path + line_number: int + class_context: ClassInfo | None + package_context: str | None + +@dataclass +class ClassInfo: + """Information about a class.""" + name: str + qualified_name: str + visibility: Visibility + superclass: str | None + interfaces: list[str] + methods: list[APIMethodInfo] + file_path: Path + package_context: str | None + +@dataclass +class APIHierarchy: + """Represents the API structure of a codebase.""" + packages: dict[str, PackageInfo] + classes: dict[str, ClassInfo] + public_methods: dict[str, APIMethodInfo] + method_relationships: dict[str, list[str]] # method -> related methods +``` + +#### Language-Specific API Extraction +Extend `LanguageConfig` to support API-specific node types: + +```python +"python": { + "function_nodes": ["function_definition"], + "method_nodes": ["function_definition"], # Within class_definition + "class_nodes": ["class_definition"], + "module_nodes": ["module"], + "visibility_indicators": ["def __", "def _"], # Private method patterns + # ... existing config +}, +"java": { + "function_nodes": ["method_declaration"], + "class_nodes": ["class_declaration", "interface_declaration"], + "package_nodes": ["package_declaration"], + "visibility_modifiers": ["public", "private", "protected"], + # ... existing config +} +``` + +### 2. New Domain Entities + +#### APISnippet Entity +Extend or create alongside `SnippetV2` for API-specific information: + +```python +class APISnippet(BaseModel): + """API-focused snippet with hierarchical context.""" + + # Core snippet info (similar to SnippetV2) + sha: str + content: str + extension: str + derives_from: list[GitFile] + + # API-specific metadata + api_method: APIMethodInfo + related_methods: list[APIMethodInfo] + usage_examples: list[str] = [] + hierarchical_context: APIContext + +@dataclass +class APIContext: + """Hierarchical context for an API method.""" + package_path: str | None + class_name: str | None + method_name: str + full_qualified_name: str + import_statements: list[str] + related_types: list[str] # Types used in parameters/return +``` + +### 3. Enhanced Search with API Context + +#### New MCP Tool: `api_documentation()` + +```python +@mcp_server.tool() +async def api_documentation( + ctx: Context, + api_request: Annotated[str, Field(description="What the user wants to accomplish with the API")], + specific_methods: Annotated[list[str], Field(description="Specific method names if known")] = [], + class_filter: Annotated[str | None, Field(description="Filter to specific class or package")] = None, + include_related: Annotated[bool, Field(description="Include related methods and examples")] = True, + language: Annotated[str | None, Field(description="Programming language filter")] = None, +) -> str: + """Get API documentation and usage examples for specific functionality. + + This tool provides structured API documentation with proper context, + including class hierarchies, method signatures, and usage examples. + """ +``` + +#### API Discovery Service +New application service for API-specific searches: + +```python +class APIDocumentationService: + """Service for API documentation and discovery.""" + + async def discover_api_methods( + self, + intent: str, + filters: APISearchFilters + ) -> list[APISnippet]: + """Discover relevant API methods based on user intent.""" + + async def get_method_context( + self, + method_name: str + ) -> APIContext: + """Get full hierarchical context for a method.""" + + async def find_related_methods( + self, + method: APIMethodInfo + ) -> list[APIMethodInfo]: + """Find methods related to the given method.""" +``` + +### 4. Live API View Generation + +#### Dynamic Context Assembly +Instead of pre-generating all possible API documentation snippets, generate them dynamically: + +1. **Query Analysis**: Parse user intent to identify: + - Target functionality (e.g., "start a new model") + - Preferred patterns (e.g., builder pattern, factory pattern) + - Context requirements (e.g., initialization, configuration) + +2. **Method Discovery**: Use enhanced search to find: + - Primary methods matching the intent + - Related initialization methods + - Configuration methods + - Error handling patterns + +3. **Context Assembly**: Build hierarchical view: + - Package/namespace imports needed + - Class instantiation patterns + - Method call sequences + - Parameter documentation + +4. **Related Method Resolution**: Identify related methods through: + - Call graph analysis (what methods call/are called by target) + - Parameter type analysis (methods using same types) + - Pattern analysis (methods following similar naming conventions) + - Documentation analysis (methods mentioned together in comments) + +### 5. Integration with Existing Infrastructure + +#### Extend Snippet Repository +Add API-specific repository methods: + +```python +class SnippetRepositoryV2(ABC): + # ... existing methods ... + + @abstractmethod + async def search_api_methods( + self, request: APISearchRequest + ) -> list[APISnippet]: + """Search for API methods with hierarchical context.""" + + @abstractmethod + async def get_method_hierarchy( + self, method_id: str + ) -> APIHierarchy: + """Get the full hierarchy context for a method.""" +``` + +#### Enrichment for API Context +Extend the enrichment system to include API-specific metadata: + +```python +class APIEnrichmentType(str, Enum): + """API-specific enrichment types.""" + + CLASS_HIERARCHY = "class_hierarchy" + METHOD_SIGNATURE = "method_signature" + USAGE_PATTERN = "usage_pattern" + RELATED_METHODS = "related_methods" + PARAMETER_DOCUMENTATION = "parameter_docs" +``` + +### 6. Implementation Strategy + +#### Phase 1: Enhanced Slicer +1. Extend `LanguageConfig` for class/visibility detection +2. Implement `APIAnalyzer` with class hierarchy extraction +3. Add visibility detection for common languages (Python, Java, Go) +4. Build API method index alongside function index + +#### Phase 2: API-Specific Domain Layer +1. Create `APIMethodInfo` and related domain entities +2. Implement `APISnippet` extending current snippet model +3. Add API-specific search and filtering capabilities +4. Create `APIDocumentationService` application service + +#### Phase 3: MCP Tool Integration +1. Implement `api_documentation()` MCP tool +2. Integrate with existing search infrastructure +3. Add API-specific result formatting +4. Implement dynamic context assembly + +#### Phase 4: Advanced Features +1. Related method discovery algorithms +2. Usage pattern detection +3. API change detection across commits +4. Integration examples generation + +## Questions and Considerations + +### Compatibility with Existing Snippets +**Question**: Should API snippets use the same `SnippetV2` infrastructure or be separate? + +**Recommendation**: Extend `SnippetV2` with optional API metadata rather than creating a separate system. This allows: +- Reuse of existing search infrastructure +- Unified storage and indexing +- Gradual migration path +- Compatibility with existing tools + +```python +class SnippetV2(BaseModel): + # ... existing fields ... + api_metadata: APIMetadata | None = None # Optional API-specific data + +class APIMetadata(BaseModel): + method_info: APIMethodInfo + hierarchical_context: APIContext + related_methods: list[str] # References to other snippet IDs +``` + +### Live vs. Static Generation +**Question**: Should API documentation be pre-generated or dynamically assembled? + +**Recommendation**: Hybrid approach: +- **Static Indexing**: Pre-extract and index API methods, classes, and basic relationships during commit processing +- **Dynamic Assembly**: Generate contextual views and related method collections at query time +- **Caching**: Cache frequently requested API contexts to improve performance + +### Handling Multiple Languages +**Question**: How to handle different API patterns across languages? + +**Recommendation**: Language-specific strategies within a common framework: +- **Python**: Class methods, module functions, `__init__` patterns +- **Java**: Class methods, interfaces, package structure, annotations +- **Go**: Package functions, struct methods, interface implementations +- **JavaScript**: Class methods, module exports, prototype methods + +### Related Method Discovery +**Question**: How to effectively find related methods? + +**Recommendation**: Multi-factor approach: +1. **Structural Relationships**: Same class, inheritance hierarchy, interface implementations +2. **Call Graph Analysis**: Methods that call each other or are commonly called together +3. **Type Relationships**: Methods using same parameter/return types +4. **Naming Patterns**: Methods with similar prefixes/suffixes (get/set, create/delete) +5. **Documentation Analysis**: Methods mentioned together in comments/docstrings +6. **Usage Patterns**: Methods frequently used together in examples + +## Success Metrics + +1. **API Coverage**: Percentage of public API methods successfully extracted and indexed +2. **Context Accuracy**: Quality of hierarchical context (package, class, method relationships) +3. **Related Method Relevance**: Accuracy of related method suggestions +4. **Response Time**: Performance of dynamic API context assembly +5. **User Satisfaction**: Effectiveness for AI assistant API guidance tasks + +## Next Steps + +1. **Prototype Phase**: Build a minimal API analyzer for Python to validate the approach +2. **Integration Testing**: Test with existing MCP infrastructure +3. **Evaluation**: Compare API documentation quality with current search results +4. **Iteration**: Refine based on real-world usage patterns +5. **Multi-language Support**: Extend to Java, Go, and other priority languages + +This design leverages Kodit's existing strengths in code analysis and search while adding the structured API discovery capabilities needed for effective API documentation and guidance. \ No newline at end of file diff --git a/slicer_public_api_plan.md b/slicer_public_api_plan.md new file mode 100644 index 00000000..5b104d42 --- /dev/null +++ b/slicer_public_api_plan.md @@ -0,0 +1,337 @@ +# Slicer Public API Enhancement Plan + +## Goal +Extend the existing slicer to extract only public APIs with proper context (class, module, package) while maintaining backward compatibility. + +## Current Slicer Analysis + +**Existing Structure:** +- `FunctionInfo`: Contains file, node, span, qualified_name +- `LanguageConfig`: Language-specific node types and patterns +- `AnalyzerState`: Tracks functions, call graphs, imports +- `extract_snippets_from_git_files()`: Main extraction method + +**Current Behavior:** +- Extracts ALL functions/methods from files +- Builds call dependency graphs +- Generates SnippetV2 for each function +- Supports 10+ programming languages + +## Enhancement Plan + +### 1. Extend FunctionInfo Structure + +```python +@dataclass +class FunctionInfo: + """Information about a function definition.""" + + # Existing fields + file: Path + node: Node + span: tuple[int, int] + qualified_name: str + + # New fields for public API support + visibility: Visibility + context: CodeContext + is_public_api: bool + +@dataclass +class CodeContext: + """Context information for a function.""" + + class_name: str | None = None + module_name: str | None = None + package_name: str | None = None + namespace: str | None = None + +class Visibility(str, Enum): + """Code visibility levels.""" + + PUBLIC = "public" + PRIVATE = "private" + PROTECTED = "protected" + PACKAGE = "package" # Java package-private + INTERNAL = "internal" # Some languages + UNKNOWN = "unknown" +``` + +### 2. Extend LanguageConfig for Public API Detection + +Add public API detection rules to each language configuration: + +```python +CONFIGS: ClassVar[dict[str, dict[str, Any]]] = { + "python": { + # Existing config... + "function_nodes": ["function_definition"], + "class_nodes": ["class_definition"], + "module_nodes": ["module"], + + # New public API rules + "public_patterns": { + "function": lambda name: not name.startswith("_"), + "class": lambda name: not name.startswith("_"), + "method": lambda name: not name.startswith("_") + }, + "context_extractors": { + "class": "class_definition", + "module": "module" + } + }, + "java": { + # Existing config... + "function_nodes": ["method_declaration"], + "class_nodes": ["class_declaration", "interface_declaration"], + "package_nodes": ["package_declaration"], + + # New public API rules + "visibility_modifiers": ["public", "private", "protected"], + "public_indicators": ["public"], + "context_extractors": { + "class": "class_declaration", + "package": "package_declaration" + } + }, + "go": { + # Existing config... + "function_nodes": ["function_declaration"], + "method_nodes": ["method_declaration"], + "type_nodes": ["type_declaration"], + + # New public API rules + "public_patterns": { + "function": lambda name: name[0].isupper(), + "type": lambda name: name[0].isupper(), + "method": lambda name: name[0].isupper() + }, + "context_extractors": { + "package": "package_clause", + "receiver": "parameter_list" # For methods + } + }, + "javascript": { + # Existing config... + "function_nodes": ["function_declaration", "method_definition"], + "class_nodes": ["class_declaration"], + "export_nodes": ["export_statement"], + + # New public API rules + "public_indicators": ["export"], + "context_extractors": { + "class": "class_declaration", + "module": "program" + } + } +} +``` + +### 3. Add Context Extraction Methods + +New methods in the Slicer class: + +```python +class Slicer: + # ... existing methods ... + + def _extract_code_context( + self, + node: Node, + file_path: Path, + language: str + ) -> CodeContext: + """Extract class/module/package context for a node.""" + + def _determine_visibility( + self, + node: Node, + name: str, + language: str + ) -> Visibility: + """Determine if a function/method is public based on language rules.""" + + def _is_public_api( + self, + function_info: FunctionInfo, + language: str + ) -> bool: + """Determine if a function represents a public API.""" + + def _find_parent_class(self, node: Node) -> Node | None: + """Find the parent class node if the function is a method.""" + + def _extract_package_info(self, tree: Tree, language: str) -> str | None: + """Extract package/module information from the file.""" +``` + +### 4. Update Main Extraction Logic + +Modify `extract_snippets_from_git_files()` to filter for public APIs: + +```python +def extract_snippets_from_git_files( + self, + files: list[GitFile], + language: str = "python", + public_apis_only: bool = False # New parameter +) -> list[SnippetV2]: + """Extract code snippets from files. + + Args: + files: List of files to analyze + language: Programming language + public_apis_only: If True, only extract public API methods + """ + + # ... existing parsing logic ... + + # Build enhanced function index with context + self._build_enhanced_definition_index(state, config, language) + + # Filter for public APIs if requested + if public_apis_only: + public_functions = { + name: info for name, info in state.def_index.items() + if info.is_public_api + } + functions_to_process = public_functions + else: + functions_to_process = state.def_index + + # Generate snippets only for selected functions + snippets: list[SnippetV2] = [] + for qualified_name in functions_to_process: + # ... existing snippet generation ... +``` + +### 5. Enhanced Function Index Building + +Replace `_build_definition_and_import_indexes()` with enhanced version: + +```python +def _build_enhanced_definition_index( + self, + state: AnalyzerState, + config: dict[str, Any], + language: str +) -> None: + """Build function definition index with context and visibility.""" + + for file_path, tree in state.asts.items(): + # Extract package/module context for the entire file + file_context = self._extract_file_context(tree, language) + + for node in self._walk_tree(tree.root_node): + if self._is_function_definition(node, config): + func_name = self._extract_function_name(node, config, language) + if func_name: + # Extract context (class, module, etc.) + context = self._extract_code_context(node, file_path, language) + context.module_name = file_context.get("module") + context.package_name = file_context.get("package") + + # Determine visibility + visibility = self._determine_visibility(node, func_name, language) + + # Create enhanced function info + qualified_name = self._build_qualified_name(func_name, context) + function_info = FunctionInfo( + file=file_path, + node=node, + span=(node.start_byte, node.end_byte), + qualified_name=qualified_name, + visibility=visibility, + context=context, + is_public_api=self._is_public_api_by_rules( + func_name, visibility, context, language + ) + ) + + state.def_index[qualified_name] = function_info +``` + +### 6. Language-Specific Public API Rules + +#### Python +- **Public Functions**: Don't start with `_` +- **Public Classes**: Don't start with `_` +- **Public Methods**: Don't start with `_`, not `__init__` or `__special__` +- **Context**: Module name from file path, class name from parent class_definition + +#### Java +- **Public Methods**: Have `public` modifier +- **Public Classes**: Have `public` modifier +- **Context**: Package from package_declaration, class from class_declaration + +#### Go +- **Public Functions**: Start with uppercase letter +- **Public Types**: Start with uppercase letter +- **Public Methods**: Start with uppercase letter +- **Context**: Package from package clause, receiver type for methods + +#### JavaScript/TypeScript +- **Public Functions**: Exported functions +- **Public Classes**: Exported classes +- **Public Methods**: Methods on exported classes +- **Context**: Module from file, class from class_declaration + +### 7. New MCP Integration + +Add public API filtering to the MCP search tool: + +```python +@mcp_server.tool() +async def search( + # ... existing parameters ... + public_apis_only: Annotated[ + bool, + Field(description="Only return public API methods") + ] = False, +) -> str: + """Search for code examples, optionally filtered to public APIs only.""" +``` + +### 8. Implementation Steps + +1. **Step 1**: Extend `FunctionInfo` and add new dataclasses +2. **Step 2**: Add public API detection rules to `LanguageConfig` +3. **Step 3**: Implement context extraction methods +4. **Step 4**: Implement visibility determination logic +5. **Step 5**: Update main extraction logic with public API filtering +6. **Step 6**: Add tests with existing test data +7. **Step 7**: Update MCP tool to support public API filtering + +### 9. Backward Compatibility + +- Keep `public_apis_only=False` as default to maintain existing behavior +- All new fields in `FunctionInfo` have defaults to avoid breaking changes +- Existing snippet generation continues to work unchanged + +### 10. Testing Strategy + +Use existing test data in `tests/kodit/infrastructure/slicing/data/`: +- **Python**: Test underscore prefix detection +- **Java**: Test public modifier detection +- **Go**: Test capitalization rules +- **JavaScript**: Test export detection + +Expected public APIs from test data: +```python +# Python (main.py) +- create_sample_products() # Public function +- demonstrate_cart() # Public function +- demonstrate_points() # Public function +- main() # Public function + +# Java (Main.java) +- public static void main() # Public method +- public methods in classes # Public methods + +# Go (main.go) +- createSampleProducts() # Public (capitalized) +- demonstrateCart() # Public (capitalized) +- main() # Public (capitalized) +``` + +This approach extends the existing slicer architecture incrementally while adding the public API extraction capability you need for the MCP tool. \ No newline at end of file