feat: replace Algolia search with self-contained llms.txt search#28
mattpodwysocki wants to merge 6 commits into main
zmofei
left a comment
LGTM — good call removing the shared Algolia quota dependency.
Two minor notes (neither blocking):
- Merge coordination with #27: both PRs modify `docFetcher.ts`, the 5 resource classes, and `docFetcher.test.ts`. This PR's `fetchCachedText` uses the more correct `!== null` cache check (handles empty-string entries) vs #27's falsy check. Make sure the better version survives the rebase.
- `fetchCachedText` reads the full body before the size check: PR #27's version uses `readBodyWithLimit()`, which caps reads early. Not a concern for llms.txt files (< 50KB each), but worth noting if this helper gets reused for larger URLs.
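To make the first note concrete, here is a minimal sketch of why a falsy check misses empty-string entries. `TextCache` is a hypothetical stand-in for `docCache` (TTL handling omitted), and it assumes a miss is reported as `null`; neither helper below is the code from this PR or from #27.

```typescript
// Hypothetical stand-in for docCache: get() returns null on a miss.
class TextCache {
  private store = new Map<string, string>();

  get(key: string): string | null {
    const value = this.store.get(key);
    return value === undefined ? null : value;
  }

  set(key: string, value: string): void {
    this.store.set(key, value);
  }
}

// Falsy check: '' is falsy, so a cached empty body is treated as a miss.
function isHitFalsy(cache: TextCache, url: string): boolean {
  return Boolean(cache.get(url));
}

// Strict check: only null counts as a miss, so '' is still a hit.
function isHitStrict(cache: TextCache, url: string): boolean {
  return cache.get(url) !== null;
}

const cache = new TextCache();
cache.set('https://example.com/empty.txt', '');
```

With the falsy variant, the empty entry would be refetched on every call; the strict variant serves it from cache.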
Search engine is clean — scoring weights, dedup, and graceful degradation on failed sources are all well-handled. Good test coverage across parsing, ranking, and resilience.
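One way the graceful degradation on failed sources could be structured is sketched below. `loadSources` and the `FetchText` callback are hypothetical names for illustration, not identifiers from this PR; the sketch assumes each source is fetched independently and that a rejected fetch is simply dropped.

```typescript
// Hypothetical fetcher signature: resolves with the body or rejects on failure.
type FetchText = (url: string) => Promise<string>;

// Fetch every source in parallel; a source that fails is skipped
// instead of failing the whole batch.
async function loadSources(
  urls: string[],
  fetchText: FetchText
): Promise<Map<string, string>> {
  const results = await Promise.all(
    urls.map(async (url): Promise<[string, string] | null> => {
      try {
        return [url, await fetchText(url)];
      } catch {
        return null; // unreachable source: skip it
      }
    })
  );

  const loaded = new Map<string, string>();
  for (const result of results) {
    if (result !== null) loaded.set(result[0], result[1]);
  }
  return loaded;
}
```

Catching per source keeps `Promise.all` from rejecting as soon as one fetch fails, so the remaining sources still contribute results.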
search_mapbox_docs_tool no longer depends on the Algolia third-party service. The hosted server shares a single Algolia free-tier quota across all users, making it prone to throttling as usage grows.

The new implementation fetches 12 product llms.txt files in parallel on first search (~220KB total), caches them via docCache (1-hour TTL), and does in-memory keyword matching on subsequent calls: no network calls after warm-up, no quota, no third-party dependency.

Products indexed: API Reference, GL JS, Help Center, Style Spec, Studio Manual, Search JS, Maps SDK iOS/Android, Navigation SDK iOS/Android, Mapbox Tiling Service, Tilesets.

Scoring: title matches (3x) > description matches (1x) = URL path (1x). Results are deduplicated by URL across sources and capped at limit. Failed sources are skipped gracefully.

Also adds fetchCachedText() to docFetcher.ts as a shared helper for fetching and caching plain-text URLs (used by both the search index and the resource layer). Fixes a subtle bug where empty-string cache entries were not recognized as hits due to falsy check on ''.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
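The scoring and deduplication described in this commit message can be sketched roughly as follows. This is an illustration, not the code in `docsSearchIndex.ts`: the `DocEntry` shape, the function names, the whitespace tokenization, and the assumption that link lines look like `- [Title](url): description` are all hypothetical.

```typescript
interface DocEntry {
  title: string;
  url: string;
  description: string;
}

// Assumes link lines of the form "- [Title](https://...): description".
function parseLlmsTxt(text: string): DocEntry[] {
  const linkLine = /^[-*]\s*\[([^\]]+)\]\(([^)\s]+)\)(?::\s*(.*))?$/;
  const entries: DocEntry[] = [];
  for (const line of text.split('\n')) {
    const m = linkLine.exec(line.trim());
    if (m) entries.push({ title: m[1], url: m[2], description: m[3] || '' });
  }
  return entries;
}

// Title matches weigh 3, description and URL path matches weigh 1 each.
function scoreEntry(entry: DocEntry, terms: string[]): number {
  const title = entry.title.toLowerCase();
  const description = entry.description.toLowerCase();
  // Drop scheme and host so terms are matched against the path only.
  const path = entry.url.toLowerCase().replace(/^https?:\/\/[^/]+/, '');
  let score = 0;
  for (const term of terms) {
    if (title.includes(term)) score += 3;
    if (description.includes(term)) score += 1;
    if (path.includes(term)) score += 1;
  }
  return score;
}

// Rank entries, keep the best-scoring copy of each URL, cap at limit.
function searchDocs(entries: DocEntry[], query: string, limit: number): DocEntry[] {
  const terms = query.toLowerCase().split(/\s+/).filter((t) => t.length > 0);
  const best = new Map<string, { entry: DocEntry; score: number }>();
  for (const entry of entries) {
    const score = scoreEntry(entry, terms);
    if (score === 0) continue;
    const seen = best.get(entry.url);
    if (!seen || score > seen.score) best.set(entry.url, { entry, score });
  }
  return Array.from(best.values())
    .sort((a, b) => b.score - a.score)
    .slice(0, limit)
    .map((r) => r.entry);
}
```

Keying the map by URL is what collapses a page that appears in more than one product index into a single result.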
docs.mapbox.com restructured so that llms.txt now exists at every product level (e.g. /api/llms.txt, /help/llms.txt, /mapbox-gl-js/llms.txt) and llms-full.txt contains full page content. The root llms.txt is now a pure link index, so resources that filtered it by category keyword were returning empty content.

- resource://mapbox-api-reference → docs.mapbox.com/api/llms.txt (structured index of all REST APIs by service category)
- resource://mapbox-guides → docs.mapbox.com/help/llms.txt (39KB Help Center index with actual guide content)
- resource://mapbox-sdk-docs → docs.mapbox.com/mapbox-gl-js/llms.txt (34KB GL JS documentation index with guides, API ref, examples)
- resource://mapbox-reference → root llms.txt without filtering (complete product catalog for discovering available docs)
- resource://mapbox-examples → continues filtering root for playground/demo sections

Add fetchCachedText() shared helper to docFetcher to consolidate the repeated fetch+cache pattern across all resource implementations.

Fix toMarkdownUrl() to return null for URLs already ending in .txt/.md/.json so get_document_tool fetches llms.txt files directly without a wasted .md rewrite attempt.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
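The `toMarkdownUrl()` fix in this commit message might look something like the sketch below. Only the "return null for .txt/.md/.json" behavior comes from the commit message; how the real function builds the markdown URL is not shown here, so appending `.md` to the page path is an assumption made for illustration.

```typescript
// Returns a markdown URL to try, or null when the URL already points at a
// raw text resource and should be fetched as-is.
function toMarkdownUrl(url: string): string | null {
  // Ignore any query string or fragment when checking the extension.
  const withoutQuery = url.split(/[?#]/)[0];
  if (/\.(txt|md|json)$/i.test(withoutQuery)) return null;
  // Assumption: the rewrite appends ".md" to the page path.
  return withoutQuery.replace(/\/+$/, '') + '.md';
}
```

A caller would fall back to the original URL whenever `null` comes back, which is what avoids the extra request for llms.txt files.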
cspell flagged capitalized 'Isochrone' as an unknown word in two resource description strings. Lowercased to fix CI. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
cspell doesn't know 'isochrone' by default. Adding both cases to the project word list so the CI spellcheck passes. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
.claude/worktrees/exciting-sutherland was accidentally staged during the merge commit. Removed and added .claude/ to .gitignore to prevent recurrence. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
614edaf to cb96bdb
Summary
`search_mapbox_docs_tool` no longer depends on the Algolia third-party service. The hosted server at `mcp-docs.mapbox.com` shares a single Algolia free-tier quota across every user, making it prone to throttling as adoption grows. This replaces it with a self-contained keyword search over the `llms.txt` index files that now exist at every product level on docs.mapbox.com (following the docs restructure in subdomain-docs-root#576).

How it works
On first search, 12 product `llms.txt` files are fetched in parallel (~220KB total). Each file contains a structured list of documentation pages with titles, URLs, and one-line descriptions. These are cached via the existing `docCache` for the 1-hour TTL. All subsequent searches are pure in-memory keyword matching, with zero network calls after warm-up.

Products indexed (~220KB total, all cached):
- `api/llms.txt`
- `mapbox-gl-js/llms.txt`
- `help/llms.txt`
- `style-spec/llms.txt`
- `studio-manual/llms.txt`
- `mapbox-search-js/llms.txt`
- `ios/maps/llms.txt`
- `android/maps/llms.txt`
- `ios/navigation/llms.txt`
- `android/navigation/llms.txt`
- `mapbox-tiling-service/llms.txt`
- `data/tilesets/llms.txt`

Scoring: Title matches rank 3×, description and URL path matches 1× each. Results are deduplicated by URL across sources (the same page can appear in multiple product indexes) and capped at `limit`.

Reliability: Failed sources are silently skipped. If any single `llms.txt` is unreachable, the remaining 11 still return results. No single point of failure.

Trade-offs vs Algolia
For typical developer queries ("directions API parameters", "add marker GL JS", "style spec fill color"), keyword matching against structured titles and descriptions is effective. The output format is the same shape (numbered list with title, URL, and description), so `get_document_tool` chaining works identically.

Also included
`fetchCachedText(url, httpRequest)`: new helper in `docFetcher.ts` for fetching and caching plain-text URLs. Shared between `docsSearchIndex.ts` and (in PR #27) the resource layer. Fixes a subtle bug where empty-string cache entries weren't recognized as hits due to a falsy check on `''`; it now uses `!== null`.

Test plan
- `npm test`: 68 tests pass (including 8 new SearchDocsTool tests, 7 new docsSearchIndex tests)
- "Search Mapbox docs for isochrone"
🤖 Generated with Claude Code