This guide walks you through setting up code-index for your project.
- Go 1.21+ — for building the CLI tool
- Node.js 20+ — for the MCP search server
- One of:
- AWS account with Bedrock access — best quality (Cohere embeddings + Claude summaries)
- OpenAI API key — good quality, easy setup
- Ollama — free, fully local, no cloud account needed
- Claude Code (optional) — the MCP tool integrates with Claude Code, but the CLI works standalone
- R — for parsing R source files (`brew install r` on macOS, `apt-get install r-base` on Ubuntu; falls back to regex if unavailable)
- S3 bucket — for distributing the vector database across a team
- libsqlite3-dev — required on Linux for building from source
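The toolchain checks above can be automated with a small shell snippet. This is a convenience sketch, not part of code-index; it only reports whether each tool is on PATH, not whether the version is new enough:

```shell
# Report which prerequisite tools are installed (extend the list as needed).
for tool in go node; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: found"
  else
    echo "$tool: missing"
  fi
done
```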
Pre-built binaries (recommended):
Download the latest release for your platform from GitHub Releases and add it to your PATH.
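After adding the binary to your PATH, a quick check confirms the shell can find it (`command -v` is standard POSIX; nothing here is specific to code-index):

```shell
# Prints the binary's location if installed, or a notice if not.
command -v code-index || echo "code-index not found on PATH"
```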
From source (requires Go 1.21+):
```shell
go install github.com/posit-dev/code-index/cmd/code-index@latest
```

Or clone and build:
```shell
git clone https://github.com/posit-dev/code-index.git
cd code-index
go build -o code-index ./cmd/code-index
```

No separate installation is needed for the MCP server — Claude Code runs it via npx:
```json
{
  "mcpServers": {
    "code-index": {
      "command": "npx",
      "args": ["-y", "@jonyoder/code-index-mcp"]
    }
  }
}
```

Create .code-index.json in your repository root. Start with the example:
```shell
cp .code-index.example.json .code-index.json
```

Edit it to match your project structure. Choose one of the setups below.
Install Ollama, then pull the models:
```shell
ollama pull llama3.2
ollama pull nomic-embed-text
```

```json
{
  "sources": [
    {"path": "src", "language": "go"}
  ],
  "llm": {
    "provider": "openai",
    "base_url": "http://localhost:11434/v1",
    "function_model": "llama3.2",
    "summary_model": "llama3.2"
  },
  "embeddings": {
    "provider": "openai",
    "base_url": "http://localhost:11434/v1",
    "model": "nomic-embed-text"
  }
}
```

Set your API key:

```shell
export OPENAI_API_KEY=sk-...
```
```json
{
  "sources": [
    {"path": "src", "language": "go"}
  ],
  "llm": {
    "provider": "openai",
    "function_model": "gpt-4o-mini",
    "summary_model": "gpt-4o"
  },
  "embeddings": {
    "provider": "openai",
    "model": "text-embedding-3-small"
  }
}
```

Requires AWS credentials with Bedrock access. See AWS Setup.
```json
{
  "sources": [
    {"path": "src", "language": "go"}
  ],
  "llm": {
    "provider": "bedrock",
    "function_model": "us.anthropic.claude-haiku-4-5-20251001-v1:0",
    "summary_model": "us.anthropic.claude-sonnet-4-6"
  },
  "embeddings": {
    "provider": "bedrock",
    "model": "cohere.embed-v4:0"
  },
  "aws": {
    "region": "us-east-1"
  }
}
```

See Configuration for the full reference.
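Whichever provider you pick, the sources list can name more than one directory. A hedged illustration for a mixed repository (the paths and the "r" language identifier are assumptions for this example, not taken from the reference):

```json
{
  "sources": [
    {"path": "cmd", "language": "go"},
    {"path": "internal", "language": "go"},
    {"path": "R", "language": "r"}
  ]
}
```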
```shell
# 1. Parse source files (fast, no network calls)
code-index parse

# 2. Generate LLM summaries (uses your configured LLM provider)
code-index generate

# 3. Build the searchable JSON index
code-index build

# 4. Create vector embeddings (uses your configured embeddings provider)
code-index embed
```

Or run all four steps at once:

```shell
code-index all
```

For large codebases, you can limit how many items are processed:

```shell
code-index generate --limit 20   # Generate 20 file batches
code-index embed --limit 100     # Embed 100 items
```

Run again without --limit to continue. The cache ensures already-done items are skipped.
```shell
code-index search "check if string is in slice"
code-index search "how does authentication work"
code-index search --max-results 20 "database transaction management"
```

Once the MCP server is configured in .mcp.json, Claude Code uses code_search proactively. If your team distributes the vector database via an S3 or HTTP URL, the MCP server downloads it automatically on the first search.
For S3, the MCP server auto-detects AWS profiles from aws.profiles in .code-index.json — you just need to be logged in with aws sso login.
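For example, the relevant fragment of .code-index.json might look like this (the exact shape of aws.profiles and the profile name are assumptions for illustration; check the Configuration reference):

```json
{
  "aws": {
    "region": "us-east-1",
    "profiles": ["my-sso-profile"]
  }
}
```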
You can also ask Claude directly:
"Use code_search to find how authentication works in this project."
For programmatic use:
```shell
code-index search --json "authentication"
```

- Configuration reference — all config options
- AWS Setup — Bedrock access and credentials
- CI Setup — automate nightly index updates