Simple Search Service

A lightweight, powerful full-text search service built with Rust, Tantivy, and SQLite. Think Elasticsearch/Solr but much simpler to deploy and operate.

Features

🚀 Fast: Built on Tantivy, Rust's answer to Lucene
💾 Simple Storage: Uses SQLite for metadata and Tantivy's built-in index storage
🔌 RESTful API: Easy integration with any application
🐳 Easy Deploy: Single binary or Docker container
🔍 Full-Text Search: BM25 ranking, phrase queries, fuzzy matching
🤖 Generative Answers: Mistral-powered, source-grounded responses (optional)
🌍 Multi-language: Supports Norwegian, English, and more
📊 Lightweight: Runs on 512MB RAM

Quick Start

Option 1: Docker (Recommended)

# Clone or extract the project
cd search-service

# Start with Docker Compose
docker-compose up -d

# Check health
curl http://localhost:3000/health

Docker Compose loads environment variables from .env (see env_file in docker-compose.yml).

Option 2: Build from Source

# Install Rust (if not already installed)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# Build
cargo build --release

# Run
./target/release/simple-search-service

The service listens on http://localhost:3000 by default; set PORT to use a different port.

API Documentation

Health Check

GET /health

Response:

{
  "status": "healthy",
  "service": "simple-search-service",
  "version": "0.1.0"
}

Create Index

POST /indices
Content-Type: application/json

{
  "name": "products",
  "fields": [
    {
      "name": "title",
      "field_type": "text",
      "stored": true,
      "indexed": true
    },
    {
      "name": "description",
      "field_type": "text",
      "stored": true,
      "indexed": true
    },
    {
      "name": "price",
      "field_type": "f64",
      "stored": true,
      "indexed": true
    }
  ]
}

Field types: text, string, i64, f64, date

For sorting and aggregations, set "fast": true on the field (required for date sorting).
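As an illustration, the create-index payload can be assembled programmatically. This is a minimal sketch that assumes only the field options shown above (name, field_type, stored, indexed, fast); the helper names `field` and `index_payload` and the `events` index are hypothetical and not part of the service.

```python
import json

# Field types listed in this README.
VALID_TYPES = {"text", "string", "i64", "f64", "date"}


def field(name, field_type, stored=True, indexed=True, fast=False):
    """Build one field definition; `fast` is only emitted when requested."""
    if field_type not in VALID_TYPES:
        raise ValueError(f"unsupported field_type: {field_type!r}")
    spec = {
        "name": name,
        "field_type": field_type,
        "stored": stored,
        "indexed": indexed,
    }
    if fast:
        spec["fast"] = True
    return spec


def index_payload(name, fields):
    """Wrap field definitions in the body expected by POST /indices."""
    return {"name": name, "fields": list(fields)}


# Hypothetical index with a sortable date field.
payload = index_payload(
    "events",
    [field("title", "text"), field("starts_at", "date", fast=True)],
)
print(json.dumps(payload, indent=2))
```

Rejecting unknown field types on the client side is a convenience here; the service remains the authority on what it accepts.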

List Indices

GET /indices

Response:

{
  "success": true,
  "data": [
    {
      "name": "products",
      "document_count": 1250,
      "created_at": "2025-01-16T10:30:00Z"
    }
  ]
}

Add Documents

POST /indices/products/documents
Content-Type: application/json

{
  "documents": [
    {
      "id": "prod_001",
      "fields": {
        "title": "Smil Barnehage Bergen",
        "description": "Modern barnehage i Bergen sentrum med fokus på læring gjennom lek",
        "price": 15000.0
      }
    },
    {
      "id": "prod_002",
      "fields": {
        "title": "Lekeland Barnehage",
        "description": "Familievennlig barnehage med store uteområder",
        "price": 12500.0
      }
    }
  ]
}

Search

POST /indices/products/search
Content-Type: application/json

{
  "query": "barnehage bergen",
  "limit": 10,
  "fields": ["title", "description"],
  "boost": {
    "title": 2.0
  },
  "fuzzy": true,
  "sort": {
    "field": "starts_at",
    "order": "desc"
  }
}

Response:

{
  "success": true,
  "data": {
    "took_ms": 2.4,
    "total": 2,
    "hits": [
      {
        "id": "prod_001",
        "score": 8.42,
        "fields": {
          "id": "prod_001",
          "title": "Smil Barnehage Bergen",
          "description": "Modern barnehage i Bergen sentrum...",
          "price": 15000.0
        }
      }
    ]
  }
}
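For clients without an HTTP library, the request and response shapes above can be handled with the Python standard library alone. This is a sketch, not an official client: `build_search_request`, `extract_hits`, and `search` are hypothetical names, and it assumes a failed search is reported with "success": false.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "http://localhost:3000"


def build_search_request(index, query, limit=10, **options):
    """Prepare a POST /indices/<index>/search request without sending it."""
    body = {"query": query, "limit": limit, **options}
    return urllib.request.Request(
        f"{BASE_URL}/indices/{quote(index, safe='')}/search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_hits(response_body):
    """Return (id, score) pairs from a decoded search response."""
    # Assumption: failures are signalled with "success": false.
    if not response_body.get("success"):
        raise RuntimeError("search request was not successful")
    return [(hit["id"], hit["score"]) for hit in response_body["data"]["hits"]]


def search(index, query, **options):
    """Send the request to a running service and return (id, score) pairs."""
    request = build_search_request(index, query, **options)
    with urllib.request.urlopen(request, timeout=10) as response:
        return extract_hits(json.load(response))
```

With the service running, `search("products", "barnehage bergen", fuzzy=True)` would return the ids and scores of the matching documents.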

Partial and fuzzy matching

Append an asterisk to any term (for example, "query": "eventyr*") to perform a prefix search that matches tokens beginning with that fragment.
Set "fuzzy": true in the search payload to tolerate a single-character typo (insertions, deletions, substitutions, or transpositions), which helps catch misspellings like evntyr.
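The two behaviours above can be sketched as follows. `typeahead_query` is a hypothetical client helper that turns the last term of a partial input into a prefix term, and `within_one_edit` only illustrates what a single-character typo means; it is not how the service or Tantivy implements fuzzy matching.

```python
def typeahead_query(text):
    """Append '*' to the last term so it is matched as a prefix."""
    terms = text.split()
    if not terms:
        raise ValueError("query must contain at least one term")
    if not terms[-1].endswith("*"):
        terms[-1] += "*"
    return " ".join(terms)


def within_one_edit(a, b):
    """True if a and b differ by at most one insertion, deletion,
    substitution, or transposition of adjacent characters."""
    if a == b:
        return True
    if abs(len(a) - len(b)) > 1:
        return False
    # Skip the common prefix to find the first differing position.
    i = 0
    while i < min(len(a), len(b)) and a[i] == b[i]:
        i += 1
    if len(a) == len(b):
        if a[i + 1:] == b[i + 1:]:
            return True  # one substitution
        return (
            i + 1 < len(a)
            and a[i] == b[i + 1]
            and a[i + 1] == b[i]
            and a[i + 2:] == b[i + 2:]
        )  # one transposition
    # Lengths differ by one: dropping one character from the longer word
    # must yield the shorter word.
    shorter, longer = (a, b) if len(a) < len(b) else (b, a)
    return shorter[i:] == longer[i + 1:]


print(typeahead_query("barnehage ber"))
print(within_one_edit("evntyr", "eventyr"))
```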

Sorting by date

To sort by a date field, define the field as "field_type": "date" and set "fast": true when creating the index. Then pass the sort object in the search request:

{
  "query": "barnehage",
  "limit": 10,
  "sort": {
    "field": "starts_at",
    "order": "asc"
  }
}

Supported sort field types: i64, f64, date (must be fast: true).
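A client can check these rules before sending a request. The sketch below assumes the caller keeps the field definitions it used when creating the index; `sort_clause` is a hypothetical helper, and the service may well perform its own validation.

```python
SORTABLE_TYPES = {"i64", "f64", "date"}


def sort_clause(schema_fields, field_name, order="asc"):
    """Return a sort object, or raise if the field cannot be sorted on."""
    if order not in ("asc", "desc"):
        raise ValueError("order must be 'asc' or 'desc'")
    spec = next((f for f in schema_fields if f["name"] == field_name), None)
    if spec is None:
        raise ValueError(f"unknown field: {field_name!r}")
    if spec["field_type"] not in SORTABLE_TYPES:
        raise ValueError(f"{field_name!r} has a type that cannot be sorted")
    if not spec.get("fast"):
        raise ValueError(f"{field_name!r} must be created with fast: true")
    return {"field": field_name, "order": order}


# Hypothetical schema, as it would have been sent to POST /indices.
schema = [
    {"name": "title", "field_type": "text", "stored": True, "indexed": True},
    {"name": "starts_at", "field_type": "date", "stored": True,
     "indexed": True, "fast": True},
]
request_body = {
    "query": "barnehage",
    "limit": 10,
    "sort": sort_clause(schema, "starts_at", "asc"),
}
print(request_body)
```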

Generative Answers (Mistral)

This endpoint runs a search, then asks Mistral to summarize the top hits into a grounded answer. If stream is true (default), the response is an SSE stream.

POST /indices/products/answer
Content-Type: application/json

{
  "query": "hvor er familievennlig barnehage",
  "search_limit": 5,
  "fields": ["title", "description", "location"],
  "fuzzy": true,
  "stream": false,
  "temperature": 0.2
}

Response (non-streaming):

{
  "success": true,
  "data": {
    "answer": "...",
    "model": "mistral-large-latest",
    "search_took_ms": 3.1,
    "llm_took_ms": 412.7,
    "total_took_ms": 418.5,
    "sources": [
      {
        "id": "kg_001",
        "score": 8.42,
        "fields": {
          "title": "Lekeland Barnehage",
          "description": "Familievennlig barnehage ..."
        }
      }
    ]
  }
}
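One way to consume the non-streaming response is to print the answer followed by a numbered source list. This sketch relies only on the fields shown in the example response; `answer_payload` and `render_answer` are hypothetical helper names.

```python
def answer_payload(query, search_limit=5, fields=None, fuzzy=False,
                   temperature=0.2):
    """Build a non-streaming body for POST /indices/<index>/answer."""
    body = {
        "query": query,
        "search_limit": search_limit,
        "fuzzy": fuzzy,
        "stream": False,
        "temperature": temperature,
    }
    if fields:
        body["fields"] = list(fields)
    return body


def render_answer(response):
    """Format the answer text with a numbered list of its sources."""
    data = response["data"]
    lines = [data["answer"], "", "Sources:"]
    for number, source in enumerate(data["sources"], start=1):
        # Fall back to the document id when no title field was stored.
        title = source["fields"].get("title", source["id"])
        lines.append(
            f"[{number}] {title} (id={source['id']}, score={source['score']:.2f})"
        )
    return "\n".join(lines)
```

Showing the sources next to the answer lets readers verify the generated text against the indexed documents.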

Streaming (SSE) example:

curl -N http://localhost:3000/indices/kindergartens/answer \
  -H "Content-Type: application/json" \
  -d '{"query":"hvor er familievennlig barnehage","stream":true}'

The stream emits:

event: meta with JSON containing model, search_took_ms, and sources
data: chunks with partial answer text
event: done when finished
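The stream can be consumed with a small parser for the standard SSE line format. This is a sketch under assumptions: it treats each data chunk as plain answer text (not JSON), expects a blank line after every event as the SSE format requires, and uses the hypothetical names `parse_sse` and `collect_answer`.

```python
import json


def parse_sse(lines):
    """Yield (event, data) pairs from an iterable of decoded SSE lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data or event != "message":
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment or keep-alive line
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format strips exactly one leading space, if present.
            data.append(value[1:] if value.startswith(" ") else value)


def collect_answer(lines):
    """Return (meta, answer_text) assembled from the stream."""
    meta, parts = None, []
    for event, data in parse_sse(lines):
        if event == "meta":
            meta = json.loads(data)
        elif event == "done":
            break
        else:
            parts.append(data)  # assumption: chunks are plain text
    return meta, "".join(parts)


# Hand-written sample stream for illustration, not captured service output.
sample = [
    "event: meta\n",
    'data: {"model": "mistral-large-latest", "search_took_ms": 3.1, "sources": []}\n',
    "\n",
    "data: Lekeland \n",
    "\n",
    "data: Barnehage\n",
    "\n",
    "event: done\n",
    "\n",
]
meta, answer = collect_answer(sample)
print(meta["model"], "->", answer)
```

In a real client, `lines` would come from iterating over the HTTP response body and decoding each line as UTF-8.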

Delete Document

DELETE /indices/products/documents/prod_001

Delete Index

DELETE /indices/products

Bulk Operations

POST /indices/products/bulk
Content-Type: application/json

{
  "operations": [
    {
      "operation": "index",
      "document": {
        "id": "prod_003",
        "fields": {
          "title": "New Product",
          "description": "Description here"
        }
      }
    },
    {
      "operation": "delete",
      "id": "prod_001"
    }
  ]
}
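For large imports, the operations list can be generated and split into several requests. This sketch follows the body shape shown above; the helper names and any batch size are assumptions, since this README does not state a request size limit.

```python
def bulk_operations(to_index=(), to_delete=()):
    """Build the operations list: index operations first, then deletes."""
    operations = [{"operation": "index", "document": doc} for doc in to_index]
    operations += [{"operation": "delete", "id": doc_id} for doc_id in to_delete]
    return operations


def chunked(operations, size):
    """Yield request bodies with at most `size` operations each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(operations), size):
        yield {"operations": operations[start:start + size]}


ops = bulk_operations(
    to_index=[{"id": "prod_003", "fields": {"title": "New Product"}}],
    to_delete=["prod_001"],
)
for body in chunked(ops, size=500):  # 500 is an arbitrary illustrative size
    print(len(body["operations"]), "operations in this request")
```

Each yielded body can be sent as one POST to /indices/products/bulk.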

Integration Examples

Laravel/PHP

use Illuminate\Support\Facades\Http;

// Create index
$response = Http::post('http://localhost:3000/indices', [
    'name' => 'kindergartens',
    'fields' => [
        ['name' => 'title', 'field_type' => 'text', 'stored' => true, 'indexed' => true],
        ['name' => 'description', 'field_type' => 'text', 'stored' => true, 'indexed' => true],
    ]
]);

// Add documents
$response = Http::post('http://localhost:3000/indices/kindergartens/documents', [
    'documents' => [
        [
            'id' => 'kg_001',
            'fields' => [
                'title' => 'Smil Barnehage',
                'description' => 'En flott barnehage i Bergen',
            ]
        ]
    ]
]);

// Search
$response = Http::post('http://localhost:3000/indices/kindergartens/search', [
    'query' => 'barnehage bergen',
    'limit' => 10
]);

$results = $response->json()['data'];

Exact match filter

curl -X POST http://localhost:3000/indices/myindex/search \
  -H 'Content-Type: application/json' \
  -d '{ "query": "collection_handle:my-collection", "limit": 10 }' | jq '.'

Combine with search terms

curl -X POST http://localhost:3000/indices/myindex/search \
  -H 'Content-Type: application/json' \
  -d '{ "query": "tariff AND collection_handle:my-collection", "limit": 10, "fuzzy": true }' | jq '.'

Multiple collections (using OR)

curl -X POST http://localhost:3000/indices/myindex/search \
  -H 'Content-Type: application/json' \
  -d '{ "query": "tariff AND (collection_handle:collection-a OR collection_handle:collection-b)", "limit": 10 }' | jq '.'

Multiple collections (using IN syntax - more efficient)

curl -X POST http://localhost:3000/indices/myindex/search \
  -H 'Content-Type: application/json' \
  -d '{ "query": "tariff AND collection_handle:IN[collection-a,collection-b,collection-c]", "limit": 10 }' | jq '.'
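The filter expressions above can be generated from a list of values. This is a minimal sketch that reproduces only the query strings shown in these examples; `collection_filter` and `with_filter` are hypothetical helpers, and values containing spaces, commas, or brackets are rejected because this README does not describe escaping rules.

```python
def collection_filter(field, values):
    """Build an exact-match filter: field:value or field:IN[a,b,c]."""
    values = list(values)
    if not values:
        raise ValueError("at least one value is required")
    for value in values:
        # Escaping is undocumented here, so refuse ambiguous characters.
        if any(ch in value for ch in " ,[]"):
            raise ValueError(f"unsupported character in value: {value!r}")
    if len(values) == 1:
        return f"{field}:{values[0]}"
    return f"{field}:IN[{','.join(values)}]"


def with_filter(terms, filter_expr):
    """Combine free-text terms with a filter using AND."""
    return f"{terms} AND {filter_expr}" if terms else filter_expr


query = with_filter(
    "tariff",
    collection_filter("collection_handle", ["collection-a", "collection-b"]),
)
print({"query": query, "limit": 10})
```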

JavaScript/Node.js

// Add documents
const response = await fetch('http://localhost:3000/indices/products/documents', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    documents: [
      {
        id: 'prod_001',
        fields: {
          title: 'Product Name',
          description: 'Product description'
        }
      }
    ]
  })
});

// Search
const searchResponse = await fetch('http://localhost:3000/indices/products/search', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    query: 'search term',
    limit: 10
  })
});

const results = await searchResponse.json();

Configuration

Environment variables:

DATA_DIR: Data directory path (default: ./data)
PORT: Server port (default: 3000)
RUST_LOG: Log level (default: info, options: trace, debug, info, warn, error)
MISTRAL_API_KEY: API key for Mistral (enables /indices/:name/answer)
MISTRAL_MODEL: Mistral model name (default: mistral-large-latest)
MISTRAL_BASE_URL: Base URL for Mistral-compatible API (default: https://api.mistral.ai/v1)

.env is loaded automatically at startup (if present in the project root).

Performance Tips

Bulk Operations: Use bulk endpoints for adding multiple documents
Field Selection: Only store fields you need to display in results
Index Size: Expect index size to be 10-20% of original text
Memory: Allocate ~50MB per active index + buffer
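The sizing rules of thumb above translate into simple arithmetic. In this sketch the 50MB-per-index figure and the 10-20% ratio come from the tips; the 128MB buffer is an assumed placeholder, since the buffer size is not specified, and `estimate_resources` is a hypothetical name.

```python
def estimate_resources(active_indices, source_text_mb, buffer_mb=128):
    """Rough capacity estimate based on the rules of thumb above."""
    memory_mb = 50 * active_indices + buffer_mb
    # Index size is expected to be 10-20% of the original text.
    disk_low_mb = source_text_mb / 10
    disk_high_mb = source_text_mb / 5
    return {
        "memory_mb": memory_mb,
        "index_disk_mb": (disk_low_mb, disk_high_mb),
    }


# Three active indices built from about 1000MB of source text.
print(estimate_resources(active_indices=3, source_text_mb=1000))
```

Treat the result as a starting point for provisioning and measure actual usage under load.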

Production Deployment

Systemd Service

Create /etc/systemd/system/search-service.service:

[Unit]
Description=Simple Search Service
After=network.target

[Service]
Type=simple
User=search
WorkingDirectory=/opt/search-service
Environment="DATA_DIR=/var/lib/search-service"
Environment="PORT=3000"
ExecStart=/opt/search-service/simple-search-service
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target

sudo systemctl daemon-reload
sudo systemctl enable search-service
sudo systemctl start search-service

Nginx Reverse Proxy

server {
    listen 80;
    server_name search.yourdomain.com;

    location / {
        proxy_pass http://localhost:3000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}

Monitoring

The service exposes a /health endpoint for health checks:

# Docker health check
HEALTHCHECK --interval=30s --timeout=10s --retries=3 \
  CMD curl -f http://localhost:3000/health || exit 1
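Outside Docker, a deployment script can poll the same endpoint until the service is ready. This sketch assumes the /health response body shown under API Documentation; the function names, attempt count, and delay are illustrative choices.

```python
import json
import time
import urllib.request

HEALTH_URL = "http://localhost:3000/health"


def is_healthy(body):
    """True when the decoded /health body reports status 'healthy'."""
    return body.get("status") == "healthy"


def fetch_health(url, timeout=2.0):
    """GET the health endpoint and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def wait_until_healthy(url=HEALTH_URL, attempts=10, delay=1.0,
                       fetch=fetch_health, sleep=time.sleep):
    """Poll until the service is healthy; return False if it never is."""
    for attempt in range(attempts):
        try:
            if is_healthy(fetch(url)):
                return True
        except (OSError, ValueError):
            # Connection refused, timeout, or a body that is not JSON.
            pass
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

The `fetch` and `sleep` parameters exist so the loop can be exercised in tests without a running service.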

Backup

The data directory contains:

metadata.db: SQLite database with metadata
indices/: Directory with Tantivy index files

To back up, archive the entire data directory:

# Backup
tar -czf search-backup-$(date +%Y%m%d).tar.gz data/

# Restore
tar -xzf search-backup-20250116.tar.gz
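The same backup can be scripted with Python's tarfile module, for example from a scheduled job. This is a sketch: `backup` and `list_archive` are hypothetical names, and it assumes the service is stopped or not writing while the archive is created, which this README does not address.

```python
import tarfile
from datetime import date
from pathlib import Path


def backup(data_dir, dest_dir="."):
    """Archive the data directory as search-backup-YYYYMMDD.tar.gz."""
    data_dir = Path(data_dir)
    archive = Path(dest_dir) / f"search-backup-{date.today():%Y%m%d}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # Store paths relative to the directory name, like `tar -czf ... data/`.
        tar.add(data_dir, arcname=data_dir.name)
    return archive


def list_archive(archive):
    """Return the member names, to confirm the backup has the expected files."""
    with tarfile.open(archive, "r:gz") as tar:
        return tar.getnames()
```

Calling `backup("data")` writes the archive to the current directory, and `list_archive` on the result should include data/metadata.db and the data/indices directory.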

Use Cases

E-commerce: Product search with faceted filtering
Documentation: Technical documentation search
CRM: Customer and contact search
Content Management: Article and page search
Internal Tools: Log search, ticket search

Comparison with Elasticsearch

| Feature | Simple Search Service | Elasticsearch |
|---|---|---|
| Memory | ~512MB | ~2GB minimum |
| Deployment | Single binary | JVM + cluster |
| Setup Time | < 1 minute | 15-30 minutes |
| Cluster | No | Yes |
| Scaling | Vertical | Horizontal |
| Best For | Single server, <10M docs | Distributed, >10M docs |

License

MIT License - feel free to use in commercial projects

Support

For issues or questions, please open an issue on the GitHub repository.
