A lightweight, powerful full-text search service built with Rust, Tantivy, and SQLite. Think Elasticsearch/Solr but much simpler to deploy and operate.
- 🚀 Fast: Built on Tantivy, Rust's answer to Lucene
- 💾 Simple Storage: Uses SQLite for metadata and Tantivy's built-in index storage
- 🔌 RESTful API: Easy integration with any application
- 🐳 Easy Deploy: Single binary or Docker container
- 🔍 Full-Text Search: BM25 ranking, phrase queries, fuzzy matching
- 🤖 Generative Answers: Mistral-powered, source-grounded responses (optional)
- 🌍 Multi-language: Supports Norwegian, English, and more
- 📊 Lightweight: Runs on 512MB RAM
# Clone or extract the project
cd search-service
# Start with Docker Compose
docker-compose up -d
# Check health
curl http://localhost:3000/health

Docker Compose loads environment variables from .env (see env_file in docker-compose.yml).
# Install Rust (if not already installed)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
# Build
cargo build --release
# Run
./target/release/simple-search-service

The service will start on http://localhost:3000

GET /health

Response:
{
"status": "healthy",
"service": "simple-search-service",
"version": "0.1.0"
}

POST /indices
Content-Type: application/json
{
"name": "products",
"fields": [
{
"name": "title",
"field_type": "text",
"stored": true,
"indexed": true
},
{
"name": "description",
"field_type": "text",
"stored": true,
"indexed": true
},
{
"name": "price",
"field_type": "f64",
"stored": true,
"indexed": true
}
]
}

Field types: text, string, i64, f64, date
For sorting and aggregations, set "fast": true on the field (required for date sorting).
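
As a sketch of how the field options combine, the snippet below builds an index definition for POST /indices with a date field marked fast. The helper name `field`, the index name `events`, and the field `starts_at` are illustrative assumptions, not part of the service.

```javascript
// Field types listed in this README.
const FIELD_TYPES = ["text", "string", "i64", "f64", "date"];

// Hypothetical helper: builds one field definition for POST /indices.
function field(name, fieldType, { stored = true, indexed = true, fast = false } = {}) {
  if (!FIELD_TYPES.includes(fieldType)) {
    throw new Error(`Unknown field_type: ${fieldType}`);
  }
  const def = { name, field_type: fieldType, stored, indexed };
  if (fast) def.fast = true; // only emit "fast" when it was requested
  return def;
}

// Example schema (assumed names): "starts_at" is a fast date field,
// so it can later be used for sorting.
const eventsIndex = {
  name: "events",
  fields: [
    field("title", "text"),
    field("starts_at", "date", { fast: true }),
  ],
};

console.log(JSON.stringify(eventsIndex, null, 2));
```

The resulting object can be sent as the JSON body of POST /indices.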
GET /indices

Response:
{
"success": true,
"data": [
{
"name": "products",
"document_count": 1250,
"created_at": "2025-01-16T10:30:00Z"
}
]
}

POST /indices/products/documents
Content-Type: application/json
{
"documents": [
{
"id": "prod_001",
"fields": {
"title": "Smil Barnehage Bergen",
"description": "Modern barnehage i Bergen sentrum med fokus på læring gjennom lek",
"price": 15000.0
}
},
{
"id": "prod_002",
"fields": {
"title": "Lekeland Barnehage",
"description": "Familievennlig barnehage med store uteområder",
"price": 12500.0
}
}
]
}

POST /indices/products/search
Content-Type: application/json
{
"query": "barnehage bergen",
"limit": 10,
"fields": ["title", "description"],
"boost": {
"title": 2.0
},
"fuzzy": true,
"sort": {
"field": "starts_at",
"order": "desc"
}
}

Response:
{
"success": true,
"data": {
"took_ms": 2.4,
"total": 2,
"hits": [
{
"id": "prod_001",
"score": 8.42,
"fields": {
"id": "prod_001",
"title": "Smil Barnehage Bergen",
"description": "Modern barnehage i Bergen sentrum...",
"price": 15000.0
}
}
]
}
}

- Append an asterisk to any term (for example, "query": "eventyr*") to perform a prefix search that matches tokens beginning with that fragment.
- Set "fuzzy": true in the search payload to tolerate a single-character typo (insertion, deletion, substitution, or transposition), which helps catch misspellings like evntyr.
To sort by a date field, define the field as "field_type": "date" and set "fast": true when creating the index. Then pass the sort object in the search request:
{
"query": "barnehage",
"limit": 10,
"sort": {
"field": "starts_at",
"order": "asc"
}
}

Supported sort field types: i64, f64, date (must be fast: true).
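
The prefix, fuzzy, and sort options can be assembled in one place. The sketch below is a hypothetical client-side helper (`buildSearchRequest` is not part of the service) that appends the asterisk for prefix search, sets the fuzzy flag, and rejects a sort field that is not a fast i64, f64, or date field in the schema you created the index with.

```javascript
// Sortable types per this README; the field must also have "fast": true.
const SORTABLE_TYPES = ["i64", "f64", "date"];

// Hypothetical helper: builds a search payload and checks the sort field
// against the field definitions used when the index was created.
function buildSearchRequest(schemaFields, options) {
  const { query, limit = 10, prefix = false, fuzzy = false, sort } = options;
  // The asterisk is appended to the whole string, so with several words
  // only the last term becomes a prefix term.
  const body = { query: prefix ? `${query}*` : query, limit };
  if (fuzzy) body.fuzzy = true;
  if (sort) {
    const def = schemaFields.find((f) => f.name === sort.field);
    if (!def || !SORTABLE_TYPES.includes(def.field_type) || def.fast !== true) {
      throw new Error(`Field "${sort.field}" is not sortable`);
    }
    body.sort = { field: sort.field, order: sort.order || "asc" };
  }
  return body;
}

// Assumed schema for illustration.
const schema = [
  { name: "title", field_type: "text", stored: true, indexed: true },
  { name: "starts_at", field_type: "date", stored: true, indexed: true, fast: true },
];

const request = buildSearchRequest(schema, {
  query: "eventyr",
  prefix: true,
  sort: { field: "starts_at", order: "desc" },
});
console.log(JSON.stringify(request));
```

Validating on the client is only a convenience; how the service itself reports an unsortable field is not covered here.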
This endpoint runs a search, then asks Mistral to summarize the top hits into a grounded answer.
If stream is true (default), the response is an SSE stream.
POST /indices/products/answer
Content-Type: application/json
{
"query": "hvor er familievennlig barnehage",
"search_limit": 5,
"fields": ["title", "description", "location"],
"fuzzy": true,
"stream": false,
"temperature": 0.2
}

Response (non-streaming):
{
"success": true,
"data": {
"answer": "...",
"model": "mistral-large-latest",
"search_took_ms": 3.1,
"llm_took_ms": 412.7,
"total_took_ms": 418.5,
"sources": [
{
"id": "kg_001",
"score": 8.42,
"fields": {
"title": "Lekeland Barnehage",
"description": "Familievennlig barnehage ..."
}
}
]
}
}

Streaming (SSE) example:
curl -N http://localhost:3000/indices/kindergartens/answer \
-H "Content-Type: application/json" \
-d '{"query":"hvor er familievennlig barnehage","stream":true}'

The stream emits:
- event: meta with JSON containing model, search_took_ms, and sources
- data: chunks with partial answer text
- event: done when finished
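
A client has to separate the meta event from the answer chunks. The sketch below parses a buffered stream, assuming standard SSE framing (events separated by a blank line, unnamed events carrying the answer text). The exact wire format of the service is not documented here, so the sample payload and the function names are assumptions.

```javascript
// Splits raw SSE text into { event, data } records (standard SSE framing assumed).
function parseSseEvents(text) {
  const events = [];
  for (const block of text.split(/\r?\n\r?\n/)) {
    let event = "message"; // SSE default when no "event:" line is present
    const data = [];
    for (const line of block.split(/\r?\n/)) {
      if (line.startsWith("event:")) {
        event = line.slice(6).trim();
      } else if (line.startsWith("data:")) {
        data.push(line.slice(5).replace(/^ /, "")); // strip one leading space
      }
    }
    if (data.length > 0 || event !== "message") {
      events.push({ event, data: data.join("\n") });
    }
  }
  return events;
}

// Collects metadata, concatenated answer text, and the completion flag.
function collectAnswer(text) {
  let meta = null;
  let answer = "";
  let done = false;
  for (const { event, data } of parseSseEvents(text)) {
    if (event === "meta") meta = JSON.parse(data);
    else if (event === "done") done = true;
    else answer += data;
  }
  return { meta, answer, done };
}

// Made-up sample stream for illustration only.
const sample = [
  "event: meta",
  'data: {"model":"mistral-large-latest","search_took_ms":3.1,"sources":[]}',
  "",
  "data: Lekeland ",
  "",
  "data: Barnehage",
  "",
  "event: done",
  "data: {}",
  "",
  "",
].join("\n");

const result = collectAnswer(sample);
console.log(result.answer, result.done);
```

For a live connection, the same parsing would be applied incrementally to chunks read from the response body rather than to one complete string.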
DELETE /indices/products/documents/prod_001

DELETE /indices/products

POST /indices/products/bulk
Content-Type: application/json
{
"operations": [
{
"operation": "index",
"document": {
"id": "prod_003",
"fields": {
"title": "New Product",
"description": "Description here"
}
}
},
{
"operation": "delete",
"id": "prod_001"
}
]
}

use Illuminate\Support\Facades\Http;
// Create index
$response = Http::post('http://localhost:3000/indices', [
'name' => 'kindergartens',
'fields' => [
['name' => 'title', 'field_type' => 'text', 'stored' => true, 'indexed' => true],
['name' => 'description', 'field_type' => 'text', 'stored' => true, 'indexed' => true],
]
]);
// Add documents
$response = Http::post('http://localhost:3000/indices/kindergartens/documents', [
'documents' => [
[
'id' => 'kg_001',
'fields' => [
'title' => 'Smil Barnehage',
'description' => 'En flott barnehage i Bergen',
]
]
]
]);
// Search
$response = Http::post('http://localhost:3000/indices/kindergartens/search', [
'query' => 'barnehage bergen',
'limit' => 10
]);
$results = $response->json()['data'];

# Filter on a single field value
curl -X POST http://localhost:3000/indices/myindex/search \
-H 'Content-Type: application/json' \
-d '{
"query": "collection_handle:my-collection",
"limit": 10
}' | jq '.'

# Combine a search term with a field filter (fuzzy matching enabled)
curl -X POST http://localhost:3000/indices/myindex/search \
-H 'Content-Type: application/json' \
-d '{
"query": "tariff AND collection_handle:my-collection",
"limit": 10,
"fuzzy": true
}' | jq '.'

# Match a term within either of two field values
curl -X POST http://localhost:3000/indices/myindex/search \
-H 'Content-Type: application/json' \
-d '{
"query": "tariff AND (collection_handle:collection-a OR collection_handle:collection-b)",
"limit": 10
}' | jq '.'

# Same idea using the IN[...] list form
curl -X POST http://localhost:3000/indices/myindex/search \
-H 'Content-Type: application/json' \
-d '{
"query": "tariff AND collection_handle:IN[collection-a,collection-b,collection-c]",
"limit": 10
}' | jq '.'

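
The query strings in the curl examples can be composed programmatically. The helpers below are hypothetical (they are not shipped with the service) and do no escaping of special characters, so they assume field values like the plain handles shown above.

```javascript
// Hypothetical helpers that reproduce the query syntax from the curl examples.
function fieldIs(field, value) {
  return `${field}:${value}`;
}

function fieldIn(field, values) {
  return `${field}:IN[${values.join(",")}]`;
}

function anyOf(...clauses) {
  return clauses.join(" OR ");
}

// Wraps OR groups in parentheses so they stay grouped inside an AND chain.
function allOf(...clauses) {
  return clauses
    .map((c) => (c.includes(" OR ") ? `(${c})` : c))
    .join(" AND ");
}

const orQuery = allOf(
  "tariff",
  anyOf(
    fieldIs("collection_handle", "collection-a"),
    fieldIs("collection_handle", "collection-b"),
  ),
);

const inQuery = allOf(
  "tariff",
  fieldIn("collection_handle", ["collection-a", "collection-b", "collection-c"]),
);

console.log(orQuery);
console.log(inQuery);
```

Either string can be passed as the "query" value of a search request.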
// Add documents
const response = await fetch('http://localhost:3000/indices/products/documents', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
documents: [
{
id: 'prod_001',
fields: {
title: 'Product Name',
description: 'Product description'
}
}
]
})
});
// Search
const searchResponse = await fetch('http://localhost:3000/indices/products/search', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
query: 'search term',
limit: 10
})
});
const results = await searchResponse.json();

Environment variables:
- DATA_DIR: Data directory path (default: ./data)
- PORT: Server port (default: 3000)
- RUST_LOG: Log level (default: info, options: trace, debug, info, warn, error)
- MISTRAL_API_KEY: API key for Mistral (enables /indices/:name/answer)
- MISTRAL_MODEL: Mistral model name (default: mistral-large-latest)
- MISTRAL_BASE_URL: Base URL for Mistral-compatible API (default: https://api.mistral.ai/v1)
.env is loaded automatically at startup (if present in the project root).
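
A client or deployment script may want to read the same variables with the same defaults. The sketch below mirrors the defaults listed above. The function `loadConfig`, the derived `baseUrl`, and the `answersEnabled` flag are illustrative assumptions, and the service's own Rust configuration code may behave differently.

```javascript
// Reads the documented environment variables, falling back to the documented defaults.
function loadConfig(env = process.env) {
  const port = Number.parseInt(env.PORT ?? "3000", 10);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT: ${env.PORT}`);
  }
  return {
    dataDir: env.DATA_DIR ?? "./data",
    port,
    logLevel: env.RUST_LOG ?? "info",
    mistralModel: env.MISTRAL_MODEL ?? "mistral-large-latest",
    mistralBaseUrl: env.MISTRAL_BASE_URL ?? "https://api.mistral.ai/v1",
    // Assumption: the answer endpoint is usable only when a key is set.
    answersEnabled: Boolean(env.MISTRAL_API_KEY),
    // Assumption: the client talks to a local instance.
    baseUrl: `http://localhost:${port}`,
  };
}

const defaults = loadConfig({});
console.log(defaults.baseUrl, defaults.dataDir, defaults.answersEnabled);
```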
- Bulk Operations: Use bulk endpoints for adding multiple documents
- Field Selection: Only store fields you need to display in results
- Index Size: Expect index size to be 10-20% of original text
- Memory: Allocate ~50MB per active index + buffer
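
To illustrate the bulk tip, the sketch below splits a large document list into payloads for POST /indices/:name/bulk using the "index" operation shown earlier. The helper name and the batch size of 500 are arbitrary assumptions; this README does not state a limit, so tune the size against your own data.

```javascript
// Hypothetical helper: groups documents into bulk request bodies of at most batchSize operations.
function toBulkBatches(documents, batchSize = 500) {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new Error("batchSize must be a positive integer");
  }
  const batches = [];
  for (let i = 0; i < documents.length; i += batchSize) {
    const operations = documents
      .slice(i, i + batchSize)
      .map((document) => ({ operation: "index", document }));
    batches.push({ operations });
  }
  return batches;
}

// Five made-up documents split into batches of two.
const docs = Array.from({ length: 5 }, (_, n) => ({
  id: `prod_${n + 1}`,
  fields: { title: `Product ${n + 1}` },
}));

const batches = toBulkBatches(docs, 2);
console.log(batches.map((b) => b.operations.length));
```

Each element of the returned array is a complete JSON body for one bulk request.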
Create /etc/systemd/system/search-service.service:
[Unit]
Description=Simple Search Service
After=network.target
[Service]
Type=simple
User=search
WorkingDirectory=/opt/search-service
Environment="DATA_DIR=/var/lib/search-service"
Environment="PORT=3000"
ExecStart=/opt/search-service/simple-search-service
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target

sudo systemctl daemon-reload
sudo systemctl enable search-service
sudo systemctl start search-service

server {
listen 80;
server_name search.yourdomain.com;
location / {
proxy_pass http://localhost:3000;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
}
}

The service exposes a /health endpoint for health checks:
# Docker health check
HEALTHCHECK --interval=30s --timeout=10s --retries=3 \
CMD curl -f http://localhost:3000/health || exit 1

The data directory contains:
- metadata.db: SQLite database with metadata
- indices/: Directory with Tantivy index files

To back up the service, archive the entire data directory:
# Backup
tar -czf search-backup-$(date +%Y%m%d).tar.gz data/
# Restore
tar -xzf search-backup-20250116.tar.gz

- E-commerce: Product search with faceted filtering
- Documentation: Technical documentation search
- CRM: Customer and contact search
- Content Management: Article and page search
- Internal Tools: Log search, ticket search
| Feature | Simple Search Service | Elasticsearch |
|---|---|---|
| Memory | ~512MB | ~2GB minimum |
| Deployment | Single binary | JVM + cluster |
| Setup Time | < 1 minute | 15-30 minutes |
| Cluster | No | Yes |
| Scaling | Vertical | Horizontal |
| Best For | Single server, <10M docs | Distributed, >10M docs |
MIT License - feel free to use in commercial projects
For issues or questions, please open an issue on the GitHub repository.