Performance Guide & Best Practices

Comprehensive guide to optimizing SemanticCache performance and following best practices.

Table of Contents

  • Performance Characteristics
  • Backend Selection
  • Optimization Strategies
  • Async vs Sync Operations
  • Best Practices
  • Common Pitfalls
  • Benchmarking
  • Performance Goals

Performance Characteristics

Operation Complexity

In-Memory Backends (LRU, LFU, FIFO)

| Operation | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Set | O(1) | O(1) | Hash map insert |
| Get | O(1) | O(1) | Hash map lookup |
| Delete | O(1)–O(n) | O(1) | O(n) for FIFO queue removal |
| Contains | O(1) | O(1) | Hash map check |
| Lookup | O(n) | O(1) | Linear scan of all entries |
| TopMatches | O(n log n) | O(n) | Compute all scores, then sort |
| GetEmbedding | O(1) | O(1) | Separate index lookup |
| Keys | O(n) | O(n) | Return all keys |
| Flush | O(n) | O(1) | Clear all entries |

Key Insight: Semantic search (Lookup, TopMatches) is O(n) - scales linearly with cache size.
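
To make the O(n) behavior concrete, here is a minimal, purely conceptual sketch of what a linear-scan lookup has to do (not the library's actual implementation): one similarity computation per cached entry.

// Conceptual sketch: every cached embedding is compared to the query
// embedding, so cost grows linearly with the number of entries.
func linearScanLookup(query []float32, entries map[string][]float32,
    similarity func(a, b []float32) float64, threshold float64) (string, bool) {
    for key, emb := range entries {
        if similarity(query, emb) >= threshold { // one comparison per entry → O(n)
            return key, true
        }
    }
    return "", false
}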

Redis Backend

| Operation | Time Complexity | Notes |
|---|---|---|
| Set | O(1) + network | JSON.SET |
| Get | O(1) + network | JSON.GET |
| Delete | O(1) + network | DEL command |
| Lookup | O(n) + network | Could be O(log n) with FT.SEARCH |
| GetBatchAsync | O(n) + 1 RTT | Pipelining (single round-trip) |
| Flush | O(n) + network | SCAN + DEL |

Key Insight: Network latency dominates. Use batch operations to minimize round-trips.

Memory Usage

Per Entry Overhead:

In-Memory (LRU):
  Entry struct:        16 bytes (pointer to embedding, value)
  Embedding:           dimensions × 4 bytes (float32)
  Value:               size of V
  Map overhead:        ~48 bytes per entry (Go map internals)
  Index map:           8 bytes (pointer) per key
  Total:               ~72 bytes + embedding_size + value_size

Example (text-embedding-3-small, string value):
  Embedding:           1536 × 4 = 6,144 bytes
  Value (string):      ~64 bytes (average)
  Overhead:            ~72 bytes
  Total per entry:     ~6,280 bytes

  10,000 entries:      ~60 MB
  100,000 entries:     ~600 MB
  1,000,000 entries:   ~6 GB
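
To sanity-check sizing for other models or value types, the arithmetic above can be wrapped in a rough estimator. The constants (~72 bytes of overhead, 4 bytes per float32 dimension) are the approximations used in this guide, not measured values.

// Rough per-entry and total memory estimate for an in-memory backend,
// using the approximate figures above.
func estimateCacheBytes(entries, dims, avgValueBytes int) (perEntry, total int) {
    const overhead = 72                          // map + index overhead (approximate)
    perEntry = overhead + dims*4 + avgValueBytes // float32 = 4 bytes per dimension
    total = entries * perEntry
    return perEntry, total
}

// Example: estimateCacheBytes(10000, 1536, 64) → 6,280 bytes/entry, ~63 MB total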

Redis:

Per entry:
  JSON document:       ~embedding_size + value_size + metadata
  Vector index:        Additional ~10-20% overhead (HNSW)

Network:
  Bandwidth:           ~6 KB per Set operation (1536-dim embedding)

Backend Selection

When to Use LRU (Least Recently Used)

Best For:

  • Time-based access patterns
  • Recent queries more important
  • Working set fits in memory
  • Single-application caching

Example Use Cases:

  • User session data
  • Recent search queries
  • Active conversation history

Configuration:

cache, err := semanticcache.New[string, Response](
    options.WithLRUBackend(10000),  // Keep 10k most recent
    options.WithOpenAIProvider("key"),
)

Performance:

  • Set: O(1)
  • Get: O(1), updates recency
  • Best when: Access patterns show temporal locality

When to Use LFU (Least Frequently Used)

Best For:

  • Frequency-based access patterns
  • Popular items stay cached
  • Long-running applications
  • Skewed access distribution

Example Use Cases:

  • FAQ systems (popular questions)
  • Product catalog (hot items)
  • API response caching (common endpoints)

Configuration:

cache, err := semanticcache.New[string, string](
    options.WithLFUBackend(5000),  // Keep 5k most frequent
    options.WithOpenAIProvider("key"),
)

Performance:

  • Set: O(1), tracks frequency
  • Get: O(1), increments frequency
  • Best when: 80/20 rule applies (20% of items = 80% of accesses)

When to Use FIFO (First In, First Out)

Best For:

  • Simple eviction policy
  • Data has inherent ordering
  • Predictable access patterns
  • Low-overhead caching

Example Use Cases:

  • Event stream caching
  • Log data
  • Time-series data

Configuration:

cache, err := semanticcache.New[string, LogEntry](
    options.WithFIFOBackend(1000),
    options.WithOpenAIProvider("key"),
)

Performance:

  • Set: O(1)
  • Get: O(1)
  • Delete: O(n) - queue removal
  • Best when: Simple eviction is acceptable

When to Use Redis

Best For:

  • Shared cache across multiple instances
  • Persistence required
  • Large datasets (> memory)
  • Distributed systems

Example Use Cases:

  • Multi-server deployments
  • Microservices sharing cache
  • Production systems requiring durability
  • Caches > 100GB

Configuration:

cache, err := semanticcache.New[string, MyStruct](
    options.WithRedisBackend("localhost:6379", 0),
    options.WithOpenAIProvider("key"),
)

Performance:

  • Network latency: 1-5ms (local), 10-50ms (cloud)
  • Throughput: Limited by network, not computation
  • Best when: Sharing cache is more important than latency

Optimization:

  • Use GetBatchAsync for parallel retrieval
  • Enable Redis pipelining (automatic)
  • Consider Redis Cluster for scale

Optimization Strategies

1. Choose Right Similarity Function

Performance Comparison (1536-dim vectors):

| Function | Complexity | Speed | Use Case |
|---|---|---|---|
| Cosine | O(n) | ~1 μs | General purpose (default) |
| Euclidean | O(n) | ~0.8 μs | Absolute position matters |
| Dot Product | O(n) | ~0.5 μs | Pre-normalized vectors |
| Manhattan | O(n) | ~0.7 μs | High dimensions, outliers |
| Pearson | O(n) | ~2 μs | Pattern matching |

Recommendation:

  • Default: Use Cosine (good balance)
  • Speed: Use Dot Product if vectors pre-normalized
  • Accuracy: Use Pearson for correlation-based matching

// Fastest (if embeddings are normalized)
options.WithSimilarityComparator(similarity.DotProductSimilarity)

// Most accurate (general case)
options.WithSimilarityComparator(similarity.CosineSimilarity)

// Pattern matching
options.WithSimilarityComparator(similarity.PearsonCorrelationSimilarity)
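
Why dot product is the fastest option for pre-normalized vectors: for unit-length vectors, cosine similarity reduces to the dot product, so the normalization work is paid once at embedding time instead of on every comparison. A self-contained illustration (not the library's own similarity code):

// Cosine similarity of unit-length vectors reduces to their dot product,
// so normalizing once (at embedding time) makes every later comparison cheaper.
func dot(a, b []float32) float64 {
    var sum float64
    for i := range a {
        sum += float64(a[i]) * float64(b[i])
    }
    return sum
}

// normalize scales v to unit length (uses the standard "math" package).
func normalize(v []float32) []float32 {
    n := math.Sqrt(dot(v, v))
    if n == 0 {
        return v
    }
    out := make([]float32, len(v))
    for i, x := range v {
        out[i] = float32(float64(x) / n)
    }
    return out
}

// After storing normalize(embedding), dot(a, b) equals cosine(a, b).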

2. Optimize Threshold Selection

Threshold impacts both performance and accuracy:

// High threshold = fewer comparisons (early exit possible)
match, _ := cache.Lookup(ctx, query, 0.95)  // Very strict

// Medium threshold = balanced
match, _ := cache.Lookup(ctx, query, 0.85)  // Recommended

// Low threshold = more matches (slower)
match, _ := cache.Lookup(ctx, query, 0.70)  // Permissive

Best Practice:

  1. Start with 0.85 threshold
  2. Measure false positive/negative rates (see the sketch below)
  3. Adjust based on use case:
    • High precision needed? Increase to 0.9+
    • High recall needed? Decrease to 0.7-0.8
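
A sketch of step 2, measuring false positives/negatives against a small labeled set. The labeledQuery type and evaluateThreshold helper are hypothetical; Lookup is used as documented above.

// Hypothetical evaluation harness: each labeled pair records whether the
// query SHOULD hit the cache. Counting mistakes at a given threshold shows
// whether to tighten it (too many false positives) or loosen it (too many
// false negatives).
type labeledQuery struct {
    Query     string
    ShouldHit bool
}

func evaluateThreshold(ctx context.Context, queries []labeledQuery, threshold float64) (falsePos, falseNeg int) {
    for _, q := range queries {
        match, err := cache.Lookup(ctx, q.Query, threshold)
        if err != nil {
            continue // skip errors for this rough measurement
        }
        hit := match != nil
        switch {
        case hit && !q.ShouldHit:
            falsePos++
        case !hit && q.ShouldHit:
            falseNeg++
        }
    }
    return falsePos, falseNeg
}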

3. Use Batch Operations

Instead of:

// Slow: N API calls, N cache operations
for _, item := range items {
    cache.Set(ctx, item.Key, item.Text, item.Value)
}

Do this:

// Fast: Single batch operation
batchItems := make([]semanticcache.BatchItem[string, Data], len(items))
for i, item := range items {
    batchItems[i] = semanticcache.BatchItem[string, Data]{
        Key: item.Key,
        InputText: item.Text,
        Value: item.Value,
    }
}
cache.SetBatch(ctx, batchItems)

Performance Gain:

  • In-memory: ~2-3x faster (less overhead)
  • Redis: ~10x faster (pipelining)
  • OpenAI API: Could batch embeddings (future optimization)

4. Leverage Async Operations

For Independent Operations:

// Sequential (slow): 3× latency
cache.Set(ctx, "key1", "text1", "value1")
cache.Set(ctx, "key2", "text2", "value2")
cache.Set(ctx, "key3", "text3", "value3")

// Concurrent (fast): max(latencies)
errCh1 := cache.SetAsync(ctx, "key1", "text1", "value1")
errCh2 := cache.SetAsync(ctx, "key2", "text2", "value2")
errCh3 := cache.SetAsync(ctx, "key3", "text3", "value3")

// Wait for all
for _, ch := range []<-chan error{errCh1, errCh2, errCh3} {
    if err := <-ch; err != nil {
        log.Printf("Error: %v", err)
    }
}

When to Use Async:

  • ✅ Multiple independent operations
  • ✅ Don't need immediate result
  • ✅ Can parallelize work
  • ❌ Operations depend on each other
  • ❌ Need synchronous flow

5. Optimize Embedding API Calls

OpenAI API Limits:

  • Rate limits: Tier-based (500-10,000 RPM)
  • Token limits: 8,191 tokens per request
  • Cost: Per token charged

Strategies:

a) Cache Query Embeddings:

type CachedEmbeddingProvider struct {
    provider types.EmbeddingProvider
    cache    map[string][]float32  // Query → embedding
    mu       sync.RWMutex
}

func NewCachedEmbeddingProvider(provider types.EmbeddingProvider) *CachedEmbeddingProvider {
    // Initialize the map up front so EmbedText can write to it safely.
    return &CachedEmbeddingProvider{provider: provider, cache: make(map[string][]float32)}
}

func (p *CachedEmbeddingProvider) EmbedText(text string) ([]float32, error) {
    p.mu.RLock()
    if emb, ok := p.cache[text]; ok {
        p.mu.RUnlock()
        return emb, nil
    }
    p.mu.RUnlock()

    emb, err := p.provider.EmbedText(text)
    if err != nil {
        return nil, err
    }

    p.mu.Lock()
    p.cache[text] = emb
    p.mu.Unlock()

    return emb, nil
}

b) Batch Embeddings (Future):

// TODO: Library could detect batch operations and batch API calls
embeddings, err := provider.EmbedTexts([]string{"text1", "text2", ...})

c) Use Smaller Models:

// Faster, cheaper, slightly less accurate
options.WithOpenAIProvider("key", "text-embedding-3-small")  // 1536 dims

// vs. larger model
options.WithOpenAIProvider("key", "text-embedding-3-large")  // 3072 dims

6. Limit Cache Size

Problem: Large caches slow down semantic search (O(n))

Solution: Set appropriate capacity

// Bad: effectively unbounded growth
cache, err := semanticcache.New[string, string](
    options.WithLRUBackend(1000000),  // 1M entries = slow semantic search
    options.WithOpenAIProvider("key"),
)

// Good: bounded by use case
cache, err := semanticcache.New[string, string](
    options.WithLRUBackend(10000),  // 10k entries = fast semantic search
    options.WithOpenAIProvider("key"),
)

Rule of Thumb:

  • < 1,000 entries: Any backend works well
  • 1,000 - 10,000: In-memory backends acceptable
  • 10,000 - 100,000: Consider optimization or Redis
  • > 100,000: Need vector indexing (future feature)

7. Use Context Timeouts

Prevent Hanging Operations:

// Bad: No timeout
ctx := context.Background()
match, err := cache.Lookup(ctx, query, 0.8)  // Could hang forever

// Good: With timeout
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()

match, err := cache.Lookup(ctx, query, 0.8)
if err != nil {
    if ctx.Err() == context.DeadlineExceeded {
        log.Println("Lookup timed out")
    }
}

Recommended Timeouts:

  • Set: 10s (includes embedding API call)
  • Get: 1s (cache lookup only)
  • Lookup: 30s (depends on cache size)
  • TopMatches: 60s (compute-heavy)
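
One way to apply these defaults consistently is a small helper keyed by operation name; the durations below are simply the suggestions from the list above.

// Per-operation timeout defaults taken from the recommendations above.
var opTimeouts = map[string]time.Duration{
    "set":        10 * time.Second,
    "get":        1 * time.Second,
    "lookup":     30 * time.Second,
    "topmatches": 60 * time.Second,
}

func withOpTimeout(parent context.Context, op string) (context.Context, context.CancelFunc) {
    d, ok := opTimeouts[op]
    if !ok {
        d = 10 * time.Second // conservative fallback for unlisted operations
    }
    return context.WithTimeout(parent, d)
}

// Usage: ctx, cancel := withOpTimeout(r.Context(), "lookup"); defer cancel()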

Async vs Sync Operations

When to Use Sync

Use synchronous operations when:

  1. Simple request-response flow:

    func handleRequest(w http.ResponseWriter, r *http.Request) {
        value, found, err := cache.Get(r.Context(), key)
        if found {
            json.NewEncoder(w).Encode(value)
        }
    }
  2. Operation must complete before proceeding:

    err := cache.Set(ctx, key, text, value)
    if err != nil {
        return err  // Can't continue
    }
    // Use value immediately
    processValue(value)
  3. Sequential dependencies:

    // Must happen in order
    cache.Set(ctx, "user", userQuery, userData)
    match, _ := cache.Lookup(ctx, "similar query", 0.8)

When to Use Async

Use asynchronous operations when:

  1. Fire-and-forget:

    func handleUpdate(update Update) {
        // Don't block on cache update
        errCh := cache.SetAsync(ctx, update.Key, update.Text, update.Value)
    
        // Do other work
        processUpdate(update)
    
        // Check result later (optional)
        if err := <-errCh; err != nil {
            log.Printf("Cache update failed: %v", err)
        }
    }
  2. Parallel operations:

    // Search multiple queries concurrently
    results := make([]Match, len(queries))
    channels := make([]<-chan LookupResult, len(queries))
    
    for i, query := range queries {
        channels[i] = cache.LookupAsync(ctx, query, 0.8)
    }
    
    for i, ch := range channels {
        result := <-ch
        if result.Match != nil {
            results[i] = *result.Match
        }
    }
  3. Non-blocking background tasks:

    func warmCache(items []Item) {
        for _, item := range items {
            // Non-blocking warmup
            cache.SetAsync(context.Background(), item.Key, item.Text, item.Value)
        }
        // Continue immediately
    }

Performance Comparison

Scenario: Set 100 items

| Approach | Latency | Throughput | Complexity |
|---|---|---|---|
| Sync Sequential | 100 × T | 1/T | Simple |
| Sync Batch | 1 × T_batch | Higher | Simple |
| Async Concurrent | max(T) | 100/T | Moderate |
| Async Batch | 1 × T_batch | Highest | Simple |

Where T = time per operation

Recommendation:

  • Best: SetBatchAsync (combines batching + async; see the sketch below)
  • Good: SetBatch (batching alone)
  • Okay: Individual SetAsync (parallel)
  • Slow: Individual Set (sequential)
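
SetBatchAsync's exact signature is not shown in this guide; assuming it mirrors SetAsync and returns a single <-chan error for the whole batch, usage would look roughly like this (check the package docs for the real signature):

// Assumed: SetBatchAsync(ctx, items) returns <-chan error, like SetAsync does.
errCh := cache.SetBatchAsync(ctx, batchItems)

// Do other work while the batch is written...

if err := <-errCh; err != nil {
    log.Printf("Batch set failed: %v", err)
}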

Best Practices

1. Resource Management

Always close cache when done:

func main() {
    cache, err := semanticcache.New[string, string](...)
    if err != nil {
        log.Fatal(err)
    }
    defer cache.Close()  // ← Important!

    // Use cache...
}

Why: Closes embedding provider, releases backend resources (connections, goroutines)

2. Error Handling

Check all errors:

// Bad
cache.Set(ctx, key, text, value)  // Ignoring error

// Good
if err := cache.Set(ctx, key, text, value); err != nil {
    log.Printf("Failed to cache: %v", err)
    // Handle appropriately (retry, fallback, etc.)
}

Semantic search errors:

match, err := cache.Lookup(ctx, query, 0.8)
if err != nil {
    // Embedding or search error
    return err
}
if match == nil {
    // No match found (not an error)
    log.Println("No similar content found")
}

3. Context Usage

Pass request context through:

func handleAPI(w http.ResponseWriter, r *http.Request) {
    // Use request context (includes cancellation)
    match, err := cache.Lookup(r.Context(), query, 0.8)
    if err != nil {
        if errors.Is(err, context.Canceled) {
            // Client disconnected
            return
        }
        http.Error(w, err.Error(), 500)
        return
    }
    // ...
}

4. Key Design

Use structured keys:

// Bad: Hard to manage
cache.Set(ctx, "user123query", text, value)

// Good: Structured
type CacheKey struct {
    UserID string
    QueryID string
}

func (k CacheKey) String() string {
    return fmt.Sprintf("%s:%s", k.UserID, k.QueryID)
}

// Assumes the cache was created with CacheKey as its key type, e.g. semanticcache.New[CacheKey, Data](...)
cache.Set(ctx, CacheKey{"user123", "query456"}, text, value)

Key namespacing:

// Prevent key collisions across use cases
cache.Set(ctx, "faq:"+questionID, question, answer)
cache.Set(ctx, "docs:"+docID, docText, docContent)

5. Monitoring

Track cache performance:

type CacheMetrics struct {
    Hits   int64
    Misses int64
    Errors int64
}

var metrics CacheMetrics

func cacheGet(ctx context.Context, key string) (string, bool) {
    value, found, err := cache.Get(ctx, key)
    if err != nil {
        atomic.AddInt64(&metrics.Errors, 1)
        return "", false
    }
    if found {
        atomic.AddInt64(&metrics.Hits, 1)
    } else {
        atomic.AddInt64(&metrics.Misses, 1)
    }
    return value, found
}

// Expose metrics (use atomic loads to match the atomic writes above)
func (m *CacheMetrics) HitRatio() float64 {
    hits := atomic.LoadInt64(&m.Hits)
    total := hits + atomic.LoadInt64(&m.Misses)
    if total == 0 {
        return 0
    }
    return float64(hits) / float64(total)
}
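
A small usage example for the metrics above in a long-running service (time.Tick used for brevity):

// Periodically log the hit ratio so cache effectiveness is visible in production.
go func() {
    for range time.Tick(time.Minute) {
        log.Printf("cache hit ratio: %.2f (hits=%d misses=%d errors=%d)",
            metrics.HitRatio(),
            atomic.LoadInt64(&metrics.Hits),
            atomic.LoadInt64(&metrics.Misses),
            atomic.LoadInt64(&metrics.Errors))
    }
}()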

Common Pitfalls

1. Ignoring Zero-Value Keys

Problem:

var key string  // Zero value: ""
cache.Set(ctx, key, text, value)  // Error: key cannot be zero value

Solution:

if key == "" {
    return errors.New("invalid key")
}
cache.Set(ctx, key, text, value)

2. Not Checking for Nil Matches

Problem:

match, _ := cache.Lookup(ctx, query, 0.8)
fmt.Println(match.Value)  // Panic if match is nil!

Solution:

match, err := cache.Lookup(ctx, query, 0.8)
if err != nil {
    return err
}
if match != nil {
    fmt.Println(match.Value)
} else {
    fmt.Println("No match found")
}

3. Goroutine Leaks with Async

Problem:

for i := 0; i < 1000000; i++ {
    cache.SetAsync(ctx, key, text, value)  // Channel never read!
}
// Goroutines accumulate

Solution:

for i := 0; i < 1000; i++ {
    errCh := cache.SetAsync(ctx, key, text, value)
    // Must read from channel
    if err := <-errCh; err != nil {
        log.Printf("Error: %v", err)
    }
}
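
Reading each channel immediately keeps memory flat but serializes the writes. To stay concurrent while still draining every channel, a bounded-concurrency pattern (sketched below with a WaitGroup and a semaphore channel) also avoids the leak:

// Bounded concurrency: at most maxInFlight async sets run at once,
// and every error channel is drained, so no goroutines accumulate.
const maxInFlight = 32
sem := make(chan struct{}, maxInFlight)
var wg sync.WaitGroup

for _, item := range items {
    sem <- struct{}{} // acquire a slot
    wg.Add(1)
    go func(it Item) {
        defer wg.Done()
        defer func() { <-sem }() // release the slot
        if err := <-cache.SetAsync(ctx, it.Key, it.Text, it.Value); err != nil {
            log.Printf("Error: %v", err)
        }
    }(item)
}
wg.Wait()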

Or use sync version:

for i := 0; i < 1000000; i++ {
    cache.Set(ctx, key, text, value)  // No goroutines
}

4. Wrong Similarity Threshold

Problem:

// Looking for exact matches but using low threshold
match, _ := cache.Lookup(ctx, "password reset", 0.5)  // Too permissive

Solution:

// Adjust threshold to use case
match, _ := cache.Lookup(ctx, "password reset", 0.9)  // Strict

Guidelines:

  • Exact match needed: 0.95+
  • Close match: 0.85-0.95
  • Similar topic: 0.75-0.85
  • Related content: 0.65-0.75

5. Large Batch Operations

Problem:

// Memory spike: All items in memory
items := make([]BatchItem, 1000000)  // 1M items!
cache.SetBatch(ctx, items)

Solution:

// Chunk into smaller batches
const batchSize = 1000
for i := 0; i < len(items); i += batchSize {
    end := i + batchSize
    if end > len(items) {
        end = len(items)
    }
    cache.SetBatch(ctx, items[i:end])
}

Benchmarking

Running Benchmarks

Create benchmark file:

// cache_bench_test.go
package semanticcache_test

import (
    "context"
    "testing"
    "github.com/botirk38/semanticcache"
    "github.com/botirk38/semanticcache/options"
)
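
// Note: mockProvider and setupCache below are placeholder helpers, not part of
// the library. mockProvider should satisfy the embedding provider interface and
// return fixed vectors (so the benchmark measures cache overhead, not API
// latency); setupCache should return a cache pre-populated with n entries.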

func BenchmarkSet(b *testing.B) {
    cache, _ := semanticcache.New[string, string](
        options.WithLRUBackend(10000),
        options.WithCustomProvider(&mockProvider{}),
    )
    defer cache.Close()

    ctx := context.Background()
    b.ResetTimer()

    for i := 0; i < b.N; i++ {
        cache.Set(ctx, "key", "text", "value")
    }
}

func BenchmarkLookup(b *testing.B) {
    cache, _ := setupCache(1000)  // 1k entries
    ctx := context.Background()

    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        cache.Lookup(ctx, "query", 0.8)
    }
}

Run benchmarks:

# All benchmarks
go test -bench=. -benchmem

# Specific benchmark
go test -bench=BenchmarkLookup -benchtime=10s

# With CPU profiling
go test -bench=. -cpuprofile=cpu.prof
go tool pprof cpu.prof

Interpreting Results

Example output:

BenchmarkSet-8              50000    25000 ns/op    1024 B/op    15 allocs/op
BenchmarkGet-8            1000000     1200 ns/op      64 B/op     2 allocs/op
BenchmarkLookup-8            1000  1200000 ns/op    4096 B/op   100 allocs/op

Columns:

  • 50000: Number of iterations
  • 25000 ns/op: 25 μs per operation
  • 1024 B/op: Bytes allocated per op
  • 15 allocs/op: Allocations per op

Red Flags:

  • Lookup time grows linearly with cache size (expected for a linear scan, but watch it as the cache grows)
  • High allocation rate (potential GC pressure)
  • Operations slower than network latency (for Redis)

Performance Goals

Target Performance (Approximate):

| Operation | In-Memory | Redis | Notes |
|---|---|---|---|
| Set | < 50 μs | < 5 ms | Includes embedding |
| Get | < 1 μs | < 2 ms | Cache hit |
| Lookup (1k entries) | < 1 ms | < 10 ms | Linear scan |
| Lookup (10k entries) | < 10 ms | < 50 ms | Needs indexing |
| TopMatches (1k, n=10) | < 2 ms | < 15 ms | Sort overhead |

Note: Actual performance depends on:

  • Hardware (CPU, network)
  • Embedding dimensions
  • Similarity function
  • Value sizes