Commit 27d3968

feat: add FileBackend for filesystem-based caching
Implements a pure Python filesystem cache backend with:

Core Features:
- BaseBackend protocol compliance (get/set/delete/exists/health_check)
- Thread-safe operations via RLock + fcntl.flock/msvcrt.locking
- Atomic writes via write-then-rename pattern
- LRU eviction at 90% capacity to 70% target
- TTL support with expiration enforcement on read
- 14-byte versioned header format for corruption detection

Security Hardening:
- blake2b key hashing prevents path traversal
- O_NOFOLLOW on all file opens prevents symlink attacks
- lstat() in cleanup/eviction prevents symlink manipulation
- ELOOP handling for symlink detection at runtime
- Default permissions 0o600/0o700 with security warnings
- TTL bounds validation (0 to 10 years max)
- Entry count limits prevent disk exhaustion
- TOCTOU elimination via "ask forgiveness" pattern

Configuration (via CACHEKIT_FILE_* env vars):
- cache_dir, max_size_mb, max_value_mb
- max_entry_count, lock_timeout_seconds
- permissions, dir_permissions

Test Coverage:
- 89 unit tests (91% coverage)
- 7 integration tests (concurrency, atomicity, eviction)
- 5 performance benchmarks
- 4 critical path tests
- Security-specific tests for symlink attacks, TOCTOU, TTL overflow
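
The commit message names two techniques that are easier to follow in code: hashing keys with blake2b so that a key can never escape the cache directory, and writing via a temporary file followed by a rename so readers never observe a partial file. The following is a minimal sketch of both ideas using only the standard library. The helper names `key_to_path` and `atomic_write`, the 16-byte digest size, and the `.tmp-` prefix are assumptions for illustration and are not taken from the FileBackend source.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def key_to_path(cache_dir: Path, key: str) -> Path:
    """Map any key to a fixed-length hex filename directly inside cache_dir.

    Hypothetical helper: because only the hex digest is used as the filename,
    characters such as "/" or ".." in the key cannot influence the path.
    """
    digest = hashlib.blake2b(key.encode("utf-8"), digest_size=16).hexdigest()
    return cache_dir / digest


def atomic_write(path: Path, data: bytes) -> None:
    """Write data to a temp file in the same directory, then rename it over path.

    Hypothetical helper illustrating the write-then-rename pattern.
    """
    # mkstemp creates the file with owner-only (0o600) permissions.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push bytes to disk before the rename
        # os.replace is atomic when source and target share a filesystem.
        os.replace(tmp_name, path)
    except BaseException:
        # Remove the temp file if anything failed before the rename.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
```

Keeping the temporary file in the same directory as the target matters: a rename across filesystems is not atomic, so a temp file under a different mount would defeat the pattern.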
1 parent 68da8db commit 27d3968

11 files changed: 4412 additions, 3 deletions


README.md

Lines changed: 1 addition & 1 deletion
@@ -168,7 +168,7 @@ def test_cached_function():
 - Circuit breaker with graceful degradation
 - Connection pooling with thread affinity (+28% throughput)
 - Distributed locking prevents cache stampedes
-- Pluggable backend abstraction (Redis, HTTP, DynamoDB, custom)
+- Pluggable backend abstraction (Redis, File, HTTP, DynamoDB, custom)
 
 > [!NOTE]
 > All reliability features are **enabled by default** with `@cache.production`. Use `@cache.minimal` to disable them for maximum throughput.

docs/guides/backend-guide.md

Lines changed: 120 additions & 2 deletions
@@ -106,6 +106,114 @@ def cached_function():
 - Connection pooling built-in
 - Supports large values (up to Redis limits)
 
+### FileBackend
+
+Store cache on the local filesystem with automatic LRU eviction:
+
+```python
+from cachekit.backends.file import FileBackend
+from cachekit.backends.file.config import FileBackendConfig
+from cachekit import cache
+
+# Use default configuration
+config = FileBackendConfig()
+backend = FileBackend(config)
+
+@cache(backend=backend)
+def cached_function():
+    return expensive_computation()
+```
+
+**Configuration via environment variables**:
+
+```bash
+# Directory for cache files
+export CACHEKIT_FILE_CACHE_DIR="/var/cache/myapp"
+
+# Size limits
+export CACHEKIT_FILE_MAX_SIZE_MB=1024  # Default: 1024 MB
+export CACHEKIT_FILE_MAX_VALUE_MB=100  # Default: 100 MB (max single value)
+export CACHEKIT_FILE_MAX_ENTRY_COUNT=10000  # Default: 10,000 entries
+
+# Lock configuration
+export CACHEKIT_FILE_LOCK_TIMEOUT_SECONDS=5.0  # Default: 5.0 seconds
+
+# File permissions (octal, owner-only by default for security)
+export CACHEKIT_FILE_PERMISSIONS=0o600  # Default: 0o600 (owner read/write)
+export CACHEKIT_FILE_DIR_PERMISSIONS=0o700  # Default: 0o700 (owner rwx)
+```
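
To make the environment-variable contract concrete, here is a rough sketch of how a loader could turn `CACHEKIT_FILE_*` strings into typed settings, including the octal permission values. It is not the `FileBackendConfig` implementation: the `FileCacheSettings` class, the `load_settings` function, and the `/tmp/cachekit` fallback directory are assumptions made for this example, while the numeric defaults are the ones documented in the block above.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping

PREFIX = "CACHEKIT_FILE_"


@dataclass(frozen=True)
class FileCacheSettings:
    """Hypothetical stand-in for the real config class."""

    cache_dir: Path
    max_size_mb: int
    lock_timeout_seconds: float
    permissions: int


def load_settings(env: Mapping[str, str] = os.environ) -> FileCacheSettings:
    """Read CACHEKIT_FILE_* values from env, falling back to documented defaults."""
    return FileCacheSettings(
        # The fallback directory is an assumption; the guide does not state one.
        cache_dir=Path(env.get(PREFIX + "CACHE_DIR", "/tmp/cachekit")),
        max_size_mb=int(env.get(PREFIX + "MAX_SIZE_MB", "1024")),
        lock_timeout_seconds=float(env.get(PREFIX + "LOCK_TIMEOUT_SECONDS", "5.0")),
        # int(..., 8) accepts both "600" and the prefixed form "0o600".
        permissions=int(env.get(PREFIX + "PERMISSIONS", "0o600"), 8),
    )
```

Passing the mapping explicitly, for example `load_settings({"CACHEKIT_FILE_PERMISSIONS": "0o640"})`, keeps the parsing testable without mutating the process environment.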
+
+**Configuration via Python**:
+
+```python
+from pathlib import Path
+from cachekit.backends.file import FileBackend
+from cachekit.backends.file.config import FileBackendConfig
+
+# Custom configuration
+config = FileBackendConfig(
+    cache_dir=Path("/var/cache/myapp"),
+    max_size_mb=2048,
+    max_value_mb=200,
+    max_entry_count=50000,
+    lock_timeout_seconds=10.0,
+    permissions=0o600,
+    dir_permissions=0o700,
+)
+
+backend = FileBackend(config)
+```
+
+**When to use**:
+- Single-process applications (scripts, CLI tools, development)
+- Local development and testing
+- Systems where Redis is unavailable
+- Low-traffic applications with modest cache sizes
+- Temporary caching needs
+
+**When NOT to use**:
+- Multi-process web servers (gunicorn, uWSGI) - use Redis instead
+- Distributed systems - use Redis or HTTP backend
+- High-concurrency scenarios - file locking overhead becomes limiting
+- Applications requiring sub-1ms latency - use L1-only cache
+
+**Characteristics**:
+- Latency: p50: 100-500μs, p99: 1-5ms
+- Throughput: 1000+ operations/second (single-threaded)
+- LRU eviction: Triggered at 90%, evicts to 70% capacity
+- TTL support: Yes (automatic expiration checking)
+- Cross-process: No (single-process only)
+- Platform support: Full on Linux/macOS, limited on Windows (no O_NOFOLLOW)
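
The 90% trigger and 70% target can be read as a high-water and low-water mark. The sketch below illustrates that policy under stated assumptions: recency is approximated by file modification time (the guide does not say how FileBackend tracks recency), sizes come from `lstat`-style calls that do not follow symlinks, and `evict_lru` is a hypothetical name rather than a cachekit function.

```python
import os
from pathlib import Path


def evict_lru(cache_dir: Path, max_bytes: int, high: float = 0.90, low: float = 0.70) -> list:
    """Delete the oldest files once usage exceeds high * max_bytes.

    Deletion stops as soon as usage is at or below low * max_bytes.
    Returns the names of the files this call removed, oldest first.
    """
    entries = []
    for entry in os.scandir(cache_dir):
        # follow_symlinks=False gives lstat semantics, so links are never chased.
        if not entry.is_file(follow_symlinks=False):
            continue
        st = entry.stat(follow_symlinks=False)
        entries.append((st.st_mtime, st.st_size, entry.path))

    total = sum(size for _, size, _ in entries)
    if total <= high * max_bytes:
        return []  # below the trigger threshold, nothing to do

    evicted = []
    for _, size, path in sorted(entries):  # oldest modification time first
        if total <= low * max_bytes:
            break
        try:
            os.unlink(path)
            evicted.append(os.path.basename(path))
        except FileNotFoundError:
            pass  # already gone; it no longer counts toward usage either way
        total -= size
    return evicted
```

Evicting down to a lower target instead of just below the trigger avoids running eviction on every subsequent write once the cache is nearly full.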
+
+**Limitations and Security Notes**:
+
+1. **Single-process only**: FileBackend uses file locking that doesn't prevent concurrent access from multiple processes. Do NOT use with multi-process WSGI servers.
+
+2. **File permissions**: Default permissions (0o600) restrict access to cache files to the owning user. Changing these permissions is a security risk and generates a warning.
+
+3. **Platform differences**: Windows does not support the O_NOFOLLOW flag used to prevent symlink attacks. FileBackend still works but has slightly reduced symlink protection on Windows.
+
+4. **Wall-clock TTL**: Expiration times rely on system time. Changes to system time (NTP, manual adjustments) may affect TTL accuracy.
+
+5. **Disk space**: FileBackend will evict least-recently-used entries when reaching 90% capacity. Ensure sufficient disk space beyond max_size_mb for temporary writes.
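
The platform note about `O_NOFOLLOW` can be illustrated with a short sketch. It opens a file without following a final symlink and treats the resulting `ELOOP` error as a miss, in the "ask forgiveness" style the commit message describes. The function name `read_no_follow` is an assumption for this example, and falling back to a flag value of 0 where `os.O_NOFOLLOW` is missing is one possible way to degrade on Windows, not necessarily what FileBackend does.

```python
import errno
import os

# os.O_NOFOLLOW does not exist on Windows; 0 means "no extra flag" there,
# which is why symlink protection is weaker on that platform.
_NOFOLLOW = getattr(os, "O_NOFOLLOW", 0)


def read_no_follow(path: str):
    """Return the file's bytes, or None if it is missing or is a symlink."""
    try:
        # No prior exists()/islink() check: open first, handle failure after.
        fd = os.open(path, os.O_RDONLY | _NOFOLLOW)
    except FileNotFoundError:
        return None
    except OSError as exc:
        if exc.errno == errno.ELOOP:
            # Linux and macOS report ELOOP when the final component is a symlink.
            return None
        raise
    with os.fdopen(fd, "rb") as fh:
        return fh.read()
```

Because the check and the open are a single system call, there is no window between "is this a symlink?" and "open it" for an attacker to swap the file.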
+
+**Performance characteristics**:
+
+```
+Sequential operations (single-threaded):
+- Write (set): p50: 120μs, p99: 800μs
+- Read (get): p50: 90μs, p99: 600μs
+- Delete: p50: 70μs, p99: 400μs
+
+Concurrent operations (10 threads):
+- Throughput: ~887 ops/sec
+- Latency p99: ~30μs per operation
+
+Large values (1MB):
+- Write p99: ~15μs per operation
+- Read p99: ~13μs per operation
+```
+
 ### HTTPBackend
 
 Store cache in HTTP API endpoints:
@@ -338,18 +446,27 @@ REDIS_URL=redis://localhost:6379/0
 | Backend | Latency | Use Case | Notes |
 |---------|---------|----------|-------|
 | **L1 (In-Memory)** | ~50ns | Repeated calls in same process | Process-local only |
+| **File** | 100μs-5ms | Single-process local caching | Development, scripts, CLI tools |
 | **Redis** | 1-7ms | Shared cache across pods | Production default |
 | **HTTP API** | 10-100ms | Cloud services, multi-region | Network dependent |
 | **DynamoDB** | 100-500ms | Serverless, low-traffic | High availability |
 | **Memcached** | 1-5ms | Alternative to Redis | No persistence |
 
 ### When to Use Each Backend
 
+**Use FileBackend when**:
+- You're building single-process applications (scripts, CLI tools)
+- You're in development and don't have Redis available
+- You need local caching without network overhead
+- You have modest cache sizes (< 10GB)
+- Your application runs on a single machine
+
 **Use RedisBackend when**:
-- You need sub-10ms latency
+- You need sub-10ms latency with shared cache
 - Cache is shared across multiple processes
 - You need persistence options
 - You're building a typical web application
+- You require multi-process or distributed caching
 
 **Use HTTPBackend when**:
 - You're using a cloud cache service
@@ -364,9 +481,10 @@ REDIS_URL=redis://localhost:6379/0
 - You need automatic TTL management
 
 **Use L1-only when**:
-- You're in development
+- You're in development with single-process code
 - You have a single-process application
 - You don't need cross-process cache sharing
+- You need the lowest possible latency (nanoseconds)
 
 ### Testing Your Backend
 
Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
+"""File-based backend for local disk caching.
+
+This module provides a production-ready filesystem-based cache backend with:
+- Thread-safe operations using reentrant locks and file-level locking
+- Atomic writes via write-then-rename pattern
+- LRU eviction based on disk usage thresholds
+- TTL-based expiration with secure header format
+- Security features (O_NOFOLLOW, symlink prevention)
+
+Public API:
+- FileBackend: Main backend implementation
+- FileBackendConfig: Configuration class
+
+Example:
+>>> from cachekit.backends.file import FileBackend, FileBackendConfig
+>>> config = FileBackendConfig(cache_dir="/tmp/cachekit")
+>>> backend = FileBackend(config)
+>>> backend.set("key", b"value", ttl=60)
+>>> data = backend.get("key")
+"""
+
+from __future__ import annotations
+
+from cachekit.backends.file.backend import FileBackend
+from cachekit.backends.file.config import FileBackendConfig
+
+__all__ = [
+    "FileBackend",
+    "FileBackendConfig",
+]
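
The module docstring mentions TTL-based expiration with a header format, and the commit message gives the header size as 14 bytes without spelling out the fields. The sketch below shows one way a 14-byte versioned header with expiration enforced on read could work. The field layout (4-byte magic, 2-byte version, 8-byte expiry timestamp), the `CKFB` magic value, and the `encode`/`decode` names are all assumptions for illustration; only the 14-byte size and the 0 to 10 year TTL bound come from the commit message.

```python
import struct
import time

# Assumed layout: 4-byte magic, 2-byte version, 8-byte expiry (epoch seconds).
# Big-endian with standard sizes, so there is no padding: 4 + 2 + 8 = 14 bytes.
HEADER = struct.Struct(">4sHQ")
MAGIC = b"CKFB"  # made-up marker for this sketch
VERSION = 1
MAX_TTL_SECONDS = 10 * 365 * 24 * 60 * 60  # roughly ten years


def encode(value: bytes, ttl=None, now=None) -> bytes:
    """Prefix value with a header; an expiry of 0 means 'never expires'."""
    now = time.time() if now is None else now
    if ttl is None:
        expires_at = 0
    else:
        if not 0 <= ttl <= MAX_TTL_SECONDS:
            raise ValueError("ttl must be between 0 and 10 years")
        expires_at = int(now + ttl)
    return HEADER.pack(MAGIC, VERSION, expires_at) + value


def decode(blob: bytes, now=None):
    """Return the payload, or None if the blob is corrupt or expired."""
    now = time.time() if now is None else now
    if len(blob) < HEADER.size:
        return None  # truncated file
    magic, version, expires_at = HEADER.unpack_from(blob)
    if magic != MAGIC or version != VERSION:
        return None  # unknown format, treat as corruption
    if expires_at and now >= expires_at:
        return None  # expiration is enforced on read
    return blob[HEADER.size:]
```

Accepting `now` as a parameter keeps the expiry logic deterministic in tests, which is useful given that expiration depends on wall-clock time.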
