Open
Labels
enhancement (New feature or request)
Description
Summary
The Arrow and Orjson serializers use checksums for integrity checking. This issue tracks aligning their checksum algorithm with the one used by the Rust ByteStorage layer.
Current State (2025-12-11)
✅ Phase 1 Complete: Switched to xxHash3-64 via the Python xxhash package
| Serializer | Checksum | Size | Implementation |
|---|---|---|---|
| StandardSerializer | xxHash3-64 | 8 bytes | Rust ByteStorage (FFI) |
| ArrowSerializer | xxHash3-64 | 8 bytes | Python xxhash package |
| OrjsonSerializer | xxHash3-64 | 8 bytes | Python xxhash package |
Files updated:
- src/cachekit/serializers/arrow_serializer.py
- src/cachekit/serializers/orjson_serializer.py
- Tests: test_xxhash_integrity.py (14 new tests), updated existing tests
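To make the integrity check concrete, here is a minimal sketch of how a serializer might attach and verify an 8-byte xxHash3-64 checksum. It assumes a hypothetical layout (checksum prefixed to the serialized payload) that may not match the actual Arrow or Orjson wire format, and the names `attach_checksum`, `verify_checksum`, and `ChecksumError` are illustrative only. The checksum function is passed in, so the same framing code works with the Python xxhash package today or a Rust FFI function later.

```python
from typing import Callable

# Any function mapping bytes -> 8-byte digest, e.g. xxhash.xxh3_64_digest
# today, or a Rust FFI function later (hypothetical).
ChecksumFn = Callable[[bytes], bytes]

CHECKSUM_SIZE = 8  # xxHash3-64 produces a 64-bit (8-byte) digest


class ChecksumError(ValueError):
    """Raised when a stored checksum does not match its payload."""


def attach_checksum(payload: bytes, checksum_fn: ChecksumFn) -> bytes:
    """Prefix the serialized payload with its checksum (hypothetical layout)."""
    digest = checksum_fn(payload)
    if len(digest) != CHECKSUM_SIZE:
        raise ValueError(f"expected a {CHECKSUM_SIZE}-byte digest, got {len(digest)}")
    return digest + payload


def verify_checksum(blob: bytes, checksum_fn: ChecksumFn) -> bytes:
    """Split off the checksum prefix, verify it, and return the payload."""
    if len(blob) < CHECKSUM_SIZE:
        raise ChecksumError("blob is shorter than the checksum prefix")
    stored, payload = blob[:CHECKSUM_SIZE], blob[CHECKSUM_SIZE:]
    if checksum_fn(payload) != stored:
        raise ChecksumError("checksum mismatch: payload corrupted or truncated")
    return payload
```

With the xxhash package installed, `attach_checksum(data, xxhash.xxh3_64_digest)` produces the framed bytes, and `verify_checksum(blob, xxhash.xxh3_64_digest)` returns the original payload or raises `ChecksumError`.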
Future Work: FFI Implementation
🔮 Phase 2 (Optional): Use Rust FFI for checksums instead of Python package
Blocked by: cachekit-io/cachekit-core#13 (checksum-only API)
```python
# Current (Python xxhash)
import xxhash
checksum = xxhash.xxh3_64_digest(data)

# Future (Rust FFI) - requires cachekit-core#13
from cachekit._rust_serializer import compute_checksum
checksum = compute_checksum(data)
```
Benefits of FFI approach:
- Single implementation (no Python xxhash dependency)
- Consistent with the StandardSerializer path
- Potentially faster for large payloads (the Rust side could hash without holding the Python GIL)
Trade-offs:
- FFI overhead may negate speed gains for small payloads
- More complex build (Rust required)
- Current Python solution works fine
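One way to get the FFI benefits without making Rust a hard build requirement is to prefer the Rust function when it is importable and fall back to the Python package otherwise. The sketch below assumes the `compute_checksum` name proposed above for `cachekit._rust_serializer`, which does not exist until cachekit-core#13 lands; `load_checksum_fn` and `DEFAULT_BACKENDS` are hypothetical helpers. It is only safe if both backends return byte-identical 8-byte digests (same seed, same byte order); otherwise values written by one backend would fail verification under the other.

```python
import importlib
from typing import Callable, Sequence, Tuple

ChecksumFn = Callable[[bytes], bytes]

# Candidate backends in order of preference: (module path, attribute name).
# The Rust entry is the future API proposed in this issue (hypothetical until
# cachekit-core#13 lands); the second entry is the current Phase 1 path.
DEFAULT_BACKENDS: Tuple[Tuple[str, str], ...] = (
    ("cachekit._rust_serializer", "compute_checksum"),
    ("xxhash", "xxh3_64_digest"),
)


def load_checksum_fn(
    candidates: Sequence[Tuple[str, str]] = DEFAULT_BACKENDS,
) -> ChecksumFn:
    """Return the first importable, callable checksum function in `candidates`."""
    for module_name, attr in candidates:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue  # backend not installed or not built; try the next one
        fn = getattr(module, attr, None)
        if callable(fn):
            return fn
    tried = ", ".join(f"{m}.{a}" for m, a in candidates)
    raise ImportError(f"no checksum backend available (tried: {tried})")
```

Resolving the backend once at import time keeps the per-call cost identical to calling the chosen function directly.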
Decision Log
- 2025-12-11: Implemented Phase 1 (Python xxhash). Phase 2 deferred pending cachekit-core#13 and benchmarking to determine if FFI overhead is worth it.
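The benchmarking mentioned in the decision log could start from a harness like the one below, which measures how much faster one checksum function is than another at several payload sizes. It is a hypothetical sketch (`compare_backends`, `time_per_call`, and the chosen sizes are illustrative) built only on the standard library's `timeit`.

```python
import os
import timeit
from typing import Callable, Dict, Sequence

ChecksumFn = Callable[[bytes], bytes]

# Payload sizes in bytes, from small cache entries up to 1 MiB blobs.
DEFAULT_SIZES = (64, 1024, 64 * 1024, 1024 * 1024)


def time_per_call(fn: ChecksumFn, data: bytes, number: int = 200, repeat: int = 5) -> float:
    """Best-of-`repeat` wall-clock seconds per call of fn(data)."""
    timings = timeit.repeat(lambda: fn(data), number=number, repeat=repeat)
    return min(timings) / number


def compare_backends(
    baseline: ChecksumFn,
    candidate: ChecksumFn,
    sizes: Sequence[int] = DEFAULT_SIZES,
    number: int = 200,
) -> Dict[int, float]:
    """Speedup of `candidate` over `baseline` per size (> 1.0 means candidate is faster)."""
    speedups: Dict[int, float] = {}
    for size in sizes:
        data = os.urandom(size)  # random payload of the requested size
        base = time_per_call(baseline, data, number)
        cand = time_per_call(candidate, data, number)
        speedups[size] = base / cand if cand > 0 else float("inf")
    return speedups
```

Once the FFI function exists, `compare_backends(xxhash.xxh3_64_digest, compute_checksum)` would show whether the Rust path wins only for large payloads or across the board. The lambda wrapper adds the same fixed per-call cost to both sides, which pulls the ratios for the smallest sizes toward 1.0.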
Related
- Upstream: cachekit-core#13 (checksum-only API in Rust)
- Context: xxHash3 migration in ByteStorage (2025-12-05)