Open
Labels
enhancement (New feature or request)
Description
Summary
ByteStorage currently couples LZ4 compression with xxHash3-64 integrity checking. All store()/retrieve() operations require both features enabled together. This prevents using the Rust xxHash3 implementation for integrity-only use cases.
Current State
// StorageEnvelope::new() requires ALL features
#[cfg(all(feature = "compression", feature = "checksum"))]
pub fn new(data: Vec<u8>, format: String) -> Result<Self, ByteStorageError>
// No checksum-only path exists

The Python Arrow/Orjson serializers bypass ByteStorage entirely and use their own Blake3 checksums because:
- LZ4 compression is ineffective on Arrow IPC (columnar) and JSON (already compact)
- There is no way to get only the xxHash3 checksum without also paying the compression overhead
Proposed Change
Add a checksum-only API that provides xxHash3-64 integrity without compression:
// Option A: New feature-gated methods
#[cfg(feature = "checksum")]
impl ByteStorage {
pub fn checksum(&self, data: &[u8]) -> [u8; 8];
pub fn verify_checksum(&self, data: &[u8], expected: &[u8; 8]) -> bool;
}
// Option B: Separate IntegrityChecker struct
pub struct IntegrityChecker;
impl IntegrityChecker {
pub fn compute(data: &[u8]) -> [u8; 8];
pub fn verify(data: &[u8], expected: &[u8; 8]) -> bool;
}

Benefits
- Consistency: All serializers use the same xxHash3-64 algorithm via Rust FFI
- Performance: Arrow/Orjson get about 19x faster checksums (roughly 36 GB/s vs roughly 2 GB/s for Blake3)
- Space: 8-byte checksums vs 32-byte Blake3 (24 bytes saved per item)
- No wasted cycles: Skip LZ4 where compression is ineffective
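
To make Option B above concrete, here is a minimal, std-only sketch of what an `IntegrityChecker` could look like. It is not the proposed implementation: the hash is a stand-in FNV-1a 64-bit function (the hypothetical helper `fnv1a_64`) so that the example compiles without external crates, and a real version would call whichever xxHash3-64 routine ByteStorage already uses. The little-endian encoding of the 8-byte checksum is also an assumption; the issue does not specify a byte order.

```rust
/// Stand-in 64-bit hash (FNV-1a). This is NOT xxHash3-64; it is a
/// placeholder so the sketch has no external dependencies.
fn fnv1a_64(data: &[u8]) -> u64 {
    const OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
    const PRIME: u64 = 0x0000_0100_0000_01b3;
    data.iter()
        .fold(OFFSET_BASIS, |hash, &byte| (hash ^ u64::from(byte)).wrapping_mul(PRIME))
}

/// Checksum-only helper in the shape of Option B: no compression involved.
pub struct IntegrityChecker;

impl IntegrityChecker {
    /// Returns an 8-byte checksum of `data`.
    /// Assumption: the 64-bit hash is encoded little-endian.
    pub fn compute(data: &[u8]) -> [u8; 8] {
        fnv1a_64(data).to_le_bytes()
    }

    /// Recomputes the checksum of `data` and compares it with `expected`.
    pub fn verify(data: &[u8], expected: &[u8; 8]) -> bool {
        Self::compute(data) == *expected
    }
}
```

A caller would store the result of `IntegrityChecker::compute(&payload)` next to the payload and call `IntegrityChecker::verify(&payload, &stored)` on read. Because both functions are stateless, this shape needs no `ByteStorage` instance, which is the main difference from Option A.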
Current Workaround
Use the xxhash Python package directly in the Arrow/Orjson serializers (this is Option B from the related discussion, not Option B in the proposal above). This gives algorithm consistency without any Rust changes, but adds a Python dependency.
Context
- Related discussion: xxHash3 migration in ByteStorage (2025-12-05)
- Affected files: arrow_serializer.py, orjson_serializer.py (currently use Blake3)
- Architecture doc: strategy/saas-protocol-v1.0.md
Acceptance Criteria
- Checksum-only API available without enabling the compression feature
- PyO3 bindings expose checksum functions
- Documentation updated
- Benchmark comparing Python xxhash vs Rust FFI overhead
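
For the benchmark criterion, the Rust side of the comparison could start from a small timing harness like the one below. This is only a sketch under assumptions: `measure_throughput` and `placeholder_hash64` are hypothetical names, the hash is again an FNV-1a stand-in rather than xxHash3-64, and a serious benchmark would use a proper benchmarking framework and also time the call through the PyO3 boundary, which this snippet does not do.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Stand-in 64-bit hash (FNV-1a), used here in place of xxHash3-64.
fn placeholder_hash64(data: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325;
    for &byte in data {
        hash ^= u64::from(byte);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3);
    }
    hash
}

/// Hashes `data` `iterations` times and returns throughput in bytes per second.
/// `black_box` keeps the compiler from optimizing the hashing loop away.
fn measure_throughput(hash: fn(&[u8]) -> u64, data: &[u8], iterations: u32) -> f64 {
    let mut sink = 0u64;
    let start = Instant::now();
    for _ in 0..iterations {
        sink ^= hash(black_box(data));
    }
    let elapsed = start.elapsed().as_secs_f64();
    black_box(sink);
    (data.len() as f64 * f64::from(iterations)) / elapsed
}
```

Running `measure_throughput` on the same buffer sizes from native Rust and from Python through the bindings would show how much of the per-call cost is FFI overhead rather than hashing, which matters most for small payloads.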