Feature Request: Hardware CRC32 Acceleration for Keyword Matching
Project: DB25-sql-tokenizer2
Date: 2025-08-30
Summary
Add hardware CRC32 acceleration for keyword matching, speeding up the keyword-detection phase of tokenization by 5-10x.
Background
During parser optimization, we discovered that ARM Macs (M1/M2/M3) have hardware CRC32 instructions that can significantly accelerate string hashing and keyword matching. Testing shows:
- ARM CRC32 instructions are available via the __crc32b/h/w/d intrinsics
- 3-10x speedup for string hashing operations
- O(1) keyword lookups with hash tables vs O(n) string comparisons
Current State
The tokenizer currently uses:
- SIMD for parallel character processing (excellent)
- String comparisons for keyword matching (can be improved)
- Linear search through keyword list (O(n) complexity)
Proposed Enhancement
1. Hardware CRC32 Hash Function
#ifdef __ARM_FEATURE_CRC32
#include <arm_acle.h>
#include <cstdint>
#include <cstring>

// Hash a keyword with the hardware CRC32 instructions. Callers should
// case-normalize (e.g. lowercase) the token first so that case-insensitive
// lookups hash to the same bucket.
uint32_t hash_keyword(const char* str, size_t len) {
    uint32_t hash = 0;
    // Process 8 bytes at a time
    while (len >= 8) {
        uint64_t chunk;
        memcpy(&chunk, str, 8);
        hash = __crc32d(hash, chunk);
        str += 8;
        len -= 8;
    }
    // Handle remaining bytes
    while (len > 0) {
        hash = __crc32b(hash, *str++);
        len--;
    }
    return hash;
}
#endif

2. Hash Table for O(1) Keyword Lookup
Replace the linear keyword search with a hash table (a construction sketch follows this list):
- Pre-compute CRC32 hashes for all keywords
- Use power-of-2 sized hash table (256 or 512 entries)
- Linear probing for collision resolution
- Case-insensitive matching during lookup
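A minimal construction sketch of such a table, assuming the hash_keyword() from section 1; KeywordEntry, build_keyword_table(), and the 512-entry size are illustrative placeholders, not existing tokenizer code:

#include <cctype>
#include <cstddef>
#include <cstdint>
#include <cstring>

uint32_t hash_keyword(const char* str, size_t len);   // hardware CRC32 hash from section 1

constexpr size_t TABLE_SIZE = 512;                     // power of 2, so bucket = hash & (TABLE_SIZE - 1)

struct KeywordEntry {
    uint32_t    hash;   // precomputed CRC32 of the lowercased keyword
    const char* text;   // canonical keyword spelling
    int         id;     // value from the existing keyword enum (placeholder type)
};

static KeywordEntry* table[TABLE_SIZE];                // nullptr = empty bucket

// Case-fold before hashing so "SELECT", "Select" and "select" all land in
// the same bucket; the final strcasecmp during lookup confirms the match.
static uint32_t hash_folded(const char* str, size_t len) {
    char buf[64];                                      // SQL keywords are short
    if (len > sizeof(buf)) len = sizeof(buf);
    for (size_t i = 0; i < len; ++i)
        buf[i] = static_cast<char>(std::tolower(static_cast<unsigned char>(str[i])));
    return hash_keyword(buf, len);
}

// Build the table once at startup: precompute each keyword's hash and
// insert it with linear probing on collision.
static void build_keyword_table(KeywordEntry* keywords, size_t count) {
    for (size_t i = 0; i < count; ++i) {
        KeywordEntry* e = &keywords[i];
        e->hash = hash_folded(e->text, std::strlen(e->text));
        size_t bucket = e->hash & (TABLE_SIZE - 1);
        while (table[bucket])
            bucket = (bucket + 1) & (TABLE_SIZE - 1);
        table[bucket] = e;
    }
}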
3. Integration Points
- Add to the SimdTokenizer::tokenize() keyword detection phase
- Maintain compatibility with the existing keyword enum
- Fall back to current implementation on non-ARM platforms
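A minimal sketch of the fallback dispatch; lookup_keyword_crc32() and lookup_keyword_linear() are hypothetical names for the two lookup paths, not existing tokenizer functions:

#include <cstddef>

int lookup_keyword_crc32(const char* token, size_t len);   // hash-table path (section 2)
int lookup_keyword_linear(const char* token, size_t len);  // current string-comparison path

// Compile-time dispatch: ARM64 builds with the CRC32 extension take the
// hash-table path; every other platform keeps the current implementation.
inline int classify_keyword(const char* token, size_t len) {
#ifdef __ARM_FEATURE_CRC32
    return lookup_keyword_crc32(token, len);
#else
    return lookup_keyword_linear(token, len);
#endif
}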
Performance Impact
Measured Results (Apple M1/M2)
- Short strings (5-10 chars): 5-10x faster
- Medium strings (15-25 chars): 3-5x faster
- Keyword matching: 10x faster than string comparisons
- Hash combining: 2x faster than the software equivalent
Expected Benefits
- Reduce keyword detection from ~15% to ~2% of tokenization time
- Overall tokenization speedup: 10-15%
- Better cache utilization (hash table fits in L1)
Implementation Complexity
- Low complexity: ~200 lines of code
- Low risk: Isolated change with fallback
- Platform-specific: Only affects ARM64 builds
Testing Requirements
- Verify all keywords correctly identified
- Benchmark before/after performance
- Test case-insensitive matching
- Validate on both ARM and x86 platforms
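A sketch of the first check, assuming a keywords[] list of {text, id} pairs and the classify_keyword() entry point sketched above (both are placeholders for the tokenizer's actual definitions):

#include <cassert>
#include <cctype>
#include <cstdio>
#include <cstring>

struct Keyword { const char* text; int id; };
extern Keyword keywords[];                               // existing keyword list
extern size_t  keyword_count;
int classify_keyword(const char* token, size_t len);     // placeholder entry point

// Every keyword must resolve to its id in both lower and upper case,
// on ARM (CRC32 path) and x86 (fallback path) alike.
void test_keyword_lookup() {
    for (size_t i = 0; i < keyword_count; ++i) {
        const char* kw = keywords[i].text;
        size_t len = std::strlen(kw);
        char lower[64], upper[64];
        for (size_t j = 0; j < len && j < sizeof(lower); ++j) {
            lower[j] = static_cast<char>(std::tolower(static_cast<unsigned char>(kw[j])));
            upper[j] = static_cast<char>(std::toupper(static_cast<unsigned char>(kw[j])));
        }
        assert(classify_keyword(lower, len) == keywords[i].id);
        assert(classify_keyword(upper, len) == keywords[i].id);
    }
    std::printf("all %zu keywords identified\n", keyword_count);
}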
Compatibility
- Fully backward compatible
- No API changes
- No behavior changes
- Graceful fallback on platforms without CRC32
Priority
Medium-High - Significant performance gain for minimal effort
References
- ARM CRC32 intrinsics: https://developer.arm.com/documentation/
- Parser performance analysis showing keyword matching bottleneck
- Test results showing 3-10x speedup with hardware CRC32
Code Example
// Before (current implementation): linear search, O(n) string comparisons
for (const auto& keyword : keywords) {
    if (strcasecmp(token, keyword.text) == 0) {
        return keyword.id;
    }
}

// After (with CRC32): O(1) expected lookup via the hash table.
// Note: the hash must be computed over a case-normalized token (see item 2
// above) so that case-insensitive lookups land in the right bucket.
uint32_t hash = hash_keyword(token, len);
size_t bucket = hash & (TABLE_SIZE - 1);
while (table[bucket]) {
    if (table[bucket]->hash == hash &&
        strcasecmp(token, table[bucket]->text) == 0) {
        return table[bucket]->id;
    }
    bucket = (bucket + 1) & (TABLE_SIZE - 1);
}
// Fall through: the token is not a keyword.

Additional Notes
- The parser already demonstrates successful integration with the tokenizer
- This optimization would benefit all consumers of the tokenizer
- Consider adding similar optimization for operator detection
- Could extend to identifier interning for symbol tables
Filed by: DB25-parser2 team
Contact: See parser repository for integration examples