Feature Request: Hardware CRC32 Acceleration for Keyword Matching #1

@chiradip

Description

Project: DB25-sql-tokenizer2

Date: 2025-08-30

Summary

Add hardware CRC32 acceleration for keyword matching, speeding up keyword detection during tokenization by an estimated 5-10x.

Background

During parser optimization, we found that Apple Silicon Macs (M1/M2/M3) expose the ARMv8 hardware CRC32 instructions, which can significantly accelerate string hashing and keyword matching. Testing shows:

  • ARM CRC32 instructions are available via the __crc32b/h/w/d intrinsics in <arm_acle.h>
  • 3-10x speedup for string hashing operations
  • O(1) average-case keyword lookups with a hash table vs O(n) string comparisons

Current State

The tokenizer currently uses:

  • SIMD for parallel character processing (excellent)
  • String comparisons for keyword matching (can be improved)
  • Linear search through the keyword list (O(n) in the number of keywords)

Proposed Enhancement

1. Hardware CRC32 Hash Function

#include <stdint.h>   // uint32_t, uint64_t
#include <string.h>   // memcpy

#ifdef __ARM_FEATURE_CRC32
#include <arm_acle.h>

// Hashes raw bytes; for case-insensitive keyword lookup, hash a
// case-folded copy of the token (see the wrapper sketched below)
uint32_t hash_keyword(const char* str, size_t len) {
    uint32_t hash = 0;
    
    // Process 8 bytes at a time
    while (len >= 8) {
        uint64_t chunk;
        memcpy(&chunk, str, 8);
        hash = __crc32d(hash, chunk);
        str += 8;
        len -= 8;
    }
    
    // Handle remaining bytes
    while (len > 0) {
        hash = __crc32b(hash, *str++);
        len--;
    }
    
    return hash;
}
#endif
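
Since keyword matching is case-insensitive but hash_keyword hashes raw bytes, "SELECT" and "select" would otherwise land in different buckets. A minimal case-folding wrapper, sketched here (the hash_keyword_ci name and 32-byte buffer are illustrative assumptions, not part of the proposal):

#include <ctype.h>

// Hypothetical wrapper: case-fold into a small stack buffer before
// hashing. SQL keywords are short, so anything longer than the buffer
// cannot be a keyword and the caller can skip the hash path entirely.
uint32_t hash_keyword_ci(const char* str, size_t len) {
    char folded[32];
    if (len > sizeof(folded)) return 0;  // too long to be a keyword
    for (size_t i = 0; i < len; i++) {
        folded[i] = (char)tolower((unsigned char)str[i]);
    }
    return hash_keyword(folded, len);
}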

2. Hash Table for O(1) Keyword Lookup

Replace the linear keyword search with a hash table:

  • Pre-compute CRC32 hashes for all keywords at startup
  • Use a power-of-2 sized hash table (256 or 512 entries) so the bucket index is a cheap bit-mask
  • Linear probing for collision resolution
  • Case-insensitive matching during lookup; the hash must be computed over case-folded bytes so that SELECT and select land in the same bucket (see the structural sketch below)
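
A minimal structural sketch, assuming a 512-entry table; the KeywordEntry layout and build_keyword_table name are illustrative, not final design:

#include <stddef.h>
#include <stdint.h>

struct KeywordEntry {
    uint32_t    hash;   // CRC32 of the lower-cased keyword text
    const char* text;   // canonical spelling
    int         id;     // existing keyword enum value
};

constexpr size_t TABLE_SIZE = 512;            // power of 2: cheap bit-mask
static const KeywordEntry* table[TABLE_SIZE];

void build_keyword_table(const KeywordEntry* keywords, size_t count) {
    for (size_t i = 0; i < count; i++) {
        size_t bucket = keywords[i].hash & (TABLE_SIZE - 1);
        while (table[bucket]) {                       // linear probing:
            bucket = (bucket + 1) & (TABLE_SIZE - 1); // advance to next slot
        }
        table[bucket] = &keywords[i];
    }
}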

3. Integration Points

  • Add to the SimdTokenizer::tokenize() keyword-detection phase
  • Maintain compatibility with the existing keyword enum
  • Fall back to the current implementation on non-ARM platforms (see the dispatch sketch below)
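
A minimal compile-time dispatch sketch; lookup_keyword_hashed and lookup_keyword_linear are assumed names for the new and existing paths, not actual symbols in the repository:

#include <stddef.h>

int lookup_keyword_hashed(const char* token, size_t len);  // CRC32 path
int lookup_keyword_linear(const char* token, size_t len);  // existing scan

int lookup_keyword(const char* token, size_t len) {
#ifdef __ARM_FEATURE_CRC32
    return lookup_keyword_hashed(token, len);  // CRC32 + hash-table path
#else
    return lookup_keyword_linear(token, len);  // existing strcasecmp scan
#endif
}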

Performance Impact

Measured Results (Apple M1/M2)

  • Short strings (5-10 chars): 5-10x faster
  • Medium strings (15-25 chars): 3-5x faster
  • Keyword matching: 10x faster than string comparisons
  • Hash combining: 2x faster than software

Expected Benefits

  • Reduce keyword detection from ~15% to ~2% of tokenization time
  • Overall tokenization speedup: 10-15%
  • Better cache utilization: with the entry layout sketched above, a 512-entry table of 16-byte entries is about 8 KB, which fits comfortably in L1 data cache

Implementation Complexity

  • Low complexity: ~200 lines of code
  • Low risk: Isolated change with fallback
  • Platform-specific: Only affects ARM64 builds

Testing Requirements

  1. Verify all keywords are correctly identified (round-trip check sketched below)
  2. Benchmark before/after performance
  3. Test case-insensitive matching
  4. Validate on both ARM and x86 platforms
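
A round-trip check for requirement 1, sketched under the same assumed names (KeywordEntry, lookup_keyword) used above:

#include <assert.h>
#include <ctype.h>
#include <string.h>

void test_keyword_roundtrip(const KeywordEntry* keywords, size_t count) {
    for (size_t i = 0; i < count; i++) {
        const char* text = keywords[i].text;
        size_t len = strlen(text);
        // The canonical spelling must resolve to its own id...
        assert(lookup_keyword(text, len) == keywords[i].id);
        // ...and so must a lower-cased copy (case-insensitive contract).
        char lower[32] = {0};
        if (len >= sizeof(lower)) continue;  // keywords are short
        for (size_t j = 0; j < len; j++)
            lower[j] = (char)tolower((unsigned char)text[j]);
        assert(lookup_keyword(lower, len) == keywords[i].id);
    }
}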

Compatibility

  • Fully backward compatible
  • No API changes
  • No behavior changes
  • Graceful fallback on platforms without CRC32

Priority

Medium-High: significant performance gain for minimal effort

Code Example

// Before (current implementation): O(n) scan with a case-insensitive
// string comparison against every keyword
for (const auto& keyword : keywords) {
    if (strcasecmp(token, keyword.text) == 0) {
        return keyword.id;
    }
}
return NOT_A_KEYWORD;  // stands in for the existing "not found" value

// After (with CRC32): hash once, then probe the table; the hash is
// computed over case-folded bytes (hash_keyword_ci above) so that
// SELECT and select collide
uint32_t hash = hash_keyword_ci(token, len);
size_t bucket = hash & (TABLE_SIZE - 1);
while (table[bucket]) {
    // Cheap hash comparison first; strcasecmp only runs on a hash match
    if (table[bucket]->hash == hash &&
        strcasecmp(token, table[bucket]->text) == 0) {
        return table[bucket]->id;
    }
    bucket = (bucket + 1) & (TABLE_SIZE - 1);  // linear probe
}
return NOT_A_KEYWORD;  // empty slot reached: token is not a keyword
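
Design note: because strcasecmp only runs on a hash match, most non-keyword identifiers are rejected after a single probe with no string comparison at all, and a failed probe terminates at the first empty slot. If the keyword set grows toward TABLE_SIZE, moving to the next power of two keeps probe chains short.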

Additional Notes

  • The parser already demonstrates successful integration with the tokenizer
  • This optimization would benefit all consumers of the tokenizer
  • Consider adding a similar optimization for operator detection
  • Could extend to identifier interning for symbol tables (see the interning sketch below)
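
A hypothetical interning sketch reusing the same CRC32 hash for symbol tables; std::unordered_map stands in for a real probe table, and none of these names exist in the tokenizer today:

#include <string>
#include <unordered_map>

struct CrcHash {
    size_t operator()(const std::string& s) const {
        return hash_keyword(s.data(), s.size());  // hardware CRC32 path
    }
};

using InternPool = std::unordered_map<std::string, int, CrcHash>;

// Returns a stable small integer per distinct identifier.
int intern(InternPool& pool, const char* str, size_t len) {
    // try_emplace only constructs the value if the key is new, so the id
    // equals the pool size at first insertion and stays stable afterward.
    return pool.try_emplace(std::string(str, len), (int)pool.size())
               .first->second;
}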

Filed by: DB25-parser2 team
Contact: See parser repository for integration examples
