Feature Request: Hardware CRC32 Acceleration for Keyword Matching #1

@chiradip

Description

Project: DB25-sql-tokenizer2

Date: 2025-08-30

Summary

Add hardware CRC32 acceleration for keyword matching, speeding up keyword detection during tokenization by an estimated 5-10x.

Background

During parser optimization, we found that Apple Silicon Macs (M1/M2/M3) expose the ARMv8 hardware CRC32 instructions, which can significantly accelerate string hashing and keyword matching. Testing shows:

  • ARM CRC32 instructions are available via the __crc32b/h/w/d intrinsics in <arm_acle.h>
  • 3-10x speedup for string hashing operations
  • O(1) average-case keyword lookups with a hash table vs O(n) string comparisons

Current State

The tokenizer currently uses:

  • SIMD for parallel character processing (excellent)
  • String comparisons for keyword matching (can be improved)
  • Linear search through the keyword list (O(n) in the number of keywords)

Proposed Enhancement

1. Hardware CRC32 Hash Function

#include <stdint.h>   // uint32_t, uint64_t
#include <string.h>   // memcpy

#ifdef __ARM_FEATURE_CRC32
#include <arm_acle.h>

// Hashes raw bytes; for case-insensitive keyword lookup, hash a
// case-folded copy of the token (see the wrapper sketched below)
uint32_t hash_keyword(const char* str, size_t len) {
    uint32_t hash = 0;
    
    // Process 8 bytes at a time
    while (len >= 8) {
        uint64_t chunk;
        memcpy(&chunk, str, 8);
        hash = __crc32d(hash, chunk);
        str += 8;
        len -= 8;
    }
    
    // Handle remaining bytes
    while (len > 0) {
        hash = __crc32b(hash, *str++);
        len--;
    }
    
    return hash;
}
#endif
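
Since keyword matching is case-insensitive but hash_keyword hashes raw bytes, "SELECT" and "select" would otherwise land in different buckets. A minimal case-folding wrapper, sketched here (the hash_keyword_ci name and 32-byte buffer are illustrative assumptions, not part of the proposal):

#include <ctype.h>

// Hypothetical wrapper: case-fold into a small stack buffer before
// hashing. SQL keywords are short, so anything longer than the buffer
// cannot be a keyword and the caller can skip the hash path entirely.
uint32_t hash_keyword_ci(const char* str, size_t len) {
    char folded[32];
    if (len > sizeof(folded)) return 0;  // too long to be a keyword
    for (size_t i = 0; i < len; i++) {
        folded[i] = (char)tolower((unsigned char)str[i]);
    }
    return hash_keyword(folded, len);
}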

2. Hash Table for O(1) Keyword Lookup

Replace the linear keyword search with a hash table:

  • Pre-compute CRC32 hashes for all keywords at startup
  • Use a power-of-2 sized hash table (256 or 512 entries) so the bucket index is a cheap bit-mask
  • Linear probing for collision resolution
  • Case-insensitive matching during lookup; the hash must be computed over case-folded bytes so that SELECT and select land in the same bucket (see the structural sketch below)
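
A minimal structural sketch, assuming a 512-entry table; the KeywordEntry layout and build_keyword_table name are illustrative, not final design:

#include <stddef.h>
#include <stdint.h>

struct KeywordEntry {
    uint32_t    hash;   // CRC32 of the lower-cased keyword text
    const char* text;   // canonical spelling
    int         id;     // existing keyword enum value
};

constexpr size_t TABLE_SIZE = 512;            // power of 2: cheap bit-mask
static const KeywordEntry* table[TABLE_SIZE];

void build_keyword_table(const KeywordEntry* keywords, size_t count) {
    for (size_t i = 0; i < count; i++) {
        size_t bucket = keywords[i].hash & (TABLE_SIZE - 1);
        while (table[bucket]) {                       // linear probing:
            bucket = (bucket + 1) & (TABLE_SIZE - 1); // advance to next slot
        }
        table[bucket] = &keywords[i];
    }
}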

3. Integration Points

  • Add to the SimdTokenizer::tokenize() keyword-detection phase
  • Maintain compatibility with the existing keyword enum
  • Fall back to the current implementation on non-ARM platforms (see the dispatch sketch below)
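
A minimal compile-time dispatch sketch; lookup_keyword_hashed and lookup_keyword_linear are assumed names for the new and existing paths, not actual symbols in the repository:

#include <stddef.h>

int lookup_keyword_hashed(const char* token, size_t len);  // CRC32 path
int lookup_keyword_linear(const char* token, size_t len);  // existing scan

int lookup_keyword(const char* token, size_t len) {
#ifdef __ARM_FEATURE_CRC32
    return lookup_keyword_hashed(token, len);  // CRC32 + hash-table path
#else
    return lookup_keyword_linear(token, len);  // existing strcasecmp scan
#endif
}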

Performance Impact

Measured Results (Apple M1/M2)

  • Short strings (5-10 chars): 5-10x faster
  • Medium strings (15-25 chars): 3-5x faster
  • Keyword matching: 10x faster than string comparisons
  • Hash combining: 2x faster than software

Expected Benefits

  • Reduce keyword detection from ~15% to ~2% of tokenization time
  • Overall tokenization speedup: 10-15%
  • Better cache utilization: with the entry layout sketched above, a 512-entry table of 16-byte entries is about 8 KB, which fits comfortably in L1 data cache

Implementation Complexity

  • Low complexity: ~200 lines of code
  • Low risk: Isolated change with fallback
  • Platform-specific: Only affects ARM64 builds

Testing Requirements

  1. Verify all keywords are correctly identified (round-trip check sketched below)
  2. Benchmark before/after performance
  3. Test case-insensitive matching
  4. Validate on both ARM and x86 platforms
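
A round-trip check for requirement 1, sketched under the same assumed names (KeywordEntry, lookup_keyword) used above:

#include <assert.h>
#include <ctype.h>
#include <string.h>

void test_keyword_roundtrip(const KeywordEntry* keywords, size_t count) {
    for (size_t i = 0; i < count; i++) {
        const char* text = keywords[i].text;
        size_t len = strlen(text);
        // The canonical spelling must resolve to its own id...
        assert(lookup_keyword(text, len) == keywords[i].id);
        // ...and so must a lower-cased copy (case-insensitive contract).
        char lower[32] = {0};
        if (len >= sizeof(lower)) continue;  // keywords are short
        for (size_t j = 0; j < len; j++)
            lower[j] = (char)tolower((unsigned char)text[j]);
        assert(lookup_keyword(lower, len) == keywords[i].id);
    }
}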

Compatibility

  • Fully backward compatible
  • No API changes
  • No behavior changes
  • Graceful fallback on platforms without CRC32

Priority

Medium-High: significant performance gain for minimal effort

Code Example

// Before (current implementation): O(n) scan with a case-insensitive
// string comparison against every keyword
for (const auto& keyword : keywords) {
    if (strcasecmp(token, keyword.text) == 0) {
        return keyword.id;
    }
}
return NOT_A_KEYWORD;  // stands in for the existing "not found" value

// After (with CRC32): hash once, then probe the table; the hash is
// computed over case-folded bytes (hash_keyword_ci above) so that
// SELECT and select collide
uint32_t hash = hash_keyword_ci(token, len);
size_t bucket = hash & (TABLE_SIZE - 1);
while (table[bucket]) {
    // Cheap hash comparison first; strcasecmp only runs on a hash match
    if (table[bucket]->hash == hash &&
        strcasecmp(token, table[bucket]->text) == 0) {
        return table[bucket]->id;
    }
    bucket = (bucket + 1) & (TABLE_SIZE - 1);  // linear probe
}
return NOT_A_KEYWORD;  // empty slot reached: token is not a keyword
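
Design note: because strcasecmp only runs on a hash match, most non-keyword identifiers are rejected after a single probe with no string comparison at all, and a failed probe terminates at the first empty slot. If the keyword set grows toward TABLE_SIZE, moving to the next power of two keeps probe chains short.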

Additional Notes

  • The parser already demonstrates successful integration with the tokenizer
  • This optimization would benefit all consumers of the tokenizer
  • Consider adding a similar optimization for operator detection
  • Could extend to identifier interning for symbol tables (see the interning sketch below)
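
A hypothetical interning sketch reusing the same CRC32 hash for symbol tables; std::unordered_map stands in for a real probe table, and none of these names exist in the tokenizer today:

#include <string>
#include <unordered_map>

struct CrcHash {
    size_t operator()(const std::string& s) const {
        return hash_keyword(s.data(), s.size());  // hardware CRC32 path
    }
};

using InternPool = std::unordered_map<std::string, int, CrcHash>;

// Returns a stable small integer per distinct identifier.
int intern(InternPool& pool, const char* str, size_t len) {
    // try_emplace only constructs the value if the key is new, so the id
    // equals the pool size at first insertion and stays stable afterward.
    return pool.try_emplace(std::string(str, len), (int)pool.size())
               .first->second;
}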

Filed by: DB25-parser2 team
Contact: See parser repository for integration examples
