Conversation

destrex271

Added the function provided in the original issue, unchanged, inside the class as a starting point. hash_keyword is compiled with CRC32 acceleration when building on a Mac; otherwise it falls back to the FNV-1a implementation.

Currently in draft: partial implementation only.
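For readers following along, the compile-time fallback described above could look roughly like the sketch below. This is not the code from the PR: it assumes the selection is keyed on the `__ARM_FEATURE_CRC32` macro and the ACLE `__crc32b` intrinsic rather than on the operating system, and the standalone `fnv1a` helper name is hypothetical.

```cpp
#include <cstdint>
#include <string_view>

#if defined(__ARM_FEATURE_CRC32)
#include <arm_acle.h>  // ACLE CRC32 intrinsics such as __crc32b
#endif

// Portable 32-bit FNV-1a, always compiled so it can serve as the fallback.
// (Hypothetical helper name; the PR may structure this differently.)
[[nodiscard]] constexpr std::uint32_t fnv1a(std::string_view text) noexcept {
    std::uint32_t hash = 2166136261u;  // FNV offset basis
    for (unsigned char c : text) {
        hash ^= c;
        hash *= 16777619u;             // FNV prime
    }
    return hash;
}

// Assumption: the accelerated path is chosen by the compiler's feature macro.
[[nodiscard]] inline std::uint32_t hash_keyword(std::string_view text) noexcept {
#if defined(__ARM_FEATURE_CRC32)
    std::uint32_t crc = 0xFFFFFFFFu;
    for (unsigned char c : text) {
        crc = __crc32b(crc, c);        // one hardware CRC32 step per byte
    }
    return ~crc;
#else
    return fnv1a(text);                // no CRC32 instructions available
#endif
}
```

The two paths return different hash values for the same input, so any table built from these hashes would have to be generated with the same path that performs the lookup.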

@destrex271
Author

@chiradip just to confirm: for the keyword lookup optimizations, I plan on writing a new version of the following function:

[[nodiscard]] inline Keyword find_keyword(std::string_view text) noexcept {
which will be compiled only when building for arm64.
Does this seem right?
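A minimal sketch of what such an arm64-only variant might look like structurally. The `Keyword` values, the `kKeywords` table, the `scan_keywords` helper, and the `kArm64Lookup` flag are all hypothetical stand-ins; only the `find_keyword` signature comes from the question above. The real enum and table come from the generated header.

```cpp
#include <array>
#include <cstdint>
#include <string_view>

// Hypothetical stand-in for the generated Keyword enum.
enum class Keyword : std::uint8_t { UNKNOWN, SELECT, FROM, WHERE };

struct KeywordEntry {
    std::string_view text;
    Keyword id;
};

// Hypothetical table; the real one is produced from the grammar.
inline constexpr std::array<KeywordEntry, 3> kKeywords{{
    {"SELECT", Keyword::SELECT},
    {"FROM", Keyword::FROM},
    {"WHERE", Keyword::WHERE},
}};

// Exact-match scan shared by both variants in this sketch.
[[nodiscard]] inline Keyword scan_keywords(std::string_view text) noexcept {
    for (const auto& entry : kKeywords) {
        if (entry.text == text) return entry.id;
    }
    return Keyword::UNKNOWN;
}

#if defined(__aarch64__) || defined(_M_ARM64)
inline constexpr bool kArm64Lookup = true;
// arm64 build: a CRC32-accelerated lookup would replace this placeholder body.
[[nodiscard]] inline Keyword find_keyword(std::string_view text) noexcept {
    return scan_keywords(text);
}
#else
inline constexpr bool kArm64Lookup = false;
// Every other target keeps the generic lookup.
[[nodiscard]] inline Keyword find_keyword(std::string_view text) noexcept {
    return scan_keywords(text);
}
#endif
```

Both branches must return the same results for every input, so the existing test suite can exercise whichever one the target selects.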

@chiradip
Collaborator

Sounds like you are on the right path. Make sure you don't override the autogenerated header; instead, override in the .cpp file or propose a change to the script that generates the header from the EBNF grammar.

@chiradip
Collaborator

How does it compile and run? Did you run the test suite?

@chiradip chiradip marked this pull request as ready for review September 27, 2025 19:48
@chiradip chiradip marked this pull request as draft September 27, 2025 19:49
@destrex271
Author

destrex271 commented Oct 3, 2025

How does it compile and run? Did you run the test suite?

Hi, apologies for the late reply. I am currently testing it out.

@destrex271
Author

@chiradip, I have compiled the project locally on a macOS system, and the test suite also runs properly.

I am currently benchmarking the performance according to the performance guide.

@destrex271
Author

I am a little confused, though, about where I can find the benchmark binary mentioned in the Performance Guidelines. Can you please help me with that?

@chiradip
Collaborator

chiradip commented Oct 5, 2025

It is in tutorial.md and is not included in the tests directory yet. Here is the code; feel free to add it to the repository under the tests directory, and enhance it incrementally as you wish. Thanks a lot for the contribution.

Performance Comparison

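#include <chrono>
#include <iostream>
#include <string>

void benchmark_simd_vs_scalar() {
    using Clock = std::chrono::steady_clock;
    using Seconds = std::chrono::duration<double>;

    std::string sql = load_large_sql_file();

    // SIMD tokenization
    auto simd_start = Clock::now();
    db25::SimdTokenizer simd_tokenizer(...);
    auto simd_tokens = simd_tokenizer.tokenize();
    Seconds simd_time = Clock::now() - simd_start;

    // Scalar tokenization (hypothetical)
    auto scalar_start = Clock::now();
    auto scalar_tokens = scalar_tokenize(sql);
    Seconds scalar_time = Clock::now() - scalar_start;

    // Floating-point durations keep the ratio from being truncated to an integer.
    std::cout << "SIMD speedup: " << (scalar_time / simd_time) << "x\n";
    // Typical output: "SIMD speedup: 4.5x"
}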
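The snippet above depends on helpers that are not shown (`load_large_sql_file`, `scalar_tokenize`) and on the tokenizer's constructor arguments. As a starting point for a standalone file under the tests directory, a self-contained timing helper might look like the sketch below. The names `best_seconds` and `speedup` are hypothetical and not part of the repository; taking the best of several runs is one common way to reduce noise, not something the tutorial prescribes.

```cpp
#include <algorithm>
#include <chrono>
#include <limits>

// Runs `work` `runs` times and returns the fastest wall-clock time in seconds.
template <typename F>
double best_seconds(F&& work, int runs = 5) {
    using Clock = std::chrono::steady_clock;
    double best = std::numeric_limits<double>::infinity();
    for (int i = 0; i < runs; ++i) {
        const auto start = Clock::now();
        work();
        const std::chrono::duration<double> elapsed = Clock::now() - start;
        best = std::min(best, elapsed.count());
    }
    return best;
}

// Speedup of the candidate over the baseline; returns 0 when the candidate
// time is not positive, to avoid dividing by zero.
inline double speedup(double baseline_seconds, double candidate_seconds) {
    return candidate_seconds > 0.0 ? baseline_seconds / candidate_seconds : 0.0;
}
```

A benchmark would then wrap each tokenizer call in a lambda, pass both to `best_seconds`, and print `speedup(scalar_time, simd_time)`.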

Development

Successfully merging this pull request may close these issues.

Feature Request: Hardware CRC32 Acceleration for Keyword Matching

2 participants