refactor: custom lexer #437

Merged
merged 17 commits into from
Jul 6, 2025
psteinroe
Collaborator

@psteinroe psteinroe commented Jul 1, 2025

  • adds a new tokenizer crate that turns a string into simple tokens
  • adds a new lexer + lexer_codegen that uses the tokenizer to lex into a new SyntaxKind enum

the new implementation is

  • much more performant (no extra string allocations, no calls into a C library)
  • works with broken strings (!!!!)
  • custom-made for our use case (e.g. the LineEnding variant comes with a count)
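
a rough sketch of the idea behind the three points above, not the code from this PR: a tokenizer that borrows slices from the input instead of allocating strings, keeps going when a quoted string is never closed, and folds consecutive line endings into one token with a count. Everything here except the `LineEnding` name is hypothetical (`TokenKind`, `Token`, `tokenize`, the `terminated` flag), and the real crate may be structured quite differently.

```rust
/// Hypothetical token kinds; only the `LineEnding` name comes from the PR description.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TokenKind {
    Ident,
    Number,
    Whitespace,
    /// A run of consecutive line endings and how many were seen.
    LineEnding { count: usize },
    /// A single-quoted string; `terminated` is false when the closing quote is missing.
    QuotedString { terminated: bool },
    Punct,
}

/// A token borrows its text from the input, so no new `String` is allocated per token.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Token<'a> {
    pub kind: TokenKind,
    pub text: &'a str,
}

pub fn tokenize(input: &str) -> Vec<Token<'_>> {
    let bytes = input.as_bytes();
    let mut tokens = Vec::new();
    let mut pos = 0;

    while pos < input.len() {
        let start = pos;
        // `pos` always sits on a char boundary, so this never panics.
        let c = input[pos..].chars().next().unwrap();

        let kind = if c == '\n' || c == '\r' {
            // Count "\n", "\r\n" and a lone "\r" as one line ending each.
            let mut count = 0;
            while pos < bytes.len() {
                if bytes[pos] == b'\n' {
                    pos += 1;
                } else if bytes[pos] == b'\r' {
                    pos += 1;
                    if pos < bytes.len() && bytes[pos] == b'\n' {
                        pos += 1;
                    }
                } else {
                    break;
                }
                count += 1;
            }
            TokenKind::LineEnding { count }
        } else if c == ' ' || c == '\t' {
            while pos < bytes.len() && (bytes[pos] == b' ' || bytes[pos] == b'\t') {
                pos += 1;
            }
            TokenKind::Whitespace
        } else if c == '\'' {
            // Scan to the closing quote; `''` is an escaped quote.
            // A missing closing quote is recorded, not treated as a failure.
            pos += 1;
            let mut terminated = false;
            while pos < bytes.len() {
                if bytes[pos] == b'\'' {
                    if pos + 1 < bytes.len() && bytes[pos + 1] == b'\'' {
                        pos += 2;
                        continue;
                    }
                    pos += 1;
                    terminated = true;
                    break;
                }
                pos += 1;
            }
            TokenKind::QuotedString { terminated }
        } else if c.is_ascii_digit() {
            while pos < bytes.len() && bytes[pos].is_ascii_digit() {
                pos += 1;
            }
            TokenKind::Number
        } else if c.is_alphabetic() || c == '_' {
            pos += c.len_utf8();
            while let Some(n) = input[pos..].chars().next() {
                if n.is_alphanumeric() || n == '_' {
                    pos += n.len_utf8();
                } else {
                    break;
                }
            }
            TokenKind::Ident
        } else {
            // Anything else becomes a one-character punctuation token.
            pos += c.len_utf8();
            TokenKind::Punct
        };

        tokens.push(Token { kind, text: &input[start..pos] });
    }
    tokens
}
```

in this sketch, concatenating the `text` of every token gives back the original input, so a later stage (a lexer mapping tokens to something like `SyntaxKind`, or a statement splitter) can work on the token stream without losing any source text, even when the input is malformed.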

in a follow-up, we will be able to:

  • parse custom parameters that popular tools use
  • pre-process to remove unsupported stuff
  • parse non-sql content (e.g. commands) via a simple custom parser

todos:

  • use new lexer in splitter
  • make sure we support all the different parameter formats popular tools use -> will do it in a follow-up
  • tests

@psteinroe psteinroe changed the title refactor: parser refactor: lexer Jul 1, 2025
@psteinroe psteinroe requested a review from juleswritescode July 4, 2025 16:00
@psteinroe psteinroe marked this pull request as ready for review July 4, 2025 16:00
@psteinroe psteinroe changed the title refactor: lexer refactor: custom lexer Jul 4, 2025
@psteinroe psteinroe merged commit 21b05d2 into main Jul 6, 2025
7 checks passed