[NFC] Skip parsing instructions in first parser pass by tlively · Pull Request #8601 · WebAssembly/binaryen

tlively · 2026-04-14T05:22:39Z

The first parser pass is responsible for two things: finding the locations of definitions of top-level module items like globals and functions and finding the locations of implicit function type definitions. It previously accomplished the latter by fully parsing every instruction in each function. But the IR is not constructed in this phase of parsing, so fully parsing every instruction was largely wasted work. Optimize the parser by parsing only the instructions that might have implicit type definitions and otherwise just blindly match parentheses to skip the function body. Combined with #8597, this speeds up parsing by 30-40%.
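The "blindly match parentheses" step can be pictured with a small sketch. This is not the code from this PR: `skipBalanced` is a hypothetical helper that works on raw characters, skips string literals so that parentheses inside them are not counted, and (as a simplifying assumption) ignores comments, which a real lexer would have to treat as whitespace.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical helper, not Binaryen's API. Given `pos` pointing at a '(',
// return the index just past its matching ')', or nullopt if the input is
// unbalanced. Assumption: comments are not handled here.
std::optional<size_t> skipBalanced(std::string_view src, size_t pos) {
  if (pos >= src.size() || src[pos] != '(') {
    return std::nullopt;
  }
  size_t depth = 0;
  while (pos < src.size()) {
    char c = src[pos++];
    if (c == '"') {
      // Skip to the closing quote, stepping over backslash escapes.
      while (pos < src.size() && src[pos] != '"') {
        pos += (src[pos] == '\\') ? 2 : 1;
      }
      if (pos >= src.size()) {
        return std::nullopt; // unterminated string literal
      }
      ++pos; // consume the closing quote
    } else if (c == '(') {
      ++depth;
    } else if (c == ')') {
      if (--depth == 0) {
        return pos;
      }
    }
  }
  return std::nullopt; // ran out of input before the matching ')'
}
```

The sketch covers only the skipping half of the description. Instructions that might carry implicit type definitions would still need to be recognized and parsed before falling back to this kind of scan.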

The lexer previously used its own internal `LexerCtx` abstraction that allowed it to consume the characters that made up a token without changing the lexer state, then update the state at once when committing to consuming the characters. However, manually resetting the lexer to the original position when giving up on parsing a token is simple enough that this abstraction was not holding its weight. Simplify the lexer by removing internal contexts, and move the simplified method bodies to lexer.h. Generally we try to avoid putting lots of code in headers, but in this case making the code available to the inliner, along with removing the extra layer of abstraction, makes the parser about 20% faster.
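The "manually resetting the lexer to the original position" pattern might look roughly like the following. `MiniLexer`, `takeKeyword`, and `isIdChar` are hypothetical names for illustration, not Binaryen's lexer, and the identifier-character rule is deliberately simplified.

```cpp
#include <cctype>
#include <cstddef>
#include <string_view>

// Hypothetical miniature lexer, not Binaryen's actual class.
struct MiniLexer {
  std::string_view buffer;
  size_t pos = 0;

  bool empty() const { return pos == buffer.size(); }
  char peek() const { return buffer[pos]; }
  void take(size_t n) { pos += n; }

  // Simplified assumption about which characters may continue a keyword.
  static bool isIdChar(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) || c == '_' || c == '.';
  }

  // Try to consume exactly `expected` as a whole token. On failure, restore
  // `pos` by hand instead of working through a separate speculative context.
  bool takeKeyword(std::string_view expected) {
    size_t start = pos;
    if (buffer.substr(pos, expected.size()) != expected) {
      return false; // nothing consumed yet, so nothing to reset
    }
    take(expected.size());
    if (!empty() && isIdChar(peek())) {
      pos = start; // matched only a prefix of a longer token: give up
      return false;
    }
    return true;
  }
};
```

Because the methods are small and defined inline in the class, a compiler is free to inline them at call sites, which is the effect the description attributes to moving the method bodies into the header.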

MaxGraey · 2026-04-14T13:38:01Z

src/parser/lexer.h

+  // Consume the next `n` characters.
+  void take(size_t n) { pos += n; }
+  void takeAll() { pos = buffer.size(); }
+


Minor proposal for API:

template<typename F> inline void takeWhile(F&& pred, size_t n = 1) {
  while (pred()) {
    pos += n;
  }
}

template<typename F> inline void takeUntil(F&& pred, size_t n = 1) {
  while (!pred()) {
    pos += n;
  }
}

This may simplify some code like this into

takeWhile(idchar);

I think there are few enough places where this pattern would apply, and they are already simple enough, that we probably don't need this abstraction right now. But it's a good idea to keep in mind if we add more repetitive patterns to the parser in the future.
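For reference, one way the proposed helpers could be made concrete is sketched below. It departs from the proposal in two labeled ways: the predicate receives the current character, and the loop stops at the end of the buffer. `Cursor` is a hypothetical stand-in for the lexer, not code from this PR.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical stand-in for the lexer state, for illustration only.
struct Cursor {
  std::string_view buffer;
  size_t pos = 0;

  // Advance while the current character satisfies `pred`, never running
  // past the end of the buffer.
  template<typename F> void takeWhile(F&& pred) {
    while (pos < buffer.size() && pred(buffer[pos])) {
      ++pos;
    }
  }

  // Advance until the current character satisfies `pred` or input ends.
  template<typename F> void takeUntil(F&& pred) {
    takeWhile([&](char c) { return !pred(c); });
  }
};
```

With this shape, a call such as `cursor.takeWhile(isIdChar)` would consume a run of identifier characters, assuming some `isIdChar(char)` predicate exists.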

tlively added 3 commits April 13, 2026 16:31

fixes

b2528ad

tlively requested a review from a team as a code owner April 14, 2026 05:22

tlively requested review from stevenfontanella and removed request for a team April 14, 2026 05:22

MaxGraey reviewed Apr 14, 2026

Base automatically changed from parser-slowdown to main April 14, 2026 15:29

tlively mentioned this pull request Apr 14, 2026

~+700% regression in wasm-opt --nm performance from Emscripten 3.1.38 -> 4.0.19 #8406

Open

tlively wants to merge 3 commits into main from parser-fastscan
