Optimize text with escapes parsing #2719

Open
zherczeg wants to merge 1 commit into WebAssembly:main from zherczeg:escape_opt

Conversation

@zherczeg
Collaborator

This patch is a small optimization for parsing quoted text. It reduces the number of allocations, and the final string is allocated exactly once (capacity == size, so no bytes are wasted at the end of the buffer).

An internal buffer is allocated at most once, and only when the inline buffer is too small. The unescaped text is never longer than the original text, and this observation can be used to reduce memory allocations.
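A minimal sketch of the idea described above, not the code from this patch. The function name `UnescapeSketch` and the helper `HexDigit` are hypothetical, and only a subset of escapes is handled (`\n`, `\t`, `\r`, `\\`, `\"`, `\'`, and two-digit hex `\hh`). Because every escape sequence produces fewer bytes than it consumes, a scratch buffer of the input size is always large enough, so the scratch space is either the inline stack buffer or a single heap allocation.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <string_view>

// Somewhat arbitrary: small enough for the stack, large enough for most names.
static const size_t kInlineBufferSize = 96;

// Hypothetical helper: returns the value of a hex digit, or -1 if invalid.
static int HexDigit(char c) {
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

// Hypothetical sketch: unescapes `text` into `*out`.
// Returns false on a malformed escape sequence.
bool UnescapeSketch(std::string_view text, std::string* out) {
  char inline_buffer[kInlineBufferSize];
  std::unique_ptr<char[]> heap_buffer;
  char* buffer = inline_buffer;

  // The output never exceeds the input size, so one scratch allocation
  // (at most) is enough, and only when the inline buffer is too small.
  if (text.size() > kInlineBufferSize) {
    heap_buffer.reset(new char[text.size()]);
    buffer = heap_buffer.get();
  }

  size_t length = 0;
  for (size_t i = 0; i < text.size(); ++i) {
    if (text[i] != '\\') {
      buffer[length++] = text[i];
      continue;
    }
    if (++i >= text.size()) return false;  // Trailing backslash.
    switch (text[i]) {
      case 'n': buffer[length++] = '\n'; break;
      case 't': buffer[length++] = '\t'; break;
      case 'r': buffer[length++] = '\r'; break;
      case '\\':
      case '"':
      case '\'': buffer[length++] = text[i]; break;
      default: {
        // Two hex digits, e.g. \61 is 'a'.
        if (i + 1 >= text.size()) return false;
        int hi = HexDigit(text[i]);
        int lo = HexDigit(text[i + 1]);
        if (hi < 0 || lo < 0) return false;
        buffer[length++] = static_cast<char>(hi * 16 + lo);
        ++i;
        break;
      }
    }
  }

  // The final string is built once, from a known length.
  *out = std::string(buffer, length);
  return true;
}
```

Note that the C++ standard does not guarantee `capacity() == size()` for a string constructed this way; common implementations size the allocation to the requested length once it exceeds the small-string buffer.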

@zherczeg force-pushed the escape_opt branch 2 times, most recently from 846d398 to 78027c7 (March 15, 2026 11:29)
Member

@sbc100 left a comment

This seems like some extra code complexity. Are we sure it's worth it? Can you quantify the memory savings?

Presumably the savings are just in peak memory during parsing, and the memory usage at the end of parsing will be unchanged?

return Result::Ok;
}

static const size_t kInlineBufferSize = 96;
Member

How did you pick this value?

Collaborator Author

No particular reason. A buffer of about 100 bytes on the stack is not big, and people prefer shorter names.

Member

Maybe worth a comment, even if only to mention that it's somewhat arbitrary.

@zherczeg
Collaborator Author

I made a measurement with CPU cycle counters. The parsing time of "[Method] 012345778\61::func" is reduced from 90960 cycles to 77543 cycles in release mode, which is about 15% fewer cycles (a 1.17x speedup). The overall runtime is 0.0000379 sec, so this is negligible.
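For anyone who wants to repeat a measurement like this, a rough sketch follows. It is not the harness used for the numbers above: it swaps the CPU cycle counter for the portable `std::chrono::steady_clock`, and the name `AverageNanoseconds` is hypothetical. Averaging over many iterations helps, since a single parse of a short string is close to the timer resolution.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical helper: runs `fn` `iterations` times and returns the
// average wall-clock time per call in nanoseconds.
template <typename Fn>
int64_t AverageNanoseconds(Fn&& fn, int iterations) {
  using Clock = std::chrono::steady_clock;
  auto start = Clock::now();
  for (int i = 0; i < iterations; ++i) {
    fn();
  }
  auto elapsed = Clock::now() - start;
  return std::chrono::duration_cast<std::chrono::nanoseconds>(elapsed)
             .count() /
         iterations;
}
```

The callable passed in would wrap the parsing call under test; the compiler may optimize away work whose result is unused, so the result should be consumed in some observable way.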

This patch is intended to be a small improvement. Of course it is not that important, and I don't mind if it is rejected. The GC patches are more important to me.

@sbc100
Member

sbc100 commented Mar 16, 2026

I'm somewhat conflicted because I think we should aim to keep wabt simple where possible. But performance is nice. Perhaps we could leave this open and reconsider if folks are noticing slow parse times for text files.

Another measurement I'd be curious about: Does this change have any measurable effect on the time it takes to run the test suite?
