Start parsing the `chunks` file with serde #31

Swatinem · 2024-09-03T10:52:24Z

This implements a hand-written parser that scans through the `chunks` file line by line and parses the various headers and line records with serde.
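Below is a minimal, std-only sketch of the line-by-line scan. It assumes a particular `chunks` layout: a JSON file header, an `<<<<< end_of_header >>>>>` marker, then chunks separated by `<<<<< end_of_chunk >>>>>`. Each chunk is assumed to start with a JSON object header followed by one JSON-array line record (or an empty line) per source line. These marker strings and the `Line`/`classify`/`scan` names are assumptions for illustration. A serde-based version would pass the `Header` and `LineRecord` slices to something like `serde_json::from_str`. That step is omitted here so the example has no dependencies.

```rust
// Assumed marker lines separating the file header from the chunks, and the
// chunks from each other. These strings are an assumption, not from the PR.
const END_OF_HEADER: &str = "<<<<< end_of_header >>>>>";
const END_OF_CHUNK: &str = "<<<<< end_of_chunk >>>>>";

/// Hypothetical classification of a single line of a `chunks` file.
#[derive(Debug, PartialEq)]
enum Line<'a> {
    EndOfHeader,
    EndOfChunk,
    /// A JSON object: the file header or a chunk header (would go to serde).
    Header(&'a str),
    /// A JSON array with coverage for one source line (would go to serde).
    LineRecord(&'a str),
    /// An empty line: no coverage data for this source line.
    Empty,
}

/// Decides what kind of line this is without parsing its JSON contents.
fn classify(line: &str) -> Line<'_> {
    match line {
        "" => Line::Empty,
        END_OF_HEADER => Line::EndOfHeader,
        END_OF_CHUNK => Line::EndOfChunk,
        l if l.starts_with('{') => Line::Header(l),
        l => Line::LineRecord(l),
    }
}

/// Scans the input line by line; `str::lines` splits on `\n` and also
/// strips a trailing `\r`.
fn scan(input: &str) -> Vec<Line<'_>> {
    input.lines().map(classify).collect()
}

fn main() {
    // Placeholder contents; only the line shapes matter here.
    let input = "{}\n<<<<< end_of_header >>>>>\n{\"present_sessions\":[0]}\n[1,null,[[0,1]]]\n\n[0,null,[[0,0]]]";
    for line in scan(input) {
        println!("{line:?}");
    }
}
```

Classifying each line first keeps the per-line work cheap. Only the slices that actually carry data reach the (comparatively expensive) deserializer.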

The most complex part here is parsing the line records. If that complexity becomes unreasonable, a hybrid approach is also possible: use the hand-written parser together with the simpler serde-based header parsers, while still falling back to the existing parser-combinator-based parser for the line records.
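The hybrid approach can be sketched as a dispatch on the line kind. The hand-written scanner drives the loop, header lines go to one parser, and line records go to another. The `parse_hybrid` function and the `Parsed` enum are hypothetical. The two closures stand in for the serde-based header parser and the existing parser-combinator parser, neither of which is shown.

```rust
/// Result of parsing one line, tagged by which parser handled it.
#[derive(Debug, PartialEq)]
enum Parsed<H, R> {
    Header(H),
    Record(R),
    Empty,
}

/// Hypothetical hybrid parser: the scanner decides the line kind, then hands
/// headers to `parse_header` (e.g. serde-based) and line records to
/// `parse_record` (e.g. the existing parser-combinator parser).
fn parse_hybrid<H, R>(
    input: &str,
    mut parse_header: impl FnMut(&str) -> Result<H, String>,
    mut parse_record: impl FnMut(&str) -> Result<R, String>,
) -> Result<Vec<Parsed<H, R>>, String> {
    let mut out = Vec::new();
    for line in input.lines() {
        let parsed = if line.is_empty() {
            Parsed::Empty
        } else if line.starts_with('{') {
            Parsed::Header(parse_header(line)?)
        } else {
            Parsed::Record(parse_record(line)?)
        };
        out.push(parsed);
    }
    Ok(out)
}

fn main() {
    // Stand-in parsers: a real version would deserialize the JSON.
    let parsed = parse_hybrid(
        "{\"present_sessions\":[0]}\n[1,null,[[0,1]]]\n",
        |h| Ok(h.len()),
        |r| Ok(r.to_string()),
    );
    println!("{parsed:?}");
}
```

Because both parsers are plain function arguments, the record parser can be swapped between the old and new implementation without touching the scanning loop.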

I get the following timings, which suggest that the parser as it stands is roughly 9-15x faster than the previous one (comparing medians: about 15x on `complex_chunks` and about 9x on `simple_chunks`). Admittedly, though, it is still incomplete.

```
pyreport                 fastest       │ slowest       │ median        │ mean          │ samples │ iters
├─ complex_chunks        4.134 s       │ 4.134 s       │ 4.134 s       │ 4.134 s       │ 1       │ 1
├─ complex_chunks_serde  267.6 ms      │ 302.4 ms      │ 276.6 ms      │ 280.2 ms      │ 10      │ 10
├─ simple_chunks         10.49 µs      │ 36.88 µs      │ 10.56 µs      │ 11.5 µs       │ 100     │ 100
╰─ simple_chunks_serde   1.146 µs      │ 6.794 µs      │ 1.178 µs      │ 1.304 µs      │ 100     │ 400
```
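Each speedup is the old median divided by the new median. The small sketch below uses the medians copied from the table. Note that `complex_chunks` ran only a single sample, so its figure is the least reliable.

```rust
// Median timings from the benchmark table above, in nanoseconds.
const COMPLEX_OLD_NS: f64 = 4.134e9; // complex_chunks (1 sample)
const COMPLEX_SERDE_NS: f64 = 276.6e6; // complex_chunks_serde
const SIMPLE_OLD_NS: f64 = 10.56e3; // simple_chunks
const SIMPLE_SERDE_NS: f64 = 1.178e3; // simple_chunks_serde

/// How many times faster the new parser is than the old one.
fn speedup(old_ns: f64, new_ns: f64) -> f64 {
    old_ns / new_ns
}

fn main() {
    println!("complex: {:.1}x", speedup(COMPLEX_OLD_NS, COMPLEX_SERDE_NS)); // complex: 14.9x
    println!("simple:  {:.1}x", speedup(SIMPLE_OLD_NS, SIMPLE_SERDE_NS)); // simple:  9.0x
}
```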

codecov · 2024-09-03T10:54:48Z

Codecov Report

Attention: Patch coverage is 55.42636% with 115 lines in your changes missing coverage. Please review.

Project coverage is 97.07%. Comparing base (79dfe5a) to head (12ee764).

✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| core/src/parsers/pyreport/chunks_serde.rs | 58.13% | 103 Missing ⚠️ |
| core/src/report/pyreport/types.rs | 0.00% | 12 Missing ⚠️ |

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main      #31      +/-   ##
==========================================
- Coverage   98.62%   97.07%   -1.55%
==========================================
  Files          20       21       +1
  Lines        6962     7220     +258
==========================================
+ Hits         6866     7009     +143
- Misses         96      211     +115
```

| Components | Coverage Δ |
|---|---|
| core | `97.07% <55.42%> (-1.55%)` ⬇️ |
| bindings | `100.00% <ø> (ø)` |
| python | `100.00% <ø> (ø)` |
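The Codecov figures are consistent with each other. The patch added 258 lines (7220 - 6962), of which 143 are hit (7009 - 6866), and its 115 misses equal the 103 + 12 missing lines listed per file. As a worked check:

$$
\text{patch} = \frac{7009 - 6866}{7220 - 6962} = \frac{143}{258} \approx 0.55426,
\qquad
\text{project} = \frac{7009}{7220} \approx 0.97078,
\qquad
\text{base} = \frac{6866}{6962} \approx 0.98621
$$

The project figure is shown as 97.07% rather than 97.08%, presumably because Codecov's default configuration rounds coverage down to two decimal places.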



Swatinem self-assigned this Sep 3, 2024

Base automatically changed from swatinem/bench-chunks to main September 4, 2024 07:19

Swatinem force-pushed the swatinem/parse-chunks-serde branch 3 times, most recently from 56e60a1 to e0dd890, September 4, 2024 10:06

Swatinem added 3 commits September 18, 2024 11:35

Use memchr-based splitting instead of an iterator/event-based inter…

816d632

…face

get closer to the existing parser interface dealing with report builders

bd18f58
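The two commits listed above describe two changes: memchr-based splitting, and moving closer to the report-builder interface. The sketch below shows that shape. Lines are found with repeated `memchr` calls over the raw bytes, and each line is pushed into a builder instead of being yielded as an event. To stay dependency-free, `memchr` here is a naive stand-in with the same signature as `memchr::memchr` from the `memchr` crate. The `ReportBuilder` trait and `Collect` type are hypothetical and do not reflect the actual report-builder interface in the codebase.

```rust
/// Naive stand-in for `memchr::memchr(needle, haystack)` from the `memchr`
/// crate: returns the index of the first `needle` byte. Real code would use
/// the crate's SIMD-accelerated version.
fn memchr(needle: u8, haystack: &[u8]) -> Option<usize> {
    haystack.iter().position(|&b| b == needle)
}

/// Hypothetical builder trait: the parser calls into it directly instead of
/// yielding events through an iterator.
trait ReportBuilder {
    fn line(&mut self, line_no: usize, line: &[u8]);
}

/// Splits `input` on `\n` with repeated `memchr` calls and hands every line
/// to the builder. A trailing `\n` does not produce an extra empty line.
fn parse_lines<B: ReportBuilder>(input: &[u8], builder: &mut B) {
    let mut rest = input;
    let mut line_no = 0;
    while !rest.is_empty() {
        let (line, next) = match memchr(b'\n', rest) {
            Some(pos) => (&rest[..pos], &rest[pos + 1..]),
            None => (rest, &rest[rest.len()..]),
        };
        builder.line(line_no, line);
        line_no += 1;
        rest = next;
    }
}

/// Example builder that records each line as an owned `String`.
#[derive(Default)]
struct Collect(Vec<(usize, String)>);

impl ReportBuilder for Collect {
    fn line(&mut self, line_no: usize, line: &[u8]) {
        self.0.push((line_no, String::from_utf8_lossy(line).into_owned()));
    }
}

fn main() {
    let mut builder = Collect::default();
    parse_lines(b"{\"present_sessions\":[0]}\n[1,null,[[0,1]]]\n\n", &mut builder);
    for (no, line) in &builder.0 {
        println!("{no}: {line}");
    }
}
```

Pushing into a builder avoids materializing an intermediate event per line. Working on `&[u8]` lets the splitter skip UTF-8 validation until a line is actually deserialized.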

Swatinem force-pushed the swatinem/parse-chunks-serde branch from 12ee764 to bd18f58, September 18, 2024 09:36
