Update README.md
Co-authored-by: Nathan Daly <[email protected]>
Drvi and NHDaly committed Nov 7, 2023
1 parent a2ef933 commit bea8154
Showing 1 changed file (README.md) with 1 addition and 1 deletion.
# ChunkedBase.jl

The package handles the ingestion of data chunks and the distribution and synchronization of the work performed on those chunks in parallel. It came into existence while refactoring the `ChunkedCSV.jl` and `ChunkedJSON.jl` packages and was designed to be extended by packages like these: it is a foundation for writing parser packages.

Specifically, `ChunkedBase.jl` spawns one task that handles IO and acts as the coordinator, plus a configurable number of worker tasks.
Both CSV and JSONL are textual formats that delimit records with newlines, which makes newlines a natural point at which to split work. On the coordinator task, we read bytes into a preallocated buffer and then use the `NewlineLexers.jl` package to quickly find the newlines in it. These newlines are then distributed among the worker tasks, and the coordinator immediately starts working on a secondary buffer while the workers process the first one.
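To make this division of labour concrete, here is a minimal, self-contained Julia sketch of the same pattern. It does not use `ChunkedBase.jl`: `parallel_sum_lines`, `WorkItem` and the toy workload (summing one integer per line) are hypothetical, and a plain `findall` stands in for `NewlineLexers.jl`. The calling task plays the coordinator, two preallocated buffers alternate (double buffering), and a buffer is only refilled once every worker has finished with it.

```julia
using Base.Threads: @spawn, Atomic, atomic_add!, atomic_sub!

# Hypothetical unit of work: a newline-aligned byte range inside one of the buffers.
struct WorkItem
    buf_id::Int
    range::UnitRange{Int}
end

# Sums one integer per line of `io`. Illustrative only, NOT the ChunkedBase.jl API.
function parallel_sum_lines(io::IO; bufsize::Int=64 * 1024, nworkers::Int=Threads.nthreads())
    buffers = [Vector{UInt8}(undef, bufsize) for _ in 1:2]  # two preallocated buffers
    pending = [Atomic{Int}(0) for _ in 1:2]  # unfinished work items per buffer
    free = Channel{Int}(2)                   # ids of buffers the coordinator may refill
    put!(free, 1); put!(free, 2)
    work = Channel{WorkItem}(Inf)
    total = Atomic{Int}(0)

    # Worker tasks: parse their byte range, then release the buffer if they were last.
    workers = map(1:nworkers) do _
        @spawn for item in work
            s = 0
            for line in split(String(buffers[item.buf_id][item.range]), '\n'; keepempty=false)
                s += parse(Int, line)
            end
            atomic_add!(total, s)
            # atomic_sub! returns the old value, so 1 means this was the buffer's last item
            atomic_sub!(pending[item.buf_id], 1) == 1 && put!(free, item.buf_id)
        end
    end

    # Coordinator (the calling task): IO, newline detection and work distribution.
    carry = UInt8[]  # bytes of a record cut off by the previous chunk boundary
    try
        while true
            id = take!(free)  # blocks until the workers are done with this buffer
            buf = buffers[id]
            ncarry = length(carry)
            copyto!(buf, 1, carry, 1, ncarry)
            len = ncarry + readbytes!(io, view(buf, ncarry+1:bufsize))
            atend = eof(io)
            # Newline positions (a stand-in for what NewlineLexers.jl does quickly).
            ends = findall(==(UInt8('\n')), view(buf, 1:len))
            if atend && len > 0 && (isempty(ends) || last(ends) != len)
                push!(ends, len)  # the final record may lack a trailing newline
            end
            !atend && isempty(ends) && error("a record does not fit in a buffer of $bufsize bytes")
            carry = buf[(isempty(ends) ? 0 : last(ends))+1:len]
            if isempty(ends)
                put!(free, id)  # nothing to parse in this buffer
            else
                # Split the records into at most `nworkers` contiguous groups.
                groups = collect(Iterators.partition(eachindex(ends), cld(length(ends), nworkers)))
                pending[id][] = length(groups)
                for g in groups
                    lo = first(g) == 1 ? 1 : ends[first(g)-1] + 1
                    put!(work, WorkItem(id, lo:ends[last(g)]))
                end
            end
            atend && break
        end
    finally
        close(work)  # lets the workers' `for item in work` loops finish
    end
    foreach(wait, workers)
    return total[]
end
```

For example, `parallel_sum_lines(IOBuffer(join(1:100, '\n')); bufsize=16, nworkers=3)` returns 5050. Bytes after the last newline of a chunk are carried over to the front of the next buffer, and in this sketch a single record longer than the whole buffer simply raises an error.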