-
Notifications
You must be signed in to change notification settings - Fork 192
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(parquet): Adopt new parquet-wasm File reader API. #2987
Conversation
56e0480
to
3970312
Compare
|
||
let parquetFile: parquetWasm.ParquetFile; | ||
if (file.handle instanceof Blob) { | ||
// TODO - let's assume fromFile() works on Blobs and not just on File... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I'll publish a patch with this fix: kylebarron/parquet-wasm#530
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Published 0.6.1 where this uses the Blob type: https://kylebarron.dev/parquet-wasm/classes/esm_parquet_wasm.ParquetFile.html#fromFile
|
||
const ipcStream = wasmTable.intoIPCStream(); | ||
const arrowTable = arrow.tableFromIPC(ipcStream); | ||
const stream: ReadableStream<arrow.Table> = await parquetFile.stream(options); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This creates a stream of wasm RecordBatch objects. You need to manage the movement to JS for each batch yourself. You can use the intoIPCStream
method plus arrow.tableFromIPC
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This creates a stream of wasm RecordBatch objects. You need to manage the movement to JS for each batch yourself.
Thanks, looks very promising indeed, will come back to this in separate PR as integrating it will need a bit more plumbing in loaders.gl
Nit: Any chance you could add a method that returns an async iterator so that I don't have to wrap things? Generally the Stream callback API is more fiddly to work with and AFAIK, stream incompatibilities between Node and browser are still a thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that looks annoying.
Right now I'm using the wasm-streams library to return a ReadableStream
. It doesn't look like there's any built-in support for returning an async iterator.
I have a note for myself (kylebarron/parquet-wasm#307) to point users to a polyfill. In testing https://observablehq.com/d/f5723cea6661fb71 a long time ago, I used this minimal polyfill, but I don't really understand what's happening
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I suspected you were relying on some wasm support library. The polyfill you copies is similar to but cleaner than the one I added: https://github.com/visgl/loaders.gl/blob/master/modules/parquet/src/lib/utils/make-stream-iterator.ts.
but I don't really understand what's happening
Yes that is the problem - converting between streams and iterators is way too obscure for most users.
- At least Node streams define
[Symbol.asyncIterator]
and can be used as iterators, but the browser WhatWG streams can not.
3970312
to
9e89811
Compare
No description provided.