idea: Add parquet arrow async write/read support #4868

Closed · 3 tasks done
Xuanwo opened this issue Jul 8, 2024 · 10 comments · Fixed by #4980

Xuanwo (Member) commented Jul 8, 2024

We can implement native parquet arrow support to make our users happy:

By implementing this integration, users can avoid the low-performance AsyncRead/AsyncWrite traits or writing their own shims.

Steps: (tracked as a 3-item task list, all completed)
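
For a sense of what this unlocks, here is a minimal read-side sketch. `ParquetRecordBatchStreamBuilder` is parquet's real async arrow API; the reader passed in would be the OpenDAL-backed `AsyncFileReader` implementation this issue proposes, which does not exist yet:

```rust
use futures::StreamExt;
use parquet::arrow::async_reader::AsyncFileReader;
use parquet::arrow::ParquetRecordBatchStreamBuilder;

// Works for any implementation of parquet's AsyncFileReader, e.g. a
// (hypothetical) adapter over an OpenDAL reader as proposed here.
async fn read_all<R>(reader: R) -> parquet::errors::Result<()>
where
    R: AsyncFileReader + Unpin + Send + 'static,
{
    // Build an async arrow RecordBatch stream directly from the reader,
    // with no AsyncRead/AsyncWrite shim in between.
    let mut stream = ParquetRecordBatchStreamBuilder::new(reader)
        .await?
        .build()?;
    while let Some(batch) = stream.next().await {
        println!("read {} rows", batch?.num_rows());
    }
    Ok(())
}
```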

WenyXu (Member) commented Aug 2, 2024

Interesting, I'll give it a try 🤩

Xuanwo (Member, Author) commented Aug 2, 2024

> Interesting, I'll give it a try 🤩

Perfect!

Xuanwo (Member, Author) commented Aug 2, 2024

Perhaps we could have an integrations/parquet crate.
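
Purely as a sketch of that idea, such a crate's manifest might look something like this; the crate name, versions, and feature list below are assumptions, not a settled layout:

```toml
# integrations/parquet/Cargo.toml (hypothetical sketch)
[package]
name = "parquet-integration"   # placeholder name
version = "0.1.0"
edition = "2021"

[dependencies]
# Versions are illustrative only.
opendal = "0.48"
parquet = { version = "52", features = ["arrow", "async"] }
```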

WenyXu (Member) commented Aug 3, 2024

I'm considering introducing multiple versions of parquet via:

[dependencies]
parquet_51 = { package = "parquet", version = "51.0", optional = true }
parquet_52 = { package = "parquet", version = "52.0", optional = true }

because we are still using parquet 51.0.0 🥲. Do you have any ideas? cc @waynexia
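
If this route were taken, the renamed dependencies could then be re-exported behind feature flags so the rest of the code uses a single `parquet` path; a small sketch (the feature names are assumptions mirroring the renamed dependencies above):

```rust
// lib.rs: select one parquet version at compile time.
#[cfg(feature = "parquet_51")]
pub use parquet_51 as parquet;

#[cfg(feature = "parquet_52")]
pub use parquet_52 as parquet;
```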

Xuanwo (Member, Author) commented Aug 3, 2024

Since arrow's releases are becoming stable, maybe we can just track upstream instead?

WenyXu (Member) commented Aug 3, 2024

> Since arrow's releases are becoming stable, maybe we can just track upstream instead?

Makes sense 🥹

WenyXu (Member) commented Aug 5, 2024

For AsyncFileReader, I'm considering introducing a feature that merges small ranges into a larger chunk (e.g., https://github.com/datafuselabs/databend/blob/a98335d33e7abfd34189e7f32c06ab34d53c64d0/src/query/storages/fuse/src/io/read/block/block_reader_merge_io_async.rs#L46-L68). Do you have any ideas?
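
A minimal sketch of the kind of range merging being described, with a configurable gap threshold; the names and parameters are illustrative, not an existing OpenDAL or parquet API:

```rust
use std::ops::Range;

/// Hypothetical helper: sort byte ranges by start, then merge consecutive
/// ranges whose gap is at most `max_gap` bytes, so many small reads become
/// a few larger ones.
fn coalesce_ranges(mut ranges: Vec<Range<u64>>, max_gap: u64) -> Vec<Range<u64>> {
    ranges.sort_by_key(|r| r.start);
    let mut merged: Vec<Range<u64>> = Vec::with_capacity(ranges.len());
    for range in ranges {
        match merged.last_mut() {
            // Close enough to the previous chunk: extend it.
            Some(last) if range.start <= last.end + max_gap => {
                last.end = last.end.max(range.end)
            }
            // Too far away: start a new chunk.
            _ => merged.push(range),
        }
    }
    merged
}

fn main() {
    // Three small column reads, ~100 bytes apart, collapse into one fetch.
    let merged = coalesce_ranges(vec![0..10, 110..120, 230..240], 128);
    assert_eq!(merged, vec![0..240]);
}
```

The caller would then issue one request per merged chunk and slice the buffer back into the original ranges; `max_gap` trades redundant bytes read for fewer round trips.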

waynexia (Member) commented Aug 5, 2024

Depending on multiple versions of parquet could bring a lot of maintenance burden, especially as the number of supported versions grows. This is usually done by maintaining multiple branches that are released separately to absorb parquet's API changes, which doesn't seem to match our situation at present either. So I'd prefer not to do this, at least at the beginning. We can reconsider when there's a real need, e.g. a specific parquet version becomes an LTS release and is widely used.

Xuanwo (Member, Author) commented Aug 6, 2024

> For AsyncFileReader, I'm considering introducing a feature that merges small ranges into a larger chunk (e.g., https://github.com/datafuselabs/databend/blob/a98335d33e7abfd34189e7f32c06ab34d53c64d0/src/query/storages/fuse/src/io/read/block/block_reader_merge_io_async.rs#L46-L68). Do you have any ideas?

Seems we can use Reader::fetch internally.
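
A rough sketch of how Reader::fetch could back parquet's AsyncFileReader, assuming opendal's Reader::read/Reader::fetch take u64 ranges and return Buffers, and using parquet 52's footer helpers; treat the exact signatures as assumptions rather than a final design:

```rust
use std::{ops::Range, sync::Arc};

use bytes::Bytes;
use futures::{future::BoxFuture, FutureExt};
use parquet::arrow::async_reader::AsyncFileReader;
use parquet::errors::{ParquetError, Result};
use parquet::file::footer::{decode_footer, decode_metadata};
use parquet::file::metadata::ParquetMetaData;

/// Hypothetical adapter from an opendal::Reader to parquet's AsyncFileReader.
pub struct OpendalFileReader {
    reader: opendal::Reader,
    content_length: u64, // total file size, known up front (e.g. from stat)
}

impl AsyncFileReader for OpendalFileReader {
    fn get_bytes(&mut self, range: Range<usize>) -> BoxFuture<'_, Result<Bytes>> {
        async move {
            let buf = self
                .reader
                .read(range.start as u64..range.end as u64)
                .await
                .map_err(|e| ParquetError::External(Box::new(e)))?;
            Ok(buf.to_bytes())
        }
        .boxed()
    }

    fn get_byte_ranges(&mut self, ranges: Vec<Range<usize>>) -> BoxFuture<'_, Result<Vec<Bytes>>> {
        async move {
            let ranges: Vec<Range<u64>> =
                ranges.iter().map(|r| r.start as u64..r.end as u64).collect();
            // The assumed Reader::fetch merges nearby ranges into fewer
            // requests, which is exactly the coalescing discussed above.
            let bufs = self
                .reader
                .fetch(ranges)
                .await
                .map_err(|e| ParquetError::External(Box::new(e)))?;
            Ok(bufs.into_iter().map(|b| b.to_bytes()).collect())
        }
        .boxed()
    }

    fn get_metadata(&mut self) -> BoxFuture<'_, Result<Arc<ParquetMetaData>>> {
        async move {
            // Read the fixed 8-byte footer (metadata length + "PAR1" magic)...
            let end = self.content_length;
            let footer = self
                .reader
                .read(end - 8..end)
                .await
                .map_err(|e| ParquetError::External(Box::new(e)))?
                .to_bytes();
            let footer: [u8; 8] = footer.as_ref().try_into().unwrap();
            let metadata_len = decode_footer(&footer)? as u64;
            // ...then fetch and decode the metadata block preceding it.
            let metadata = self
                .reader
                .read(end - 8 - metadata_len..end - 8)
                .await
                .map_err(|e| ParquetError::External(Box::new(e)))?
                .to_bytes();
            Ok(Arc::new(decode_metadata(&metadata)?))
        }
        .boxed()
    }
}
```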

WenyXu (Member) commented Aug 6, 2024

> Seems we can use Reader::fetch internally.

Cool!
