Why doesn't Parquet currently support writing multiple row groups simultaneously? #2929

Open
muyihao opened this issue Jun 24, 2024 · 1 comment


muyihao commented Jun 24, 2024

Hi Parquet developers,

I have a question regarding the current implementation of Parquet. As far as I understand, Parquet does not support writing multiple row groups simultaneously. Could you please explain the reasoning behind this design choice?

Additionally, I am considering modifying Parquet to allow for multiple row groups to exist in memory and be flushed sequentially. From a high-level perspective, does this approach seem feasible? Are there any potential pitfalls or challenges I should be aware of?

Thank you for your time and assistance.

Best regards,

wgtmac (Member) commented Jun 24, 2024

This would complicate the implementation and result in a large memory footprint, since several row groups would have to be buffered in memory at once. Does it make sense to use multiple file writers instead?
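
A minimal sketch of the multiple-file-writer idea, under stated assumptions: the rows are split into independent partitions, and each partition is written to its own file by its own writer running in its own thread. The names `split_rows`, `write_partition`, and `write_with_multiple_writers` are hypothetical, and `write_partition` is only a plain-text stand-in for opening, filling, and closing a real Parquet file writer; no Parquet API is used here.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def split_rows(rows, num_writers):
    """Deal rows round-robin into num_writers independent partitions."""
    return [rows[i::num_writers] for i in range(num_writers)]


def write_partition(path, rows):
    """Hypothetical stand-in for a real Parquet writer: one writer, one file.

    In practice, this is where a dedicated Parquet file writer would be
    opened, fed the rows, and closed. Here the rows are written as text.
    """
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(",".join(str(v) for v in row) + "\n")
    return len(rows)


def write_with_multiple_writers(rows, out_dir, num_writers=4):
    """Write each partition to its own file, one writer per thread."""
    partitions = split_rows(rows, num_writers)
    paths = [
        os.path.join(out_dir, f"part-{i:05d}.txt") for i in range(num_writers)
    ]
    with ThreadPoolExecutor(max_workers=num_writers) as pool:
        # Each task owns exactly one file, so writers never share state.
        counts = list(pool.map(write_partition, paths, partitions))
    return paths, counts


if __name__ == "__main__":
    rows = [(i, i * i) for i in range(10)]
    with tempfile.TemporaryDirectory() as out_dir:
        paths, counts = write_with_multiple_writers(rows, out_dir, num_writers=3)
        print(counts)
```

Because every writer owns its own output file, no coordination is needed to order row groups within a single file, and each writer only buffers the row group it is currently building. The trade-off is that the output is several files rather than one.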
