[SUPPORT] Does AWS Redshift Spectrum fully support Hudi meta table features like column stats and bloom filters? #12674

Open
Ssv-21 opened this issue Jan 20, 2025 · 1 comment

Comments

@Ssv-21

Ssv-21 commented Jan 20, 2025

I’m trying to understand how well AWS Redshift Spectrum supports data skipping when working with Hudi datasets.

From the Athena documentation, it's clear that Athena uses Hudi's metadata table for file listing but does not use features like column statistics or bloom filters for data skipping. However, the official docs do not say whether the same applies to Redshift Spectrum.
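
For context, the difference between plain file listing and data skipping based on column statistics can be sketched in a few lines. This is a toy illustration under stated assumptions, not how Hudi, Athena, or Redshift Spectrum implement it: the `FileStats` type, the file names, and the min/max values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileStats:
    """Hypothetical per-file min/max for one column (not Hudi's actual schema)."""
    path: str
    min_value: int
    max_value: int


def list_files(stats):
    """File listing only: every file in the table is a scan candidate."""
    return [s.path for s in stats]


def prune_by_range(stats, lower, upper):
    """Data skipping: keep only files whose [min, max] overlaps [lower, upper]."""
    return [s.path for s in stats if s.max_value >= lower and s.min_value <= upper]


# Made-up statistics for three data files.
table = [
    FileStats("f1.parquet", 0, 99),
    FileStats("f2.parquet", 100, 199),
    FileStats("f3.parquet", 200, 299),
]

# A predicate such as `col BETWEEN 120 AND 150` only needs the second file.
all_files = list_files(table)
candidates = prune_by_range(table, 120, 150)
```

With file listing alone the engine still opens all three files; with min/max statistics it can rule out two of them before reading any data.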

Does anyone know whether Redshift Spectrum supports column statistics and bloom filters from Hudi's metadata table for query optimization, or whether it works the same as Athena and supports only file listing?

Any clarification or references would be really helpful. Thanks!

@rangareddy

Hi @Ssv-21

I found the following documentation for Redshift Spectrum, but it does not say whether column statistics or bloom filters are supported. We will need to contact the AWS product team for clarification.

From the Onehouse blog:

"Hudi supports snapshot isolation, which means you can query data without picking up any in-progress or not-yet-committed changes. This means you can write changes to the dataset in EMR at the same time users are querying it with Athena and Amazon Redshift Spectrum."

Projects
Status: Awaiting Triage

3 participants