[Spark] Restrict partition-like data filters to whitelist of known-good expressions #3872

Open
wants to merge 4 commits into
base: master
from


chirag-s-db
Contributor

@chirag-s-db chirag-s-db commented Nov 12, 2024

Which Delta project/connector is this regarding?

  • Spark
  • Standalone
  • Flink
  • Kernel
  • Other (fill in here)

Description

Currently, we try to rewrite arbitrary expressions as partition-like. To avoid having to repeatedly remove known-bad expressions, this PR instead starts with a whitelist (to be expanded) of known-good expressions that can safely be rewritten.
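To illustrate the whitelist idea, here is a minimal sketch on a toy expression tree. It is not the code in this PR: the classes (`Expr`, `Column`, `Literal`, `EqualTo`, `And`, `UnvettedExpr`), the `ALLOWED` set, and `can_rewrite_as_partition_like` are all hypothetical names chosen for the example, and the actual whitelist contents are defined by the patch itself.

```python
# Toy expression tree. These classes are hypothetical stand-ins,
# not Spark Catalyst expressions.
class Expr:
    def __init__(self, *children):
        self.children = children

class Column(Expr):
    def __init__(self, name):
        super().__init__()
        self.name = name

class Literal(Expr):
    def __init__(self, value):
        super().__init__()
        self.value = value

class EqualTo(Expr):
    pass

class And(Expr):
    pass

class UnvettedExpr(Expr):
    """Stands for any expression type nobody has reviewed yet."""

# Assumed whitelist for the example only.
ALLOWED = {Column, Literal, EqualTo, And}

def can_rewrite_as_partition_like(expr):
    """True only if every node in the tree is an explicitly allowed type."""
    if type(expr) not in ALLOWED:
        return False
    return all(can_rewrite_as_partition_like(c) for c in expr.children)

ok = And(EqualTo(Column("a"), Literal(1)), EqualTo(Column("b"), Literal(2)))
bad = EqualTo(Column("a"), UnvettedExpr())

print(can_rewrite_as_partition_like(ok))
print(can_rewrite_as_partition_like(bad))
```

With a whitelist, an expression type that nobody has looked at is rejected by default, so a new problematic expression means a skipped optimization rather than a new bug to patch out.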

This change also fixes an existing issue where partition-like filters are generated for a column that is not eligible for data skipping. Such a filter throws an analysis exception because the referenced columns aren't found in the stats. The issue was originally missed (and is a difference in behavior vs. partition filters) because partitioning isn't allowed on non-atomic types (or string types), so we missed adding this additional match.
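The failure mode can be sketched as a simple guard: only attempt the rewrite when every column the filter references actually has stats. This is an illustrative toy, not the patch's implementation; `try_partition_like_rewrite`, `stats_columns`, and the column names are assumptions made up for the example.

```python
from typing import Optional, Set

def try_partition_like_rewrite(
    referenced: Set[str], stats_columns: Set[str]
) -> Optional[Set[str]]:
    """Return the columns the rewritten filter would read from stats,
    or None to skip the rewrite when any referenced column has no stats."""
    if not referenced <= stats_columns:
        # Rewriting here would reference a column missing from the stats,
        # which is the situation that surfaces as an analysis exception.
        return None
    return referenced

# Hypothetical table: only these columns have data-skipping stats.
stats_columns = {"id", "event_date"}

print(try_partition_like_rewrite({"event_date"}, stats_columns))
print(try_partition_like_rewrite({"payload"}, stats_columns))
```

Skipping the rewrite is always safe here because the partition-like filter is only an optimization; the original data filter is still applied.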

How was this patch tested?

See test changes.

Does this PR introduce any user-facing changes?

No.

@chirag-s-db chirag-s-db changed the title from "[Spark] Don't apply partition-like data filters to ineligible columns" to "[Spark] Restrict partition-like data filters to whitelist of known-good expressions" on Nov 22, 2024
@chirag-s-db
Contributor Author

@scovich Could you take a look at this PR? Thanks!
