[Spark] Restrict partition-like data filters to whitelist of known-good expressions #3872
Which Delta project/connector is this regarding?
Description
Currently, we try to rewrite arbitrary expressions as partition-like. To avoid having to repeatedly remove known-bad expressions, start instead with a whitelist (to be expanded over time) of known-good expressions that can safely be rewritten.
This change also fixes an existing issue where a partition-like filter could be generated for a column that isn't eligible for data skipping. Such a filter throws an analysis exception because the referenced column isn't present in the stats. The issue was originally missed (and is a difference in behavior vs. true partition filters) because partitioning isn't allowed on non-atomic types (or string types), so this additional type check was never added.
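The whitelist approach can be sketched as follows. This is a simplified, hypothetical model: the real change operates on Spark Catalyst expressions and Delta's stats schema, and all type and class names below are illustrative, not the actual Delta API.

```scala
// Hypothetical, simplified expression tree standing in for Catalyst expressions.
sealed trait Expr
case class AttributeRef(name: String, dataType: String) extends Expr
case class Literal(value: Any) extends Expr
case class EqualTo(left: Expr, right: Expr) extends Expr
case class IsNull(child: Expr) extends Expr
case class Concat(children: Seq[Expr]) extends Expr

object PartitionLikeWhitelist {
  // Illustrative stand-in for "skipping-eligible" atomic types: columns of
  // other types have no entries in the collected stats, so rewriting a filter
  // on them would reference missing stats columns.
  val skippingEligibleTypes: Set[String] = Set("int", "long", "date", "timestamp")

  // Whitelist check: only expressions built from known-good node types over
  // skipping-eligible columns may be rewritten as partition-like filters.
  def canRewriteAsPartitionLike(expr: Expr): Boolean = expr match {
    case AttributeRef(_, dt) => skippingEligibleTypes.contains(dt)
    case Literal(_)          => true
    case EqualTo(l, r)       => canRewriteAsPartitionLike(l) && canRewriteAsPartitionLike(r)
    case IsNull(c)           => canRewriteAsPartitionLike(c)
    case _                   => false // anything outside the whitelist is rejected
  }
}
```

The key design point is the default-deny final case: unknown expression types are rejected rather than rewritten, so new known-bad cases (like the non-atomic-column one fixed here) fail safe instead of throwing at analysis time.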
How was this patch tested?
See test changes.
Does this PR introduce any user-facing changes?
No.