[spark] Support nested fields in SparkFilterConverter #8399
Open
Mesut-Doner wants to merge 1 commit into
JingsongLi
reviewed
Jul 1, 2026
    Transform transform = new FieldTransform(new FieldRef(topLevelIndex, field, fieldType));
Contributor
This does not actually evaluate the nested field. FieldTransform only reads InternalRowUtils.get(row, fieldRef.index(), fieldRef.type()), so for a filter like a.b = 1 this FieldRef still reads the top-level column a (index 0), but with b's type. As a result, predicate.test(row) and statistics pruning can read the struct column as an int/string instead of traversing into b. The new test only checks toString(), so it misses this runtime behavior. Please add a real nested-field transform with path traversal, or keep these filters unsupported until evaluation and stats pruning can handle nested paths correctly.
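To make the concern concrete, here is a minimal, self-contained sketch of the difference between reading only the top-level slot and traversing the path. It does not use Paimon classes: rows are modeled as plain Object[] arrays, and readTopLevel and readPath are hypothetical helper names chosen for illustration, so treat it as a picture of the problem rather than of the actual FieldTransform code.

```java
public class NestedReadSketch {

    // Mimics a reader that only looks at the top-level slot and ignores the nested path.
    static Object readTopLevel(Object[] row, int topLevelIndex) {
        return row[topLevelIndex];
    }

    // Walks into nested structs one position at a time; a null struct on the way yields null.
    static Object readPath(Object[] row, int... path) {
        Object current = row;
        for (int pos : path) {
            if (current == null) {
                return null;
            }
            current = ((Object[]) current)[pos];
        }
        return current;
    }

    public static void main(String[] args) {
        // Assumed layout: a STRUCT<b INT>, c STRING  ->  a = {b: 1}, c = "x"
        Object[] row = {new Object[] {1}, "x"};

        // Reading index 0 returns the whole struct, not an int.
        Object flat = readTopLevel(row, 0);
        System.out.println(flat instanceof Integer); // false

        // Traversing positions 0 -> 0 reaches the leaf value b.
        Object nested = readPath(row, 0, 0);
        System.out.println(nested.equals(1)); // true
    }
}
```

In this toy model the flat read merely returns the wrong kind of object; with a typed accessor that trusts the declared leaf type, the same mismatch could surface as a wrong value or a cast failure.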
Purpose
Currently, SparkFilterConverter throws an UnsupportedOperationException when it encounters dot-separated nested field paths (e.g. a.b.c). This limits predicate pushdown capabilities in Spark when queries filter on nested Struct types.
This PR implements nested field support in SparkFilterConverter so that Spark V1 Filter objects on nested fields are correctly converted into Paimon Predicate structures:
- Nested Schema Resolution: Added getNestedFieldType(...) and resolveField(...) to recursively walk the RowType schema along dot-separated path components to find the correct nested field's DataType.
- Transform-based Predicate Conversion: Refactored the converter branches (EqualTo, In, IsNull, IsNotNull, GreaterThan, etc.) to use FieldTransform(FieldRef) and call the corresponding PredicateBuilder methods that accept Transform instead of index-based builders.
- Literal Conversion: Updated convertLiteral(...) and convertString(...) to correctly resolve nested field paths and convert literals to their matching leaf data type.
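The schema walk described in the first item can be sketched roughly as follows. This is not the PR's getNestedFieldType(...) code: Node and resolveLeafType are hypothetical stand-ins for RowType/DataField and the real helper, and the sketch assumes path components contain no quoted dots (Spark quotes name parts that themselves contain a dot, which this example does not handle).

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class NestedTypeResolverSketch {

    /** Hypothetical schema node: a type name plus named children for struct types. */
    static final class Node {
        final String typeName;
        final Map<String, Node> children = new LinkedHashMap<>();

        Node(String typeName) {
            this.typeName = typeName;
        }

        Node with(String name, Node child) {
            children.put(name, child);
            return this;
        }
    }

    /** Walks the schema along a dot-separated path and returns the leaf's type name. */
    static String resolveLeafType(Node root, String dottedPath) {
        Node current = root;
        for (String part : dottedPath.split("\\.")) {
            Node next = current.children.get(part);
            if (next == null) {
                // Unknown component, or an attempt to descend into a non-struct field.
                throw new UnsupportedOperationException(
                        "Cannot resolve nested field: " + dottedPath);
            }
            current = next;
        }
        return current.typeName;
    }

    public static void main(String[] args) {
        Node schema = new Node("ROW")
                .with("a", new Node("ROW")
                        .with("b", new Node("ROW").with("c", new Node("INT"))))
                .with("d", new Node("STRING"));

        System.out.println(resolveLeafType(schema, "a.b.c")); // INT
        System.out.println(resolveLeafType(schema, "d"));     // STRING
    }
}
```

Resolving the leaf type this way is enough for literal conversion, but it does not by itself tell the evaluator where the value lives inside the row; that requires keeping the position of every path component as well.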
Tests
Added a new test case testNestedField() in SparkFilterConverterTest to verify that nested struct field predicates are successfully converted to Paimon predicates.
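As a sketch of why a string-level check alone may not be sufficient, the following self-contained example builds two predicates that print identically but evaluate differently. Nothing here is Paimon API: NamedPredicate, topLevelEquals, and nestedEquals are hypothetical names, and rows are plain Object[] arrays.

```java
import java.util.function.Predicate;

public class ToStringVsBehaviorSketch {

    /** Hypothetical predicate wrapper: a printable description plus an evaluation function. */
    static final class NamedPredicate {
        final String description;
        final Predicate<Object[]> fn;

        NamedPredicate(String description, Predicate<Object[]> fn) {
            this.description = description;
            this.fn = fn;
        }

        boolean test(Object[] row) {
            return fn.test(row);
        }

        @Override
        public String toString() {
            return description;
        }
    }

    // Compares the top-level slot itself with the literal, ignoring the nested path.
    static NamedPredicate topLevelEquals(String label, int index, Object literal) {
        return new NamedPredicate(
                "Equal(" + label + ", " + literal + ")",
                row -> literal.equals(row[index]));
    }

    // Traverses the positions in path before comparing with the literal.
    static NamedPredicate nestedEquals(String label, Object literal, int... path) {
        return new NamedPredicate(
                "Equal(" + label + ", " + literal + ")",
                row -> {
                    Object current = row;
                    for (int pos : path) {
                        if (current == null) {
                            return false;
                        }
                        current = ((Object[]) current)[pos];
                    }
                    return literal.equals(current);
                });
    }

    public static void main(String[] args) {
        Object[] row = {new Object[] {1}}; // a = {b: 1}

        NamedPredicate flat = topLevelEquals("a.b", 0, 1);
        NamedPredicate nested = nestedEquals("a.b", 1, 0, 0);

        System.out.println(flat.toString().equals(nested.toString())); // true
        System.out.println(flat.test(row));   // false
        System.out.println(nested.test(row)); // true
    }
}
```

A test that also asserts on test(row) for a matching row, a non-matching row, and a row whose struct is null would distinguish the two, whereas comparing toString() output would not.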