[spark] Support nested fields in SparkFilterConverter#8399

Open
Mesut-Doner wants to merge 1 commit into apache:master from Mesut-Doner:spark_nested_fields

Conversation

Mesut-Doner commented Jun 30, 2026

Purpose

Currently, SparkFilterConverter throws an UnsupportedOperationException when it encounters dot-separated nested field paths (e.g. a.b.c). This limits predicate pushdown capabilities in Spark when queries filter on nested Struct types.

This PR implements nested field support in SparkFilterConverter so that Spark V1 Filter objects on nested fields are correctly converted into Paimon Predicate structures:

- Nested Schema Resolution: Added getNestedFieldType(...) and resolveField(...) to recursively walk the RowType schema along dot-separated path components and find the nested field's DataType.
- Transform-based Predicate Conversion: Refactored the converter branches (EqualTo, In, IsNull, IsNotNull, GreaterThan, etc.) to use FieldTransform(FieldRef) and call the PredicateBuilder methods that accept a Transform instead of the index-based builders.
- Literal Conversion: Updated convertLiteral(...) and convertString(...) to resolve nested field paths and convert literals to the data type of the matching leaf field.
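
The schema walk described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `SketchType` and `NestedPathSketch` are hypothetical stand-ins, not Paimon's RowType/DataType or the actual getNestedFieldType(...)/resolveField(...) code in this PR.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a schema node; NOT Paimon's DataType/RowType.
final class SketchType {
    final String name;                       // e.g. "INT", "STRING", "ROW"
    final Map<String, SketchType> children;  // empty for leaf types

    SketchType(String name, Map<String, SketchType> children) {
        this.name = name;
        this.children = children;
    }

    static SketchType leaf(String name) {
        return new SketchType(name, new LinkedHashMap<>());
    }

    static SketchType row(Map<String, SketchType> children) {
        return new SketchType("ROW", children);
    }
}

final class NestedPathSketch {
    /** Walks a dot-separated path such as "a.b.c" and returns the type found at its end. */
    static SketchType resolve(SketchType root, String path) {
        SketchType current = root;
        for (String part : path.split("\\.")) {
            // Leaf types have no children, so descending past a leaf also fails here.
            SketchType next = current.children.get(part);
            if (next == null) {
                throw new UnsupportedOperationException(
                        "Cannot resolve '" + part + "' in path " + path);
            }
            current = next;
        }
        return current;
    }
}
```

Splitting on every dot is a simplification: a field name that itself contains a dot would need separate handling, which this sketch does not attempt.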

Tests

Added a new test case testNestedField() in SparkFilterConverterTest to verify that nested struct field predicates are successfully converted to Paimon predicates.

Mesut-Doner force-pushed the spark_nested_fields branch from 990ec18 to 85017e3 June 30, 2026 15:39
Mesut-Doner changed the title from "Spark nested fields" to "[spark] Support nested fields in SparkFilterConverter" Jun 30, 2026
}
}

Transform transform = new FieldTransform(new FieldRef(topLevelIndex, field, fieldType));

Contributor

This does not actually evaluate the nested field. FieldTransform only reads InternalRowUtils.get(row, fieldRef.index(), fieldRef.type()), so for a filter like a.b = 1 this FieldRef still reads the top-level column a (index 0), but with the type of b. That can make predicate.test(row) and statistics pruning read the struct column as an int/string instead of traversing into b. The new test only checks toString(), so it misses this runtime behavior. Please add a real nested-field transform with path traversal, or keep these filters unsupported until evaluation and stats pruning can handle nested paths correctly.
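
To make the difference concrete, here is a minimal sketch of an index-only read versus a path-traversing read. It is only an illustration under stated assumptions: rows are plain `Object[]` arrays rather than Paimon's InternalRow, and `NestedReadSketch` with its methods is a hypothetical name, not an existing Paimon class or a proposed API.

```java
import java.util.Objects;

// Hypothetical illustration; rows are modeled as Object[] and structs as nested Object[].
final class NestedReadSketch {
    /** Index-only read: returns whatever sits at the top-level position. */
    static Object readTopLevel(Object[] row, int index) {
        return row[index];
    }

    /** Descends one position per path component, treating each intermediate value as a nested row. */
    static Object readPath(Object[] row, int[] path) {
        Object current = row;
        for (int position : path) {
            if (current == null) {
                return null; // a null struct makes every field below it null
            }
            current = ((Object[]) current)[position];
        }
        return current;
    }

    /** Equality test in which a null value never matches. */
    static boolean equalsLiteral(Object value, Object literal) {
        return value != null && Objects.equals(value, literal);
    }
}
```

For a row whose column a is the struct {b: 1, c: "x"}, `readTopLevel(row, 0)` returns the whole struct, so comparing it with the literal 1 cannot match, whereas `readPath(row, new int[] {0, 0})` reaches b and the comparison succeeds.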


2 participants