[flink] Add missing predicate converters in PredicateConverter #8395

Open
Mesut-Doner wants to merge 2 commits into apache:master from Mesut-Doner:mdoner_flink_predicate_filters

@Mesut-Doner

Purpose

Implement the missing Flink predicate converters that were marked as TODO:

  • NOT
  • IS_NOT_TRUE
  • NOT_BETWEEN
  • SIMILAR

Tests

- Fix typo: child.get(0) -> children.get(0) in IS_NOT_TRUE branch
- Fix missing dot before toString() in SIMILAR branch (compile error)
- Fix BinaryString: pass BinaryString.fromString(pattern) to builder.like
  instead of a raw String
- Rewrite convertSimilarToRegex as convertSimilarToLike: produce a SQL LIKE
  pattern (not a Java regex) so that the downstream Like function processes
  it correctly; use backslash as the output escape char to match Like's
  default escape convention
- Fix escape-sequence handling: escaped _ and % become \_ / \% (literals),
  escaped escape char becomes the literal char itself
- Throw UnsupportedExpression for SIMILAR TO-only features (character
  classes [...], alternation |, quantifiers * + ?, grouping ()) that have
  no SQL LIKE equivalent
- Add unit tests: testConvertSimilarToLike covers pattern pass-through,
  escape sequences and unsupported-feature rejection; testSimilarExpression*
  tests cover the full predicate path including row-level filtering

Mesut-Doner force-pushed the mdoner_flink_predicate_filters branch from 41af127 to ac0069c on June 30, 2026 13:26

} else if (func == BuiltInFunctionDefinitions.IS_NOT_TRUE) {
    FieldReferenceExpression fieldRefExpr =
            extractFieldReference(children.get(0)).orElseThrow(UnsupportedExpression::new);
    return builder.notEqual(builder.indexOf(fieldRefExpr.getName()), Boolean.TRUE);

Contributor

This changes the NULL semantics of IS NOT TRUE. In SQL, x IS NOT TRUE is true for both FALSE and NULL, but Paimon NotEqual returns false when the field value is null (PredicateTest.testNotEqual covers this). With this conversion, a filter like WHERE bool_col IS NOT TRUE can incorrectly prune rows where bool_col is NULL. This needs to be represented as isNull(field) OR equal(field, false) (or left unsupported) and should have a test for the NULL row case.
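
To make the difference concrete, here is a minimal sketch of the two semantics over a nullable boolean column. It does not use Paimon's API: `IsNotTrueSketch`, `notEqualTrue`, and `isNullOrEqualFalse` are hypothetical stand-ins, and the NotEqual model simply assumes the behavior described above (never matching a null field).

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical stand-ins for the pushed-down predicates; not Paimon's API.
final class IsNotTrueSketch {

    // Models a NotEqual-style predicate that never matches a null field value.
    static boolean notEqualTrue(Boolean v) {
        return v != null && !v.equals(Boolean.TRUE);
    }

    // Models SQL "x IS NOT TRUE": matches both FALSE and NULL.
    static boolean isNullOrEqualFalse(Boolean v) {
        return v == null || v.equals(Boolean.FALSE);
    }

    // Keeps only the rows accepted by the predicate.
    static List<Boolean> filter(List<Boolean> rows, Predicate<Boolean> p) {
        return rows.stream().filter(p).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Boolean> rows = Arrays.asList(Boolean.TRUE, Boolean.FALSE, null);
        // The NULL row is dropped: prints [false]
        System.out.println(filter(rows, IsNotTrueSketch::notEqualTrue));
        // The NULL row is kept: prints [false, null]
        System.out.println(filter(rows, IsNotTrueSketch::isNullOrEqualFalse));
    }
}
```

Under these assumptions, only the second form keeps the NULL row, which is the row a `WHERE bool_col IS NOT TRUE` filter must return.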

    // SIMILAR TO wildcards are the same as SQL LIKE wildcards
    like.append(c);
} else if (c == '[' || c == '|' || c == '(' || c == ')'
        || c == '*' || c == '+' || c == '?') {

Contributor

This unsupported-feature check is incomplete for SIMILAR TO. Calcite/Flink also treat { and } as SIMILAR special characters (quantifiers, e.g. a{2}), but this conversion lets them pass through as literal LIKE characters. As a result, a real SIMILAR predicate such as col SIMILAR TO 'a{2}' can be pushed down as a LIKE pattern that matches the literal string a{2} instead of aa, which filters incorrectly. Please reject all SIMILAR-only metacharacters that cannot be represented by Paimon LIKE, including { and }, and add a regression test for such a pattern.
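
A rough sketch of the stricter conversion, following the rules listed in the PR description, might look like the following. It is not the PR's code: the class name, the `similarToLike` signature, and the use of `IllegalArgumentException` in place of `UnsupportedExpression` are assumptions made so the example is self-contained. Emitting a literal backslash as `\\` is also an assumption of this sketch.

```java
// Hypothetical, self-contained sketch; not the PR's actual implementation.
final class SimilarToLikeSketch {

    // SIMILAR TO metacharacters with no SQL LIKE equivalent, including { and }.
    private static final String SIMILAR_ONLY = "[|()*+?{}";

    // Converts a SIMILAR TO pattern into a LIKE pattern that uses backslash as
    // its escape character. Pass escape == null when no ESCAPE clause was given.
    static String similarToLike(String pattern, Character escape) {
        StringBuilder like = new StringBuilder();
        for (int i = 0; i < pattern.length(); i++) {
            char c = pattern.charAt(i);
            if (escape != null && c == escape.charValue()) {
                if (i + 1 >= pattern.length()) {
                    throw new IllegalArgumentException("Dangling escape character");
                }
                // The escaped character is always a literal in the output.
                appendLiteral(like, pattern.charAt(++i));
            } else if (c == '%' || c == '_') {
                // SIMILAR TO wildcards are the same as SQL LIKE wildcards.
                like.append(c);
            } else if (SIMILAR_ONLY.indexOf(c) >= 0) {
                throw new IllegalArgumentException("No LIKE equivalent for '" + c + "'");
            } else {
                appendLiteral(like, c);
            }
        }
        return like.toString();
    }

    // Backslash-escapes characters that LIKE would otherwise treat specially.
    private static void appendLiteral(StringBuilder out, char c) {
        if (c == '%' || c == '_' || c == '\\') {
            out.append('\\');
        }
        out.append(c);
    }

    public static void main(String[] args) {
        System.out.println(similarToLike("abc%", null)); // prints abc%
        System.out.println(similarToLike("a!_b", '!'));  // prints a\_b
        try {
            similarToLike("a{2}", null);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Rejecting the pattern outright means the filter is not pushed down, so Flink evaluates the original SIMILAR TO predicate itself and no rows are pruned incorrectly.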

Mesut-Doner added a commit to Mesut-Doner/paimon that referenced this pull request Jul 1, 2026
…ling

Address review feedback on apache#8395:
- IS_NOT_TRUE now returns true for NULL rows (isNull OR equal-false)
  instead of NotEqual, which incorrectly evaluated to false for NULL,
  causing rows to be wrongly pruned.
- convertSimilarToLike now rejects '{' and '}' as unsupported SIMILAR
  TO-only quantifier syntax (e.g. a{2}), preventing such patterns from
  being pushed down as literal LIKE matches.

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>

Mesut-Doner force-pushed the mdoner_flink_predicate_filters branch from 7eea7a0 to 1788a61 on July 1, 2026 09:04