[FLINK-39677][table] Fix ARRAY_SORT comparator contract violation #28159
Open
jnh5y wants to merge 2 commits into
…th duplicates

Reproduces "Comparison method violates its general contract!" at runtime when an array contains many equal elements. The existing ARRAY_SORT cases are all below TimSort's MIN_MERGE threshold (32) and contain at most one duplicate, so they exercise only the binarySort path and never trip the contract check.

The new cases use 64-element BIGINT arrays so that the merge path runs:
- all-equal (every element 42L)
- many-duplicates (four values, 16 occurrences each, interleaved)

Generated-by: Claude (Opus 4.7)
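As a rough illustration of the two input shapes described above, the arrays could be built as follows. This is a sketch only, not the test code from the PR: the class and method names are hypothetical, and the four concrete values (1 through 4) are an assumption, since the commit message only says "four values".

```java
import java.util.Arrays;

// Hypothetical helper, not part of Flink: builds the two 64-element inputs.
final class ArraySortTestData {

    /** 64 copies of 42L, so every comparison is between equal elements. */
    static Long[] allEqual() {
        Long[] a = new Long[64];
        Arrays.fill(a, 42L);
        return a;
    }

    /** Four values (assumed 1..4), 16 occurrences each, interleaved as 1,2,3,4,1,2,3,4,... */
    static Long[] manyDuplicates() {
        Long[] a = new Long[64];
        for (int i = 0; i < a.length; i++) {
            a[i] = (long) (i % 4) + 1;
        }
        return a;
    }
}
```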
The previous comparator was built from a single SQL > evaluator and
returned +1 or -1, never 0 - so for equal elements
compare(a,b) == compare(b,a) == -1, violating antisymmetry and tripping
TimSort's contract check once an array is large enough to enter the
merge path:
java.lang.IllegalArgumentException: Comparison method violates its
general contract!
at java.util.TimSort.mergeHi(TimSort.java:903)
...
at ArraySortFunction.eval(ArraySortFunction.java:91)
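The antisymmetry violation can be shown in isolation with plain Java. The sketch below is not Flink's code: `SINGLE_PROBE` is a hypothetical stand-in for a comparator derived from one "greater than" test, and `isAntisymmetric` checks the `Comparator` requirement that `sgn(compare(a, b)) == -sgn(compare(b, a))` for one pair.

```java
import java.util.Comparator;

// Hypothetical demo class, not part of Flink.
final class BrokenComparatorDemo {

    /** Mimics a comparator built from a single "a > b" probe: it never returns 0. */
    static final Comparator<Long> SINGLE_PROBE = (a, b) -> a > b ? 1 : -1;

    /** Checks sgn(compare(a, b)) == -sgn(compare(b, a)) for one pair. */
    static boolean isAntisymmetric(Comparator<Long> cmp, Long a, Long b) {
        return Integer.signum(cmp.compare(a, b)) == -Integer.signum(cmp.compare(b, a));
    }

    public static void main(String[] args) {
        // Distinct elements: compare(1, 2) = -1 and compare(2, 1) = +1, so this prints true.
        System.out.println(isAntisymmetric(SINGLE_PROBE, 1L, 2L));
        // Equal elements: both directions return -1, so this prints false.
        System.out.println(isAntisymmetric(SINGLE_PROBE, 42L, 42L));
    }
}
```

Note that TimSort detects such violations only on a best-effort basis, so whether a given input actually raises the exception depends on the element order and the runs it produces.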
Introduce an internal-only $COMPARE$1 function (analogous to the
existing $HASHCODE$1) that returns a -1/0/+1 int and delegates its
codegen to GenerateUtils.generateCompare - the same per-type compare
helper ORDER BY already uses. ArraySortFunction now constructs a single
INT-returning evaluator and invokes the resulting MethodHandle once per
pair instead of two boolean probes.
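For comparison, a three-way comparator returns 0 for equal elements and therefore satisfies the contract. The sketch below uses `Long::compare` as a stand-in for the generated INT-returning evaluator; it does not reproduce `$COMPARE$1`, `GenerateUtils.generateCompare`, or the `MethodHandle` plumbing, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical sketch, not part of Flink.
final class ThreeWayComparatorSketch {

    /** Stand-in for an INT-returning evaluator: negative, zero, or positive. */
    static final Comparator<Long> THREE_WAY = Long::compare;

    /** Sorts a copy of the input in ascending order using the three-way comparator. */
    static Long[] sortAscending(Long[] input) {
        Long[] copy = Arrays.copyOf(input, input.length);
        Arrays.sort(copy, THREE_WAY);
        return copy;
    }
}
```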
Benefits over a two-probe fix:
- one MethodHandle invocation per compare instead of one-or-two
- reuses existing per-type compare codegen (CHAR/VARCHAR/DECIMAL/
TIMESTAMP via compareTo, primitives via direct compare, ROW/ARRAY
recursive, RAW via Comparable)
- reusable for future array functions that need ordering
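To make the "one-or-two" point concrete, here is a sketch of what a two-probe fix might look like, with a counter on the boolean probe. It is an illustration under assumptions, not code from the PR: `GREATER` stands in for the boolean `>` evaluator, and all names are hypothetical.

```java
import java.util.Comparator;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BiPredicate;

// Hypothetical sketch, not part of Flink.
final class ProbeCountSketch {

    /** Counts how many times the boolean probe is evaluated. */
    static final AtomicInteger PROBES = new AtomicInteger();

    /** Stand-in for the boolean "greater than" evaluator. */
    static final BiPredicate<Long, Long> GREATER = (a, b) -> {
        PROBES.incrementAndGet();
        return a > b;
    };

    /** Two-probe comparator: one probe when a > b, two probes otherwise. */
    static final Comparator<Long> TWO_PROBE = (a, b) -> {
        if (GREATER.test(a, b)) {
            return 1;
        }
        if (GREATER.test(b, a)) {
            return -1;
        }
        return 0;
    };
}
```

This variant honors the contract, but every comparison where the first element is not greater costs a second probe, which a single INT-returning evaluator avoids.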
Generated-by: Claude (Opus 4.7)
The comparator built from a single SQL > probe returned +1 or -1 and never 0, so for equal elements compare(a,b) == compare(b,a) == -1. That violates antisymmetry and trips TimSort's contract check once an array is large enough to take the merge path (>= 32 elements with duplicates), which fails with java.lang.IllegalArgumentException: Comparison method violates its general contract!
To fix this, we introduce an internal-only $COMPARE$1 function (analogous to the existing $HASHCODE$1) that returns a -1/0/+1 int and delegates its codegen to GenerateUtils.generateCompare - the same per-type compare helper ORDER BY already uses. ArraySortFunction now constructs a single INT-returning evaluator.
Coverage: a new 64-element BIGINT case in CollectionFunctionsITCase exercises the TimSort merge path with duplicates.
Generated-by: Claude (Opus 4.7)
Verifying this change
This change adds tests that fail without the fix.
Does this pull request potentially affect one of the following parts:
@Public(Evolving): no
Documentation
Was generative AI tooling used to co-author this PR?
Generated-by: Claude (Opus 4.7)