@cfmcgrady
Contributor

@cfmcgrady cfmcgrady commented Oct 24, 2025

Which issue does this PR close?

Closes #2612.

Rationale for this change

What changes are included in this PR?

  • Fixes null-handling across several vector getters to prevent incorrect buffer reads and ensure Spark-compatible semantics.
  • Reworks array_insert to align with Arrow’s half-open offsets and Spark’s positive/negative index behavior.
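
As a rough illustration of the index behavior the second bullet refers to, here is a minimal sketch over a plain `java.util.List`. It assumes Spark 3.5's default (non-legacy) semantics as I understand them: positions are 1-based, `-1` means "after the last element", position 0 is rejected, and out-of-range positions pad with nulls. `ArrayInsertSketch` and `arrayInsert` are hypothetical names; this is not the Comet or Spark implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper for illustration only; not the Comet or Spark code. */
final class ArrayInsertSketch {

    /**
     * Inserts item at a 1-based position. Negative positions count from the end,
     * with -1 placing the item after the last element (assumed Spark 3.5 default).
     */
    static <T> List<T> arrayInsert(List<T> src, int pos, T item) {
        if (pos == 0) {
            throw new IllegalArgumentException("position must not be 0");
        }
        int len = src.size();
        List<T> out = new ArrayList<>();
        if (pos > 0) {
            out.addAll(src);
            // Pad with nulls when the position lies beyond the end of the source.
            while (out.size() < pos - 1) {
                out.add(null);
            }
            out.add(pos - 1, item);
        } else if (-pos <= len + 1) {
            // In range: translate the negative position to a 0-based slot.
            out.addAll(src);
            out.add(len + pos + 1, item);
        } else {
            // Beyond the front: item first, then null padding, then the source.
            out.add(item);
            for (int i = 0; i < -pos - 1 - len; i++) {
                out.add(null);
            }
            out.addAll(src);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(arrayInsert(Arrays.asList(5, 4, 3, 2), -1, 1));
        System.out.println(arrayInsert(Arrays.asList(1, 2, 3), 6, 9));
    }
}
```

The three branches keep the padding cases separate from the in-range case, which makes it easier to compare each one against Spark's output in a test.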

How are these changes tested?

Added unit tests.

@codecov-commenter

codecov-commenter commented Oct 24, 2025

Codecov Report

❌ Patch coverage is 0% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 59.25%. Comparing base (f09f8af) to head (aa7287b).
⚠️ Report is 649 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...java/org/apache/comet/vector/CometPlainVector.java | 0.00% | 1 Missing and 1 partial ⚠️ |
| .../java/org/apache/comet/vector/CometListVector.java | 0.00% | 1 Missing ⚠️ |
| ...n/java/org/apache/comet/vector/CometMapVector.java | 0.00% | 1 Missing ⚠️ |
| ...main/java/org/apache/comet/vector/CometVector.java | 0.00% | 0 Missing and 1 partial ⚠️ |

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #2643      +/-   ##
============================================
+ Coverage     56.12%   59.25%   +3.12%     
- Complexity      976     1449     +473     
============================================
  Files           119      147      +28     
  Lines         11743    13761    +2018     
  Branches       2251     2367     +116     
============================================
+ Hits           6591     8154    +1563     
- Misses         4012     4383     +371     
- Partials       1140     1224      +84     

@cfmcgrady changed the title from "[WIP]Fix: Return null for null entries in CometPlainVector.getBinary" to "Fix: Return null for null entries in CometPlainVector#getBinary" on Oct 24, 2025

@parthchandra
Contributor

CometPlainVector does not check for null for any of the other types either. In general, from what I can see, the pattern has been for the caller to check if the value is null when needed. (Null checking is expensive and should be avoided if possible).

@cfmcgrady
Contributor Author

Hi @parthchandra,

> Null checking is expensive and should be avoided if possible

Yes, I completely agree that unnecessary null checking should be avoided for performance reasons.

> the pattern has been for the caller to check if the value is null when needed

The documentation for ColumnVector's getXXX methods specifies the null-handling requirement for each method.
Spark addressed this in apache/spark#20455, where the corresponding changes were made.
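
To make the point concrete, here is a minimal sketch of a null-aware object getter. It assumes, as I read Spark's `ColumnVector` Javadoc, that object-typed getters such as `getBinary` should return null for a null slot, while primitive getters leave the result undefined and rely on the caller to check first. `ToyBinaryColumn` and `getBinaryUnchecked` are hypothetical names, not Comet or Spark classes.

```java
/** Hypothetical toy column for illustration only; not the Comet or Spark class. */
final class ToyBinaryColumn {
    private final byte[][] slots;   // a null slot may still hold stale bytes
    private final boolean[] isNull; // validity information, true means null

    ToyBinaryColumn(byte[][] slots, boolean[] isNull) {
        this.slots = slots;
        this.isNull = isNull;
    }

    boolean isNullAt(int rowId) {
        return isNull[rowId];
    }

    /** Unchecked read: returns whatever the buffer holds, even for a null slot. */
    byte[] getBinaryUnchecked(int rowId) {
        return slots[rowId];
    }

    /** Checked read: a null slot yields null instead of stale buffer contents. */
    byte[] getBinary(int rowId) {
        return isNullAt(rowId) ? null : slots[rowId];
    }

    public static void main(String[] args) {
        ToyBinaryColumn col = new ToyBinaryColumn(
            new byte[][] {{1, 2}, {9, 9}}, new boolean[] {false, true});
        System.out.println(col.getBinary(1) == null);          // checked read of the null slot
        System.out.println(col.getBinaryUnchecked(1) == null); // unchecked read of the same slot
    }
}
```

The check costs one validity lookup per call, which is the performance trade-off discussed above.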

@cfmcgrady
Contributor Author

The failing unit tests appear to be related to the array_insert implementation. I'm still investigating the root cause.

@parthchandra
Contributor

> hi @parthchandra
>
> > Null checking is expensive and should be avoided if possible
>
> Yes, I completely agree that unnecessary null checking should be avoided for performance reasons.
>
> > the pattern has been for the caller to check if the value is null when needed
>
> For ColumnVector’s getXXX methods, their documentation clearly specifies the null handling requirements for each method. Spark has addressed this in apache/spark#20455, where the corresponding changes were made.

I stand corrected. This does make the implementations consistent with Spark.

//
// This code is also based on the implementation of the array_insert from the Apache Spark
// https://github.com/apache/spark/blob/branch-3.5/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L4713
// Implementation aligned with Arrow's half-open offset ranges and Spark semantics.
Contributor Author

The version fixed by ChatGPT :)
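
For readers unfamiliar with the half-open offset convention mentioned in the code comment under review, here is a small sketch. It assumes the usual Arrow list layout, in which row `i` spans child values `[offsets[i], offsets[i + 1])` and the offsets array has one more entry than there are rows. `ListOffsetsSketch`, `row`, and `rowCount` are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration only; not Arrow or Comet code. */
final class ListOffsetsSketch {

    /** Number of list rows described by an offsets array. */
    static int rowCount(int[] offsets) {
        return offsets.length - 1;
    }

    /** Returns row i as the half-open range values[offsets[i], offsets[i + 1]). */
    static List<Integer> row(int[] values, int[] offsets, int i) {
        List<Integer> out = new ArrayList<>();
        // The end offset is exclusive, so an empty row has equal start and end.
        for (int j = offsets[i]; j < offsets[i + 1]; j++) {
            out.add(values[j]);
        }
        return out;
    }

    public static void main(String[] args) {
        int[] values = {1, 2, 3, 4, 5};
        int[] offsets = {0, 2, 2, 5};
        for (int i = 0; i < rowCount(offsets); i++) {
            System.out.println(row(values, offsets, i));
        }
    }
}
```

Treating the end offset as inclusive would read one element from the following row, which is the kind of off-by-one this convention guards against.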

@cfmcgrady changed the title from "Fix: Return null for null entries in CometPlainVector#getBinary" to "Fix: Fix null handling in CometVector implementations" on Oct 30, 2025

Development

Successfully merging this pull request may close these issues.

scalar function reverse is not compatible with Spark
