- 
                Notifications
    You must be signed in to change notification settings 
- Fork 1.7k
Fix: spark bit_count function #18322
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Fix: spark bit_count function #18322
Conversation
| } | ||
|  | ||
| // Here’s the equivalent Rust implementation of the bitCount function (similar to Apache Spark's bitCount for LongType) | ||
| fn bit_count(i: i64) -> i32 { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we enhance the comment here with this link + potentially link to original source code this actual function is copied from?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, added
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where was the actual bitcount implementation code pulled from though?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added additional link to Java implementation of bitCount function
| let value_array = value_array[0].as_ref(); | ||
| match value_array.data_type() { | ||
| DataType::Int8 => { | ||
| DataType::Int8 | DataType::Boolean => { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Spark supports only signed int types
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment seems misplaced as the code adds support for boolean 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also think this code will now panic if you pass in a Boolean array
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to note that Spark only supports signed integer and boolean types as input.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added an additional test for BooleanArray.
Which issue does this PR close?
Closes #18225
Rationale for this change
After adding the bit_count function in Comet, we got different results from Spark. (apache/datafusion-comet#2553)
Are these changes tested?
Tested with existing unit tests