Support the null value in bloom_filter_agg Spark aggregate function #458

Open
wants to merge 28 commits into
base: update
weixiuli
Currently, the Velox BloomFilterAggregate checks each input row and throws an exception if the row contains a null value. It should instead be consistent with Spark's behavior and ignore null values.

Spark's BloomFilterAggregate ignores null values: https://github.com/apache/spark/blob/6cdca10f148433664b3e2be6f655b0ddba817537/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/BloomFilterAggregate.scala#L180-L188

    override def update(buffer: BloomFilter, inputRow: InternalRow): BloomFilter = {
      val value = child.eval(inputRow)
      // Ignore null values.
      if (value == null) {
        return buffer
      }
      updater.update(buffer, value)
      buffer
    }
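
For illustration, the same null-skipping behavior on the C++ side could look roughly like the sketch below. This is not the Velox code touched by this PR: `ToyBloomFilter`, `updateIgnoringNulls`, the hash mixing, and the use of `std::optional` to model nullable rows are all hypothetical stand-ins chosen to keep the example self-contained.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical toy Bloom filter; NOT the Velox or Spark implementation.
class ToyBloomFilter {
 public:
  explicit ToyBloomFilter(std::size_t numBits) : bits_(numBits, false) {}

  // Sets one bit per hash seed for the given value.
  void insert(std::int64_t value) {
    for (std::size_t seed = 0; seed < kNumHashes; ++seed) {
      bits_[slot(value, seed)] = true;
    }
    ++insertCount_;
  }

  // Returns false only if the value was definitely never inserted.
  bool mayContain(std::int64_t value) const {
    for (std::size_t seed = 0; seed < kNumHashes; ++seed) {
      if (!bits_[slot(value, seed)]) {
        return false;
      }
    }
    return true;
  }

  std::size_t insertCount() const { return insertCount_; }

 private:
  static constexpr std::size_t kNumHashes = 3;

  // Simple multiplicative mixing, used here only for illustration.
  std::size_t slot(std::int64_t value, std::size_t seed) const {
    std::uint64_t mixed =
        static_cast<std::uint64_t>(value) * 0x9E3779B97F4A7C15ULL +
        static_cast<std::uint64_t>(seed) * 0xC2B2AE3D27D4EB4FULL;
    mixed ^= mixed >> 29;
    return static_cast<std::size_t>(mixed % bits_.size());
  }

  std::vector<bool> bits_;
  std::size_t insertCount_ = 0;
};

// Mirrors the Spark update() above: a null row leaves the filter unchanged
// instead of raising an error. std::nullopt models a null input value.
void updateIgnoringNulls(
    ToyBloomFilter& filter,
    const std::vector<std::optional<std::int64_t>>& rows) {
  for (const auto& row : rows) {
    if (!row.has_value()) {
      continue;  // Ignore null values.
    }
    filter.insert(*row);
  }
}
```

With this shape, an input such as `{1, null, 42, null}` results in two insertions, and the null rows neither throw nor modify the filter.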

weixiuli pushed a commit to weixiuli/gluten that referenced this pull request Dec 12, 2023
@zhztheplayer zhztheplayer force-pushed the update branch 2 times, most recently from 13e79b6 to 8a6ef2b Compare December 13, 2023 07:11
@weixiuli weixiuli changed the title Support the null values in bloom_filter Spark aggregate Support the null value in bloom_filter_agg Spark aggregate function Dec 14, 2023
@GlutenPerfBot GlutenPerfBot force-pushed the update branch 6 times, most recently from c2655fd to 31ae361 Compare February 5, 2025 23:07
@GlutenPerfBot GlutenPerfBot force-pushed the update branch 6 times, most recently from 4c51741 to c87c6a1 Compare February 12, 2025 23:08
@GlutenPerfBot GlutenPerfBot force-pushed the update branch 5 times, most recently from 7ec05f6 to 1d09144 Compare February 17, 2025 23:08
@GlutenPerfBot GlutenPerfBot force-pushed the update branch 6 times, most recently from f8fbfbf to 13a82ba Compare February 24, 2025 23:08
@GlutenPerfBot GlutenPerfBot force-pushed the update branch 5 times, most recently from 9dcbeb6 to 05ebf28 Compare March 6, 2025 23:08
8 participants