Skip to content

ARROW-11973 [Rust][DataFusion] Boolean kleene kernels#9772

Closed
ch-sc wants to merge 6 commits intoapache:masterfrom
ch-sc:ARROW-11973-boolean-kleene-kernels
Closed

ARROW-11973 [Rust][DataFusion] Boolean kleene kernels#9772
ch-sc wants to merge 6 commits intoapache:masterfrom
ch-sc:ARROW-11973-boolean-kleene-kernels

Conversation

@ch-sc
Copy link
Copy Markdown
Contributor

@ch-sc ch-sc commented Mar 22, 2021

This PR adds two boolean kernels kleene_or and kleene_and.

As described in the corresponding JIRA ticket, the kleene operator handels null values differently compared to plain OR and AND operators.

Operator Left Right Result
AND TRUE NULL NULL
AND FALSE NULL FALSE
OR TRUE NULL TRUE
OR FALSE NULL NULL

@ch-sc ch-sc changed the title ARROW-11973 Boolean kleene kernels ARROW-11973 [Rust] Boolean kleene kernels Mar 22, 2021
@Dandandan
Copy link
Copy Markdown
Contributor

Dandandan commented Mar 22, 2021

@ch-sc thanks for your PR

In the JIRA ticket it is mentioned that these semantics are default for SQL - however - these return NULL in PostgreSQL (the dialect we want to target for DataFusion.

SELECT TRUE AND NULL;
SELECT TRUE OR NULL;

I think it is a useful addition to Arrow side though - for DataFusion we should just think about what behavior we want to support.

@ch-sc
Copy link
Copy Markdown
Contributor Author

ch-sc commented Mar 22, 2021

Hi @Dandandan, I actaully checked on Postgres (11) beforehand and it did return exactly the values I provided in the table.

SELECT TRUE AND NULL; --> NULL
SELECT FALSE AND NULL; --> FALSE
SELECT TRUE OR NULL; --> TRUE
SELECT FALSE OR NULL; --> NULL

Which Postgres version are you using?

@jorgecarleitao
Copy link
Copy Markdown
Member

I tried on mysql, postgres and big query and was happy with the table you provided on JIRA. I am also curious how I got a different result than @Dandandan xD

@github-actions
Copy link
Copy Markdown

@Dandandan
Copy link
Copy Markdown
Contributor

Dandandan commented Mar 22, 2021

Haha @ch-sc @jorgecarleitao I tried it on this website which I used before for some simple checks https://rextester.com/l/postgresql_online_compiler
but it shows null on every input now 🤣

So forget what I said, it is very useful to do this both for Arrow & DataFusion 👍

@ch-sc ch-sc changed the title ARROW-11973 [Rust] Boolean kleene kernels ARROW-11973 [Rust][DataFusion] Boolean kleene kernels Mar 23, 2021
@codecov-io
Copy link
Copy Markdown

Codecov Report

Merging #9772 (d024f2b) into master (29feea0) will decrease coverage by 0.23%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #9772      +/-   ##
==========================================
- Coverage   82.59%   82.36%   -0.24%     
==========================================
  Files         248      252       +4     
  Lines       58294    58893     +599     
==========================================
+ Hits        48149    48507     +358     
- Misses      10145    10386     +241     
Impacted Files Coverage Δ
rust/arrow/src/array/transform/fixed_binary.rs 5.26% <0.00%> (-73.69%) ⬇️
rust/arrow/src/array/transform/null.rs 0.00% <0.00%> (-66.67%) ⬇️
rust/arrow/src/array/transform/structure.rs 44.44% <0.00%> (-38.89%) ⬇️
rust/arrow/src/array/transform/primitive.rs 71.42% <0.00%> (-28.58%) ⬇️
rust/arrow/src/array/transform/mod.rs 70.23% <0.00%> (-22.96%) ⬇️
rust/arrow/src/array/transform/variable_size.rs 87.87% <0.00%> (-12.13%) ⬇️
rust/arrow/src/array/array_binary.rs 79.84% <0.00%> (-11.80%) ⬇️
rust/parquet/src/arrow/schema.rs 86.57% <0.00%> (-5.79%) ⬇️
rust/arrow/src/buffer/mutable.rs 87.73% <0.00%> (-3.04%) ⬇️
rust/arrow/src/array/array_union.rs 87.17% <0.00%> (-2.37%) ⬇️
... and 58 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a2a9f5d...d024f2b. Read the comment docs.

@ch-sc
Copy link
Copy Markdown
Contributor Author

ch-sc commented Mar 26, 2021

@Dandandan @jorgecarleitao the integration tests fail because of scarce disk space:

+ npm install
##[warning]You are running out of disk space. The runner will stop working when the machine runs out of disk space. Free space left: 57 MB
npm WARN tar ENOSPC: no space left on device, write

Any ideas what we can do about this?

Copy link
Copy Markdown
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ch-sc I'll plan to review this carefully tomorrow -- sorry I missed it with all the other things going on.

Copy link
Copy Markdown
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really nicely done @ch-sc -- I went over the implementation and reviewed the tests carefully. 💯

Thank you for the contribution

@alamb alamb closed this in 9aa0f85 Mar 30, 2021
@alamb
Copy link
Copy Markdown
Contributor

alamb commented Mar 30, 2021

For the record the issue of the integration test running out of space is known and not related to this PR

alamb pushed a commit that referenced this pull request Apr 14, 2021
[PR 9772](#9772) introduced a bug. The boolean kleene kernel would not iterate over the bit chunks of the batch, if there is no validity bitmap on the left or the right input. It will only process the bits of the remainder word. The inital unit test didn't pick a large enough batch size to test this scenario, which is why this was not detected by the tests earlier.

Closes #9965 from ch-sc/ARROW-12294-boolean-kleene-kernels-no-remainder

Authored-by: Christoph Schulze <christoph.schulze@signavio.com>
Signed-off-by: Andrew Lamb <andrew@nerdnetworks.org>
alamb pushed a commit to apache/arrow-rs that referenced this pull request Apr 20, 2021
[PR 9772](apache/arrow#9772) introduced a bug. The boolean kleene kernel would not iterate over the bit chunks of the batch, if there is no validity bitmap on the left or the right input. It will only process the bits of the remainder word. The inital unit test didn't pick a large enough batch size to test this scenario, which is why this was not detected by the tests earlier.

Closes #9965 from ch-sc/ARROW-12294-boolean-kleene-kernels-no-remainder

Authored-by: Christoph Schulze <christoph.schulze@signavio.com>
Signed-off-by: Andrew Lamb <andrew@nerdnetworks.org>
@codecov-commenter
Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 97.84946% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.36%. Comparing base (29feea0) to head (d024f2b).

Files with missing lines Patch % Lines
rust/arrow/src/compute/kernels/boolean.rs 96.96% 6 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #9772      +/-   ##
==========================================
- Coverage   82.59%   82.36%   -0.24%     
==========================================
  Files         248      252       +4     
  Lines       58294    58893     +599     
==========================================
+ Hits        48149    48507     +358     
- Misses      10145    10386     +241     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants