This repository has been archived by the owner on Jun 14, 2024. It is now read-only.

[WIP] Bloom filter Quick Implementation #363

Open · wants to merge 21 commits into base: master


thugsatbay
Contributor

What is the context for this pull request?

  • Tracking Issue: If you expect any subjective discussions around this pull request, please consider opening a tracking issue and link to the PR. Write N/A, if this pull request is self-contained.
  • Parent Issue: Link to the issue that captures the overall plan. Write N/A, if this is a stand-alone pull request with a tracking issue OR self-contained pull request.
  • Dependencies: Links to issues you depend on for this pull request to work. Write N/A, if no dependencies.
    • Issue 1
    • Issue 2

What changes were proposed in this pull request?

Does this PR introduce any user-facing change?

How was this patch tested?

@sezruby
Collaborator

sezruby commented Mar 23, 2021

@thugsatbay Could you add a BFFilterIndexRule that applies the BF index, and measure the performance (for an e2e prototype)?

I wonder whether a file-level BF and file filtering would still be beneficial if we use the Parquet dictionary filter.
https://medium.com/analytics-vidhya/spark-parquet-file-cac4af92981d
https://www.slideshare.net/RyanBlue3/parquet-performance-tuning-the-missing-guide

So it might be good to measure the perf with the dictionary filter alone and compare it with the BF.
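For reference, file-level BF filtering can be sketched roughly as below. This is only a minimal illustration in plain Python, not the code in this PR and not Hyperspace's or Spark's API: `BloomFilter`, `build_file_index`, and `candidate_files` are hypothetical names, and the file names and column values in the usage note are made up. The idea is to keep one Bloom filter per data file over the indexed column, then skip every file whose filter rules out the value in an equality predicate.

```python
import hashlib
import math


class BloomFilter:
    """A small Bloom filter over string-convertible values (illustrative only)."""

    def __init__(self, expected_items: int, fpp: float = 0.01):
        # Standard sizing formulas: m = -n ln(p) / (ln 2)^2 and k = (m / n) ln 2.
        n = max(1, expected_items)
        self.num_bits = max(8, math.ceil(-n * math.log(fpp) / (math.log(2) ** 2)))
        self.num_hashes = max(1, round(self.num_bits / n * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, value):
        # Double hashing: derive k bit positions from two halves of one digest.
        digest = hashlib.sha256(str(value).encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, value):
        for pos in self._positions(value):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, value):
        # False means "definitely absent"; True means "possibly present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(value)
        )


def build_file_index(files):
    """Build one Bloom filter per file from a {file_name: column_values} mapping."""
    index = {}
    for name, values in files.items():
        bf = BloomFilter(expected_items=len(values))
        for v in values:
            bf.add(v)
        index[name] = bf
    return index


def candidate_files(index, value):
    """Return the files that may contain `value`; all other files can be skipped."""
    return [name for name, bf in index.items() if bf.might_contain(value)]
```

With, say, `index = build_file_index({"part-0": ["a", "b"], "part-1": ["c"]})`, calling `candidate_files(index, "c")` always includes `"part-1"` and may occasionally include `"part-0"` as a false positive. The dictionary filter makes a similar decision per row group inside Parquet, which is why the two are worth comparing on the same query.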

cc @imback82
