[TEST] Filter pushdown dynamic #19694
This PR implements cross-file tracking of filter selectivity in ParquetSource to adaptively reorder and demote low-selectivity filters. Filters that don't filter enough rows (configurable, default 80% threshold) are demoted from row-level pushdown to post-scan inline application, reducing I/O overhead.

Key changes:
- Add SelectivityTracker to track filter effectiveness across files
- ExprKey wrapper enables HashMap keying by PhysicalExpr structural equality
- Each ParquetOpener queries shared stats to partition filters into:
  - Row filters (push down): filters with effectiveness >= threshold or unknown
  - Post-scan filters: filters with effectiveness < threshold
- Post-scan filters are added to projection, applied inline in stream, then filter columns are removed from output
- SelectivityUpdatingStream updates tracker when stream completes
- build_row_filter_with_metrics() returns per-filter metrics for tracking
- Filters are reordered by observed effectiveness (most selective first)

Configuration:
- `parquet_options.filter_effectiveness_threshold` (default: 0.8)
- Effectiveness = 1 - (rows_matched / rows_total) = fraction filtered out

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
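The partitioning rule described above can be sketched in a few lines of Rust. This is a simplified illustration, not the PR's code: `Tracker`, `FilterStats`, `update`, and `partition` are hypothetical names, filters are keyed by plain strings rather than by `ExprKey`, and placing filters with unknown effectiveness last among the row filters is an assumption the description does not state.

```rust
use std::cmp::Ordering;
use std::collections::HashMap;

/// Hypothetical per-filter counters accumulated across files.
#[derive(Default, Clone, Copy)]
struct FilterStats {
    rows_total: u64,
    rows_matched: u64,
}

impl FilterStats {
    /// Effectiveness = 1 - rows_matched / rows_total (fraction filtered out).
    /// Returns None when no rows have been observed yet.
    fn effectiveness(&self) -> Option<f64> {
        if self.rows_total == 0 {
            None
        } else {
            Some(1.0 - self.rows_matched as f64 / self.rows_total as f64)
        }
    }
}

/// Hypothetical stand-in for the shared tracker; a String key replaces ExprKey.
#[derive(Default)]
struct Tracker {
    stats: HashMap<String, FilterStats>,
}

impl Tracker {
    /// Record the outcome of one finished file stream for a filter.
    fn update(&mut self, filter: &str, rows_matched: u64, rows_total: u64) {
        let entry = self.stats.entry(filter.to_string()).or_default();
        entry.rows_matched += rows_matched;
        entry.rows_total += rows_total;
    }

    fn effectiveness(&self, filter: &str) -> Option<f64> {
        self.stats.get(filter).and_then(|s| s.effectiveness())
    }

    /// Split filters into (row filters, post-scan filters).
    /// Known effectiveness below the threshold demotes a filter to post-scan;
    /// unknown or >= threshold keeps it as a row filter.
    fn partition(&self, filters: &[&str], threshold: f64) -> (Vec<String>, Vec<String>) {
        let mut row: Vec<(String, Option<f64>)> = Vec::new();
        let mut post: Vec<String> = Vec::new();
        for f in filters {
            match self.effectiveness(f) {
                Some(e) if e < threshold => post.push(f.to_string()),
                e => row.push((f.to_string(), e)),
            }
        }
        // Most selective first; unknown filters sort last (assumption).
        row.sort_by(|a, b| {
            let ea = a.1.unwrap_or(f64::NEG_INFINITY);
            let eb = b.1.unwrap_or(f64::NEG_INFINITY);
            eb.partial_cmp(&ea).unwrap_or(Ordering::Equal)
        });
        (row.into_iter().map(|(f, _)| f).collect(), post)
    }
}
```

Under this rule a threshold of 0.0 keeps every filter as a row filter, because effectiveness is never negative, while the default of 0.8 demotes any filter observed to remove fewer than 80% of rows.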
- Fix clippy error: Use datafusion_common::instant::Instant instead of std::time::Instant for WASM compatibility (opener.rs:706)
- Fix parquet test failures: Set filter_effectiveness_threshold to 0.0 in test helper when pushdown_predicate is enabled. This ensures filters are pushed down immediately rather than waiting for adaptive selectivity learning.
- Fix filter_pushdown_view test: Disable pushdown_filters in test config so filters stay as FilterExec nodes rather than being pushed into the Parquet reader.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
run benchmark tpch
🤖
This reverts commit 72b078a.
run benchmark tpch
🤖: Benchmark completed
run benchmark tpch
🤖
run benchmarks
🤖: Benchmark completed
🤖
This reverts commit 84346da.
run benchmark tpch
run benchmarks
🤖: Benchmark completed
🤖
This reverts commit d26ceb0.
run benchmark
🤖 Hi @Dandandan, thanks for the request (#19694 (comment)).
Please choose one or more of these with
run benchmark tpch
run benchmarks
🤖: Benchmark completed
🤖
🤖: Benchmark completed
🤖
🤖: Benchmark completed
🤖
🤖: Benchmark completed