@zhuqi-lucas
Collaborator

Which issue does this PR close?

  • Closes #.

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

Copilot AI left a comment

Pull request overview

This PR implements Phase 1 of sort pushdown optimization to improve TopK query performance. When a query requests data in reverse order of a Parquet file's natural ordering, the optimizer now enables reverse row group scanning, which allows early termination in TopK queries while keeping the Sort operator for correctness.

Key changes:

  • Adds enable_sort_pushdown configuration option (default: true)
  • Implements reverse row group scanning for Parquet files
  • Returns inexact ordering to enable TopK early termination benefits
  • Adds comprehensive test coverage across multiple file formats
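The early-termination idea in the overview can be sketched in plain Rust. This is a simplified model under stated assumptions, not the code in this PR: `RowGroup` and `top_k_desc` are hypothetical names, each row group's maximum is computed from its rows (a real Parquet reader would take it from row group statistics), and the file is assumed to be sorted ascending across row groups. Rows inside a group are still visited in ascending order, so the output order of the scan is only inexact and a final sort is still required.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Hypothetical stand-in for a Parquet row group holding one sorted column.
struct RowGroup {
    rows: Vec<i64>,
}

impl RowGroup {
    /// Maximum value in the group (a real reader would use statistics instead).
    fn max(&self) -> Option<i64> {
        self.rows.iter().copied().max()
    }
}

/// Answers `ORDER BY v DESC LIMIT k` by visiting row groups last-to-first.
/// Returns the top-k values in descending order and the number of row groups read.
fn top_k_desc(row_groups: &[RowGroup], k: usize) -> (Vec<i64>, usize) {
    if k == 0 {
        return (Vec::new(), 0);
    }
    // Min-heap of the k largest values seen so far.
    let mut heap: BinaryHeap<Reverse<i64>> = BinaryHeap::new();
    let mut groups_read = 0;

    for rg in row_groups.iter().rev() {
        // Early termination: once k values are held, a group whose maximum
        // cannot beat the smallest kept value cannot change the result.
        if heap.len() == k {
            let min_kept = heap.peek().map(|m| m.0);
            if let (Some(min_kept), Some(rg_max)) = (min_kept, rg.max()) {
                if rg_max <= min_kept {
                    break;
                }
            }
        }
        groups_read += 1;
        for &v in &rg.rows {
            if heap.len() < k {
                heap.push(Reverse(v));
            } else if heap.peek().map_or(false, |m| v > m.0) {
                heap.pop();
                heap.push(Reverse(v));
            }
        }
    }

    // The scan order is only inexact, so the final sort is still needed.
    let mut out: Vec<i64> = heap.into_iter().map(|Reverse(v)| v).collect();
    out.sort_unstable_by(|a, b| b.cmp(a));
    (out, groups_read)
}
```

With three ascending row groups and `k = 2`, this sketch reads only the last group before stopping, which is the effect the reverse scan is meant to enable.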

Reviewed changes

Copilot reviewed 28 out of 29 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| docs/source/user-guide/configs.md | Documents new configuration options including enable_sort_pushdown, force_filter_selections, enable_ansi_mode, and hash join InList pushdown settings |
| datafusion/common/src/config.rs | Adds enable_sort_pushdown configuration option with detailed documentation |
| datafusion/physical-optimizer/src/pushdown_sort.rs | Implements the PushdownSort optimizer rule that detects SortExec nodes and attempts to push sort requirements down to data sources |
| datafusion/physical-plan/src/sort_pushdown.rs | Defines SortOrderPushdownResult enum for communicating sort pushdown results (Exact, Inexact, Unsupported) |
| datafusion/physical-plan/src/execution_plan.rs | Adds try_pushdown_sort trait method to ExecutionPlan for sort optimization |
| datafusion/datasource-parquet/src/source.rs | Implements reverse row group scanning logic in ParquetSource with reverse_row_groups field |
| datafusion/datasource-parquet/src/sort.rs | Implements reverse_row_selection function to adjust row selections for reversed row group order |
| datafusion/datasource-parquet/src/opener.rs | Integrates reverse scanning into ParquetOpener using PreparedAccessPlan |
| datafusion/physical-expr-common/src/sort_expr.rs | Adds is_reverse and is_reversed_sort_options helpers for detecting reversed orderings |
| datafusion/sqllogictest/test_files/*.slt | Comprehensive SQL logic tests validating reverse scan behavior with various scenarios |
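The three-way result described for sort_pushdown.rs can be illustrated with a small standalone model. Everything here is an assumption made for illustration: `SortOpts`, `PushdownOutcome`, `classify`, and `must_keep_sort` are hypothetical names, the variants carry no payload, and the rule that an identical ordering counts as exact is a guess. The actual `SortOrderPushdownResult` and `is_reversed_sort_options` in DataFusion may differ in shape and behavior.

```rust
/// Hypothetical, simplified sort options for a single column.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct SortOpts {
    descending: bool,
    nulls_first: bool,
}

/// Hypothetical mirror of the Exact / Inexact / Unsupported outcomes.
#[derive(Debug, PartialEq, Eq)]
enum PushdownOutcome {
    /// The source already yields exactly the requested order.
    Exact,
    /// The source can approximate the order (e.g. reversed row groups).
    Inexact,
    /// The source cannot help with this ordering.
    Unsupported,
}

/// An ordering is the reverse of another when both the direction and the
/// null placement are flipped.
fn is_reversed(a: SortOpts, b: SortOpts) -> bool {
    a.descending != b.descending && a.nulls_first != b.nulls_first
}

/// Classifies a requested ordering against the file's declared ordering.
fn classify(file_order: SortOpts, requested: SortOpts) -> PushdownOutcome {
    if file_order == requested {
        PushdownOutcome::Exact
    } else if is_reversed(file_order, requested) {
        PushdownOutcome::Inexact
    } else {
        PushdownOutcome::Unsupported
    }
}

/// Only an exact result would allow dropping the sort; an inexact result
/// keeps it for correctness.
fn must_keep_sort(outcome: &PushdownOutcome) -> bool {
    !matches!(outcome, PushdownOutcome::Exact)
}
```

In this model a file declared `ASC NULLS LAST` queried with `DESC NULLS FIRST` is classified as inexact, matching the overview's statement that the Sort operator stays in the plan.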

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@github-actions github-actions bot removed the proto label Dec 17, 2025
@zhuqi-lucas zhuqi-lucas force-pushed the branch-51-reverse-row-group branch from fd45ae8 to 4ed0668 Compare December 18, 2025 03:45
zhuqi-lucas and others added 9 commits December 23, 2025 15:40
…#19557)

- Closes [apache#19535](apache#19535)

Reverse row selection should respect the row group index; this PR fixes
the issue.

Are these changes tested? Yes

Are there any user-facing changes? No

(cherry picked from commit 27de50d)
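
One way to read the fix above: a row selection is a flat sequence of select/skip runs over the scanned row groups, so reversing the row group order requires cutting that sequence at row group boundaries first. The sketch below is an interpretation under assumptions, not the PR's `reverse_row_selection`: `Run`, `split_by_row_group`, and `reverse_selection` are hypothetical names, and `sizes` is assumed to list the row counts of exactly the row groups being scanned, in file order.

```rust
/// Hypothetical run of consecutive rows that are either selected or skipped.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Run {
    select: bool,
    rows: usize,
}

/// Cuts a flat run list into one chunk per row group, splitting any run
/// that straddles a row group boundary.
fn split_by_row_group(runs: &[Run], sizes: &[usize]) -> Vec<Vec<Run>> {
    let mut pending = runs.iter().copied();
    let mut carry: Option<Run> = None; // leftover of a run split at a boundary
    let mut chunks = Vec::with_capacity(sizes.len());

    for &size in sizes {
        let mut remaining = size;
        let mut chunk = Vec::new();
        while remaining > 0 {
            let run = match carry.take().or_else(|| pending.next()) {
                Some(r) => r,
                None => break, // selection is shorter than the row groups
            };
            let take = run.rows.min(remaining);
            if take > 0 {
                chunk.push(Run { select: run.select, rows: take });
            }
            if run.rows > take {
                carry = Some(Run { select: run.select, rows: run.rows - take });
            }
            remaining -= take;
        }
        chunks.push(chunk);
    }
    chunks
}

/// Reorders the selection for a last-to-first row group scan. Runs inside
/// each row group keep their original order; adjacent runs are not merged.
fn reverse_selection(runs: &[Run], sizes: &[usize]) -> Vec<Run> {
    split_by_row_group(runs, sizes)
        .into_iter()
        .rev()
        .flatten()
        .collect()
}
```

For row groups of 4 and 6 rows and the runs select 3, skip 5, select 2, the reversed selection is skip 4, select 2, select 3, skip 1.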
## Which issue does this PR close?

Add sorted data benchmark.

- Closes [apache#18976](apache#18976)

## Rationale for this change


## What changes are included in this PR?


## Are these changes tested?

Yes. Benchmark results for the reverse Parquet PR (apache#18817) show it
is roughly 30x faster than the main branch on sorted data:

```text
     Running `/Users/zhuqi/arrow-datafusion/target/release/dfbench clickbench --iterations 5 --path /Users/zhuqi/arrow-datafusion/benchmarks/data/hits_0_sorted.parquet --queries-path /Users/zhuqi/arrow-datafusion/benchmarks/queries/clickbench/queries/sorted_data --sorted-by EventTime --sort-order ASC -o /Users/zhuqi/arrow-datafusion/benchmarks/results/reverse_parquet/data_sorted_clickbench.json`
Running benchmarks with the following options: RunOpt { query: None, pushdown: false, common: CommonOpt { iterations: 5, partitions: None, batch_size: None, mem_pool_type: "fair", memory_limit: None, sort_spill_reservation_bytes: None, debug: false }, path: "/Users/zhuqi/arrow-datafusion/benchmarks/data/hits_0_sorted.parquet", queries_path: "/Users/zhuqi/arrow-datafusion/benchmarks/queries/clickbench/queries/sorted_data", output_path: Some("/Users/zhuqi/arrow-datafusion/benchmarks/results/reverse_parquet/data_sorted_clickbench.json"), sorted_by: Some("EventTime"), sort_order: "ASC" }
⚠️  Forcing target_partitions=1 to preserve sort order
⚠️  (Because we want to get the pure performance benefit of sorted data to compare)
📊 Session config target_partitions: 1
Registering table with sort order: EventTime ASC
Executing: CREATE EXTERNAL TABLE hits STORED AS PARQUET LOCATION '/Users/zhuqi/arrow-datafusion/benchmarks/data/hits_0_sorted.parquet' WITH ORDER ("EventTime" ASC)
Q0: -- Must set for ClickBench hits_partitioned dataset. See apache#16591
-- set datafusion.execution.parquet.binary_as_string = true
SELECT * FROM hits ORDER BY "EventTime" DESC limit 10;

Query 0 iteration 0 took 14.7 ms and returned 10 rows
Query 0 iteration 1 took 10.2 ms and returned 10 rows
Query 0 iteration 2 took 8.7 ms and returned 10 rows
Query 0 iteration 3 took 7.9 ms and returned 10 rows
Query 0 iteration 4 took 7.9 ms and returned 10 rows
Query 0 avg time: 9.85 ms
+ set +x
Done
```

And the main branch result:

```text
     Running `/Users/zhuqi/arrow-datafusion/target/release/dfbench clickbench --iterations 5 --path /Users/zhuqi/arrow-datafusion/benchmarks/data/hits_0_sorted.parquet --queries-path /Users/zhuqi/arrow-datafusion/benchmarks/queries/clickbench/queries/sorted_data --sorted-by EventTime --sort-order ASC -o /Users/zhuqi/arrow-datafusion/benchmarks/results/issue_18976/data_sorted_clickbench.json`
Running benchmarks with the following options: RunOpt { query: None, pushdown: false, common: CommonOpt { iterations: 5, partitions: None, batch_size: None, mem_pool_type: "fair", memory_limit: None, sort_spill_reservation_bytes: None, debug: false }, path: "/Users/zhuqi/arrow-datafusion/benchmarks/data/hits_0_sorted.parquet", queries_path: "/Users/zhuqi/arrow-datafusion/benchmarks/queries/clickbench/queries/sorted_data", output_path: Some("/Users/zhuqi/arrow-datafusion/benchmarks/results/issue_18976/data_sorted_clickbench.json"), sorted_by: Some("EventTime"), sort_order: "ASC" }
⚠️  Forcing target_partitions=1 to preserve sort order
⚠️  (Because we want to get the pure performance benefit of sorted data to compare)
📊 Session config target_partitions: 1
Registering table with sort order: EventTime ASC
Executing: CREATE EXTERNAL TABLE hits STORED AS PARQUET LOCATION '/Users/zhuqi/arrow-datafusion/benchmarks/data/hits_0_sorted.parquet' WITH ORDER ("EventTime" ASC)
Q0: -- Must set for ClickBench hits_partitioned dataset. See apache#16591
-- set datafusion.execution.parquet.binary_as_string = true
SELECT * FROM hits ORDER BY "EventTime" DESC limit 10;

Query 0 iteration 0 took 331.1 ms and returned 10 rows
Query 0 iteration 1 took 286.0 ms and returned 10 rows
Query 0 iteration 2 took 283.3 ms and returned 10 rows
Query 0 iteration 3 took 283.8 ms and returned 10 rows
Query 0 iteration 4 took 286.5 ms and returned 10 rows
Query 0 avg time: 294.13 ms
+ set +x
Done
```
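
The speedup can be rechecked from the per-iteration timings in the two logs. The helper names `mean` and `speedup` are made up for this sketch. Because the logged iteration times are rounded to one decimal place, the means computed here (about 9.88 ms and 294.14 ms) differ slightly from the averages dfbench printed (9.85 ms and 294.13 ms).

```rust
/// Arithmetic mean of a non-empty slice of timings in milliseconds.
fn mean(xs: &[f64]) -> f64 {
    xs.iter().sum::<f64>() / xs.len() as f64
}

/// Ratio of the baseline mean to the candidate mean.
fn speedup(baseline_ms: &[f64], candidate_ms: &[f64]) -> f64 {
    mean(baseline_ms) / mean(candidate_ms)
}

/// Per-iteration timings copied from the main branch log.
const MAIN_MS: [f64; 5] = [331.1, 286.0, 283.3, 283.8, 286.5];

/// Per-iteration timings copied from the reverse Parquet branch log.
const REVERSE_MS: [f64; 5] = [14.7, 10.2, 8.7, 7.9, 7.9];
```

Calling `speedup(&MAIN_MS, &REVERSE_MS)` gives a ratio just under 30, consistent with the figure quoted above.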

## Are there any user-facing changes?


---------

Co-authored-by: Martin Grigorov <[email protected]>
Co-authored-by: Yongting You <[email protected]>
Co-authored-by: Copilot <[email protected]>
Co-authored-by: Andrew Lamb <[email protected]>
(cherry picked from commit cde6dfa)