-
Notifications
You must be signed in to change notification settings - Fork 239
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Support group-limit optimization for
ROW_NUMBER
(#11886)
* Support group-limit optimization for `ROW_NUMBER` Fixes #10505. This is a follow-up to #10500, which added support for WindowGroupLimit optimizations for `RANK` and `DENSE_RANK` window functions. The same optimization was not extended to `ROW_NUMBER` at that time. This commit now allows the output from `ROW_NUMBER` to be filtered map-side, in case there is a `<` predicate on its return value. The following is an example of the kind of query that is affected by this change: ```sql SELECT * FROM ( SELECT *, ROW_NUMBER() OVER (PARTITION BY p ORDER BY o) AS rn FROM mytable ) WHERE rn < 10; ``` This is per the optimization in [SPARK-37099](https://issues.apache.org/jira/browse/SPARK-37099) in Apache Spark. With this, the output from the window function could potentially be drastically smaller than the input, thus saving on shuffle traffic. Note that this optimization does not kick in on Apache Spark or in `spark-rapids` if the `ROW_NUMBER` phrase does not include a `PARTITION BY` clause. `spark-rapids` remains consistent with Apache Spark in this regard. Signed-off-by: MithunR <[email protected]>
- Loading branch information
Showing
2 changed files
with
12 additions
and
39 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters