[FEA] [FOLLOW-UP] [Hybrid/C2C] Validate predicate push down and filtering #11892

res-life · 2024-12-19T01:14:07Z

Is your feature request related to a problem? Please describe.
It's from #11720 (comment)

Can we add some tests to validate that predicate push down and
filtering are working correctly? It would be nice to have tests for:

simple filters

complex filters that are not supported by normal Parquet predicate push down (like ORs at the top level instead of ANDs)

filters containing operators that Velox does not support but Spark RAPIDS does.
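Push down is usually decided per top-level conjunct, which is why a top-level OR is harder than a chain of ANDs: the ANDed pieces can be pushed independently, while an OR has to be handled as a single unit. A minimal Python sketch of that split, using a hypothetical toy expression tree (`Pred`, `And`, `Or`) rather than Spark's Catalyst classes; Spark's own counterpart is `PredicateHelper.splitConjunctivePredicates`.

```python
from dataclasses import dataclass
from typing import List, Union

# Hypothetical toy expression tree, for illustration only.
@dataclass(frozen=True)
class Pred:
    text: str  # a leaf comparison such as "a > 1"

@dataclass(frozen=True)
class And:
    left: "Expr"
    right: "Expr"

@dataclass(frozen=True)
class Or:
    left: "Expr"
    right: "Expr"

Expr = Union[Pred, And, Or]

def split_conjuncts(e: Expr) -> List[Expr]:
    """Flatten top-level ANDs; anything else (including an OR) stays one unit."""
    if isinstance(e, And):
        return split_conjuncts(e.left) + split_conjuncts(e.right)
    return [e]

# Example filters for the first two test categories above.
simple = And(Pred("a > 1"), Pred("b = 'x'"))
top_level_or = Or(Pred("a > 1"), Pred("b = 'x'"))
```

`split_conjuncts(simple)` yields two independently pushable pieces, while `split_conjuncts(top_level_or)` yields the whole OR as one piece, so a test for the second category exercises a different code path than the first.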

Describe the solution you'd like
First, test predicate push down and filtering.
Then add support for, or fix, the failing cases.

Additional context
It's related to the Hybrid/C2C feature.

res-life · 2024-12-19T01:15:16Z

@thirtiseven I remember you have a fix related to this; please clarify it, thanks.

thirtiseven · 2024-12-19T07:01:04Z

In #11720, a Scan followed by a Filter leads to a case where all conditions are pushed down to the Scan but also remain in the Filter, so the filter conditions are evaluated twice. Typically the second evaluation is quite fast, so it isn't a big problem (though we still want to remove it). But if some conditions are not supported by Velox or Rapids, it causes problems.
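To make the double evaluation concrete, here is a simplified Python model (an assumption for illustration: plan nodes are plain functions over a list of integer rows, not real Spark operators). Because filtering is idempotent, the second pass never changes the result; it only adds one extra predicate call per row that survived the scan.

```python
from typing import Callable, Iterable, List

def make_counted(pred: Callable[[int], bool]):
    """Wrap a predicate so we can count how often it is evaluated."""
    calls = {"n": 0}
    def counted(row: int) -> bool:
        calls["n"] += 1
        return pred(row)
    return counted, calls

def scan_with_pushdown(rows: Iterable[int], pred) -> List[int]:
    # The scan applies the pushed-down condition while reading.
    return [r for r in rows if pred(r)]

def filter_node(rows: Iterable[int], pred) -> List[int]:
    # The Filter above the scan evaluates the same condition again.
    return [r for r in rows if pred(r)]

rows = list(range(10))
pred, calls = make_counted(lambda r: r % 2 == 0)
once = scan_with_pushdown(rows, pred)   # 10 evaluations, 5 rows survive
twice = filter_node(once, pred)         # 5 more evaluations, same 5 rows
```

The outputs of `once` and `twice` are identical, which is why the redundancy is harmless for simple conditions and only becomes a problem when one side cannot evaluate a condition at all.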

So for this issue, we need to check the following cases:

If a filter condition is not supported by either Velox or Rapids, we should fall back to the CPU somewhere (I think it should be handled by the Filter and not pushed down).
If a filter condition is only supported by Velox, we should push it down to the Scan. The current code leads to an unnecessary fallback even though the values are already filtered by the Scan.
If a filter condition is only supported by Rapids, we should not push it down to the Scan and should let Rapids handle it. In the current code, I believe this leads to some kind of exception because we don't fall back on the Velox side.
If all conditions are pushed down to the Scan, we should just remove the GpuFilter node to avoid the double evaluation.
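The four cases above can be read as one placement policy. A minimal Python sketch, assuming a hypothetical `plan_filter` helper and caller-supplied `velox_supports` / `rapids_supports` predicates (the real support checks are exactly what still needs to be designed). It also assumes that conditions supported by both sides are pushed to the Scan, so they are evaluated only once.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class FilterPlan:
    pushed: List[str]     # conditions handed to the (Velox) Scan
    remaining: List[str]  # conditions still evaluated by a Filter node
    filter_kind: str      # "none", "gpu", or "cpu"

def plan_filter(conds: List[str],
                velox_supports: Callable[[str], bool],
                rapids_supports: Callable[[str], bool]) -> FilterPlan:
    # Anything Velox can evaluate goes into the Scan, so it is applied once.
    pushed = [c for c in conds if velox_supports(c)]
    remaining = [c for c in conds if not velox_supports(c)]
    if not remaining:
        kind = "none"  # everything pushed: drop the GpuFilter entirely
    elif all(rapids_supports(c) for c in remaining):
        kind = "gpu"   # Rapids-only conditions stay in the GpuFilter
    else:
        kind = "cpu"   # some condition neither side supports: CPU Filter
    return FilterPlan(pushed, remaining, kind)

# Hypothetical support sets, for illustration only.
velox_ok = {"a > 1", "b = 'x'"}.__contains__
rapids_ok = {"b = 'x'", "rapids_only(c)"}.__contains__
```

When any remaining condition is unsupported by Rapids, the sketch sends the whole remaining Filter to the CPU rather than splitting it, on the assumption that a single plan node runs entirely on one side.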

We already have a POC version of this in a customer environment, but it is quite hacky and has some hardcoded logic. We can find a better way to check whether Velox supports a condition and make the plan change a rule.
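One way to express "make the plan change a rule" is a recursive plan rewrite that strips Filter conditions the Scan below already applies. The sketch below is a hypothetical Python model (`Scan`, `Filter`, and `remove_redundant_filter` are illustrative names); in the plugin this would presumably be a Scala `Rule[SparkPlan]`, and conditions would be compared with Catalyst's `semanticEquals` rather than by string equality as here.

```python
from dataclasses import dataclass
from typing import Tuple, Union

@dataclass(frozen=True)
class Scan:
    pushed: Tuple[str, ...]  # conditions the scan already applies

@dataclass(frozen=True)
class Filter:
    conds: Tuple[str, ...]
    child: "Plan"

Plan = Union[Scan, Filter]

def remove_redundant_filter(plan: Plan) -> Plan:
    """Drop Filter conditions already applied by the Scan directly below."""
    if isinstance(plan, Filter):
        child = remove_redundant_filter(plan.child)
        if isinstance(child, Scan):
            left = tuple(c for c in plan.conds if c not in child.pushed)
            # No conditions left: the Filter node disappears entirely.
            return child if not left else Filter(left, child)
        return Filter(plan.conds, child)
    return plan
```

Expressing the change as a rewrite over the plan tree, instead of hardcoding it at one call site, lets the same logic apply wherever a Filter sits on top of a hybrid Scan.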

res-life added the "? - Needs Triage" and "feature request" labels Dec 19, 2024

res-life mentioned this issue Dec 19, 2024

Introduce hybrid (CPU) scan for Parquet read #11720

Open
