[FEA] Look into adding an AST Filter, turning on AST project by default, and partial AST projection??? #8954

Open
revans2 opened this issue Aug 8, 2023 · 2 comments
Labels: feature request, performance

Comments

@revans2
Collaborator

revans2 commented Aug 8, 2023

Is your feature request related to a problem? Please describe.
Right now AST is really only used in joins. We added an AST Project as a way to easily test AST versions of various expressions, but no one has done any serious benchmarking to find out how good it might be. Some recent experimental work indicates that, for some operations, a filter that can do AST operations might be much faster than a non-AST version. We should take the time to explore in which cases AST operations are better than non-AST operations and in which cases they are not. If we see big gains, we should work with CUDF to add AST filter support, and either turn on AST Project in the cases where it is best, or possibly update tiered project so that it can select some tiers to execute using AST and not others.
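To make the "select some tiers" idea concrete, here is a minimal sketch in plain Python, not the plugin's actual code. Everything in it is hypothetical: the `AST_SUPPORTED_OPS` set, the tuple-based expression encoding, and the `plan_tiers` helper are illustrative stand-ins. In particular, a real selection rule would come from the benchmarking described above, not from a simple supported-operator check.

```python
# Hypothetical set of operators assumed to run well under AST.
# A real list would be derived from benchmarks, not hard-coded.
AST_SUPPORTED_OPS = {"add", "sub", "mul", "and", "or", "eq", "lt", "gt"}


def ops_in(expr):
    """Collect every operator name used in an expression tree.

    An expression is either a leaf (column name or literal) or a tuple
    of the form (op, child, child, ...).
    """
    if not isinstance(expr, tuple):
        return set()
    op, *children = expr
    found = {op}
    for child in children:
        found |= ops_in(child)
    return found


def plan_tiers(tiers):
    """Pick "ast" or "non-ast" for each tier of a tiered project.

    A tier is a list of expression trees. A tier goes to AST only when
    it uses at least one operator and all of them are in the set above.
    """
    plan = []
    for tier in tiers:
        used = set().union(*(ops_in(e) for e in tier))
        plan.append("ast" if used and used <= AST_SUPPORTED_OPS else "non-ast")
    return plan


tiers = [
    [("add", "a", "b")],
    [("or", ("lt", "c", 1), ("gt", "c", 5))],
    [("rlike", "s", "x.*")],
]
print(plan_tiers(tiers))
```

The point of the sketch is only that the choice can be made per tier, so one unsupported or slow expression does not force the whole projection off AST.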

@revans2 added the "feature request", "? - Needs Triage", and "performance" labels on Aug 8, 2023
@mattahrens removed the "? - Needs Triage" label on Aug 10, 2023
@mattahrens
Collaborator

We can start by experimenting with the AST project config turned on.

@revans2
Collaborator Author

revans2 commented Dec 18, 2024

As a note on why this might be a big win: #11810 needed a special case where multiple logical OR operators were nested together. Using AST sped up that processing significantly. The AST OR processing dropped the end-to-end time almost as much as the multi-contains work did.

If this works out, we could eventually rewrite a complex rlike expression into a set of contains with OR expressions, which we could then automatically combine into a multi-contains and an AST OR for the results. This would let us, in theory, combine multiple complex rlike expressions or similar processing into a single multi-contains with some AST post-processing.
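As a rough illustration of that rewrite, the sketch below emulates it in plain Python on lists of strings. It is not the plugin or cuDF API: `literal_alternatives`, `multi_contains`, `or_columns`, and `rlike_many` are hypothetical names. It assumes the simplest case only, where a pattern is an alternation of plain literals such as `foo|bar`, and it falls back to a normal regex search for anything else.

```python
import re

# Characters that mean a piece of the pattern is not a plain literal.
REGEX_META = set(".^$*+?{}[]\\|()")


def literal_alternatives(pattern):
    """Split 'a|b|c' into ['a', 'b', 'c'], or return None when any
    alternative is empty or uses other regex syntax."""
    parts = pattern.split("|")
    if any(not p or any(ch in REGEX_META for ch in p) for p in parts):
        return None
    return parts


def multi_contains(rows, needles):
    """Stand-in for a multi-contains kernel: one boolean column per needle."""
    return {n: [n in row for row in rows] for n in needles}


def or_columns(columns):
    """Stand-in for the AST OR step: row-wise OR across boolean columns."""
    return [any(flags) for flags in zip(*columns)]


def rlike_many(rows, patterns):
    """Evaluate several rlike patterns with one shared multi-contains call."""
    split = [literal_alternatives(p) for p in patterns]
    # Deduplicate literals across all rewritable patterns.
    needles = sorted({n for parts in split if parts for n in parts})
    hits = multi_contains(rows, needles)
    results = []
    for pattern, parts in zip(patterns, split):
        if parts is None:
            # Not a literal alternation: keep ordinary regex evaluation.
            results.append([re.search(pattern, r) is not None for r in rows])
        else:
            results.append(or_columns([hits[n] for n in parts]))
    return results


rows = ["apple pie", "banana split", "cherry tart", "plain"]
print(rlike_many(rows, ["apple|cherry", "split|tart", "^pl.*n$"]))
```

Sharing one `multi_contains` call across all patterns is what mirrors the "single multi-contains with some AST post-processing" idea. The per-pattern OR is the only work that differs between expressions.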
