Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add NativeScan name as filter for Gluten #694

Closed
wants to merge 2 commits into from

Conversation

wjxiz1992
Copy link
Collaborator

@wjxiz1992 wjxiz1992 commented Dec 15, 2023

When parsing eventlog generated by Gluten engine, the Scan node name starts with "NativeScan" instead of "Scan". This results the empty output after filtering.

Add the "NativeScan" for filter so it can be parsed successfully.

@@ -293,9 +293,10 @@ abstract class AppBase(
val readSchema = ReadParser.formatSchemaStr(meta.getOrElse("ReadSchema", ""))
val scanNode = allNodes.filter(node => {
// Get ReadSchema of each Node and sanitize it for comparison
// The "NativeScan" name is from Velox engine

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very easy to be changed. Better to think of a better way for this. I realize this can be as a follow-up.

Signed-off-by: Allen Xu <[email protected]>
@wjxiz1992 wjxiz1992 changed the title Add NativeScan name as filter for Velox Add NativeScan name as filter for Gluten Dec 15, 2023
@tgravescs
Copy link
Collaborator

The intent is to be able to parse Velox plans and NativeScan is the only thing we don't support? ie everything else parses out fine. We should be added a test so if you have an eventlog that has this we should figure out where to add that.

@amahussein amahussein added feature request New feature or request core_tools Scope the core module (scala) labels Dec 15, 2023
Copy link
Collaborator

@amahussein amahussein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @wjxiz1992 for putting up the PR.

  • the PR is missing unit tests
  • It is preferred for each PR to have an opened issue that describes the bug/feature-request so that it gets reviewed and prioritized along the other features.

cc @mattahrens Do we want to support nativeScan?

@mattahrens
Copy link
Collaborator

I'm missing the context for this. Is there a corresponding issue?

If this is to resolve a parsing failure to not fail for qual tool execution, then this is fine. However note that we have not benchmarked against Velox so any estimations out of the qual tool would likely not be as accurate.

@wjxiz1992
Copy link
Collaborator Author

Yes, this MR is just a quick fix for a customer issue and it needs more code to really handle the problem:

  1. potential unknown plan for either Glueten or other 3rd party Planner, better to add an exception when any unknown plan comes in.
  2. unit tests requires an eventlog, I need to run a query to get that for unit test, instead of using the one from customer.

I will polish this afterwards.

@amahussein
Copy link
Collaborator

I can include that change in the existing PR #709
The point that I suspect that there are other changes needed to be done so that the DataSource information is complete. Those changes are part of the WIP #709.

@wjxiz1992 Can you show a snippet from eventlog that shows the nodeDescription containing NativeScan? This will help to know if the parser will be able to extract the ReadSchema/Format from the node.

@wjxiz1992
Copy link
Collaborator Author

@amahussein I am afraid that may contain customer sensitive info, let me send you the snippet in private.

@amahussein
Copy link
Collaborator

#723 supersedes this PR.

@amahussein amahussein closed this Jan 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core_tools Scope the core module (scala) feature request New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants