Report all operators in the output file #1444
Merged
nartal1 merged 8 commits into NVIDIA:dev on Dec 4, 2024
Signed-off-by: Ahmed Hussein (amahussein) <a@ahussein.me>
Signed-off-by: Niranjan Artal <nartal@nvidia.com>
amahussein reviewed on Dec 4, 2024
amahussein (Collaborator) requested changes on Dec 4, 2024 and left a comment:
Can we add the fix to the logic that loops over the graph nodes to build DSV1, as we discussed offline?
The two filters in the code below should be swapped.
It should become:
val scanNode = allNodes.filter(ReadParser.isScanNode(_)).filter(node => {
  // Get ReadSchema of each Node and sanitize it for comparison
  val trimmedNode = AppBase.trimSchema(ReadParser.parseReadNode(node).schema)
  readSchema.contains(trimmedNode)
})
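The review does not say why the order matters. One plausible reading is that the read-schema parser is only meaningful for scan nodes, so the cheap isScanNode check should run first. The sketch below illustrates that reading with hypothetical stand-ins: Node, isScanNode, and parseReadSchema are assumptions made for this example and are not the real ReadParser or AppBase APIs.

```scala
// Hypothetical stand-in for a SQL plan graph node; not the real Spark class.
final case class Node(name: String, desc: String)

object FilterOrderSketch {
  // Assumed predicate: treat any node whose name starts with "Scan" as a scan node.
  def isScanNode(node: Node): Boolean = node.name.startsWith("Scan")

  // Assumed parser: only defined for scan nodes, so it rejects everything else.
  // The returned schema is trimmed so it can be compared against readSchema.
  def parseReadSchema(node: Node): String = {
    require(isScanNode(node), s"not a scan node: ${node.name}")
    node.desc.trim
  }

  // Scan check first, schema comparison second: the parser never sees
  // a non-scan node, and non-scan nodes are discarded cheaply.
  def scanNodesMatching(allNodes: Seq[Node], readSchema: Set[String]): Seq[Node] =
    allNodes
      .filter(isScanNode)
      .filter(node => readSchema.contains(parseReadSchema(node)))
}
```

With the filters in the opposite order, parseReadSchema in this sketch would be called on every node and would fail on the first non-scan node. Whether the real parser fails or only does wasted work is not stated in the review.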
amahussein (Collaborator) approved these changes on Dec 4, 2024 and left a comment:
Thanks @nartal1!
I am approving the Scala-side changes now since I will not be available in a couple of hours.
We should merge the PR once Lee approves the Python-side changes.
Thanks!
This fixes #1325. This is a follow-on PR to capture the expressions and save them to the output file.
This PR supersedes #1431. Thanks @amahussein for resolving and fixing some of the issues in the previous PR.
In this PR, we print all the operators per app and per sqlID in a new file. This helps to get the count of operators in an application. The file includes counts of both supported and unsupported operators.
Sample output:
Some of the changes in this PR:
a. It was using nodeName as the execName, which made the node look like Scan JDBCRelation()[hfsdhfjhkhf. After the fix it is Scan jdbc.
b. If the read format is unknown, we put node.desc in its place to help us understand why the read format could not be extracted.
a. It was not setting the correct OpType: it used OpType.Exec instead of OpType.ReadExec.
b. Applied the same naming logic in FileSourceScanExec.
a. It was setting expressions/unsupportedExpressions to the union of those of its children. Now those values are empty because they are already part of the children.
b. Set the execName to WholeStageCodeGen or PhotonResultStage instead of WholeStageCodeGen ({nodeID}).
c. The expression is set to NodeName (nodeID).
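Items b and c of the last group can be illustrated with a small sketch. It assumes the raw node name carries the node ID as a trailing "(digits)" suffix. ExecNameSketch and splitNodeName are hypothetical names introduced for this example; the PR's actual implementation may obtain the ID differently.

```scala
object ExecNameSketch {
  // Matches a trailing " (<digits>)" suffix, e.g. "WholeStageCodeGen (12)".
  private val NodeIdSuffix = """^(.*?)\s*\((\d+)\)\s*$""".r

  // Returns (execName, expression):
  //   execName   -> the node name without the ID suffix (item b)
  //   expression -> "NodeName (nodeID)" (item c)
  // Names without an ID suffix are returned unchanged for both fields.
  def splitNodeName(nodeName: String): (String, String) = nodeName match {
    case NodeIdSuffix(base, id) => (base, s"$base ($id)")
    case other                  => (other.trim, other.trim)
  }
}
```

Keeping the ID out of execName means that all WholeStageCodeGen nodes aggregate under one operator name when counting, while the expression field still identifies the individual node.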
This pull request includes several updates to improve the parsing and handling of execution nodes in the RAPIDS Accelerator for Apache Spark. The changes focus on refining the parsing logic, handling unsupported expressions, and enhancing the formatting and readability of the code.
Improvements to Execution Node Parsing:
core/src/main/scala/com/nvidia/spark/rapids/tool/planparser/BatchScanExecParser.scala: Updated the BatchScanExecParser to use a more concise node name and improved the logic for setting execution expressions based on the read format.
core/src/main/scala/com/nvidia/spark/rapids/tool/planparser/FileSourceScanExecParser.scala: Enhanced the FileSourceScanExecParser to handle node names more accurately and set execution expressions based on the read format, improving troubleshooting capabilities.
Handling Unsupported Expressions:
core/src/main/scala/com/nvidia/spark/rapids/tool/planparser/ExecParser.scala: Modified the ExecParser trait to use UnsupportedExprOpRef instead of UnsupportedExpr for unsupported expression reasons.
core/src/main/scala/com/nvidia/spark/rapids/tool/planparser/GenericExecParser.scala: Updated the GenericExecParser to utilize UnsupportedExprOpRef and include expressions in the createExecInfo method.
Code Formatting and Readability:
core/src/main/scala/com/nvidia/spark/rapids/tool/planparser/SQLPlanParser.scala: Refactored the ExecInfo case class to use OpRef and UnsupportedExprOpRef, and added methods to improve readability and consistency.
These changes collectively enhance the robustness and clarity of the code, making it easier to maintain and extend in the future.