Report all operators in the output file #1444
Conversation
Signed-off-by: Ahmed Hussein (amahussein) <[email protected]>
Signed-off-by: Niranjan Artal <[email protected]>
core/src/main/scala/com/nvidia/spark/rapids/tool/planparser/ops/OpRef.scala
Can we fix the logic in the loop over the graph nodes that builds DSV1, as we discussed offline?
The filters in the code below should be swapped.
spark-rapids-tools/core/src/main/scala/org/apache/spark/sql/rapids/tool/AppBase.scala
Lines 381 to 385 in 091af08
val scanNode = allNodes.filter(node => {
  // Get ReadSchema of each Node and sanitize it for comparison
  val trimmedNode = AppBase.trimSchema(ReadParser.parseReadNode(node).schema)
  readSchema.contains(trimmedNode)
}).filter(ReadParser.isScanNode(_))
It should become:
val scanNode = allNodes.filter(ReadParser.isScanNode(_)).filter(node => {
  // Get ReadSchema of each Node and sanitize it for comparison
  val trimmedNode = AppBase.trimSchema(ReadParser.parseReadNode(node).schema)
  readSchema.contains(trimmedNode)
})
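For context on why swapping the filters matters: `isScanNode` is a cheap name check, while the original order ran `parseReadNode` and `trimSchema` on every node, including non-scan nodes where the parse is meaningless. A standalone sketch of the cost difference, using hypothetical simplified types (`Node`, `parseSchema`, `findScanNodes` are stand-ins, not the tool's actual classes):

```scala
object FilterOrderDemo {
  // Hypothetical stand-in for a Spark plan graph node.
  final case class Node(name: String, desc: String)

  // Cheap predicate: a simple name check, analogous to ReadParser.isScanNode.
  def isScanNode(node: Node): Boolean = node.name.startsWith("Scan")

  // Returns the matching scan nodes plus the number of expensive
  // schema-parse calls that were actually made.
  def findScanNodes(allNodes: Seq[Node]): (Seq[Node], Int) = {
    var expensiveCalls = 0
    // Stand-in for parseReadNode + trimSchema: costly, and only
    // meaningful on scan nodes.
    def parseSchema(node: Node): String = {
      expensiveCalls += 1
      node.desc.toLowerCase
    }
    // Cheap name check first, costly schema parse second.
    val scans = allNodes
      .filter(isScanNode)
      .filter(node => parseSchema(node).contains("readschema"))
    (scans, expensiveCalls)
  }
}
```

With the cheap filter first, the expensive parse runs only on the scan nodes instead of on every node in the plan.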
Thanks @nartal1 !
I am approving the Scala-side changes since I will not be available for the next couple of hours.
We should merge the PR once Lee approves the Python-side changes.
Thanks!
This fixes #1325. This is a follow-on PR to capture the expressions and save them to the output file.
This PR supersedes #1431. Thanks @amahussein for resolving and fixing some of the issues in the previous PR.
In this PR, we print all the operators per app and per sqlID in a new file. This helps to get the count of operators in an application, including counts of both supported and unsupported operators.
Sample output:
Some of the changes in this PR:
a. It was using nodeName as the execName, which caused the node to look like Scan JDBCRelation()[hfsdhfjhkhf; after the fix it is Scan jdbc.
b. If the read format is unknown, we now put node.desc in the output to help us understand why the read format could not be extracted.
a. It was not setting the correct OpType: it was OpType.Exec instead of OpType.ReadExec.
b. Applied the same naming logic in FileSourceScanExec.
a. It was setting expressions/unsupportedExpressions to the union of its children's values. Now those values are empty because they are already reported as part of the children.
b. Set the execName to WholeStageCodeGen or PhotonResultStage instead of WholeStageCodeGen ({nodeID}).
c. The expression will be set to NodeName (nodeID).
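As a rough illustration of the kind of aggregation the new per-app output enables, here is a minimal sketch of counting operators per sqlID with a supported flag. The row shape (`OpOccurrence`) and helper are hypothetical simplified types, not the tool's actual output schema:

```scala
object OperatorCounts {
  // Hypothetical row shape for one operator occurrence in an app's plan.
  final case class OpOccurrence(sqlID: Long, opName: String, supported: Boolean)

  // Group occurrences by (sqlID, opName, supported) and count them,
  // mirroring a "count of supported and unsupported operators" report.
  def countOps(ops: Seq[OpOccurrence]): Map[(Long, String, Boolean), Int] =
    ops.groupBy(op => (op.sqlID, op.opName, op.supported))
      .map { case (key, group) => key -> group.size }
}
```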
This pull request includes several updates to improve the parsing and handling of execution nodes in the RAPIDS Accelerator for Apache Spark. The changes focus on refining the parsing logic, handling unsupported expressions, and enhancing the formatting and readability of the code.
Improvements to Execution Node Parsing:
- core/src/main/scala/com/nvidia/spark/rapids/tool/planparser/BatchScanExecParser.scala: Updated the BatchScanExecParser to use a more concise node name and improved the logic for setting execution expressions based on the read format. [1] [2]
- core/src/main/scala/com/nvidia/spark/rapids/tool/planparser/FileSourceScanExecParser.scala: Enhanced the FileSourceScanExecParser to handle node names more accurately and set execution expressions based on the read format, improving troubleshooting capabilities. [1] [2] [3]
Handling Unsupported Expressions:
- core/src/main/scala/com/nvidia/spark/rapids/tool/planparser/ExecParser.scala: Modified the ExecParser trait to use UnsupportedExprOpRef instead of UnsupportedExpr for unsupported expression reasons. [1] [2]
- core/src/main/scala/com/nvidia/spark/rapids/tool/planparser/GenericExecParser.scala: Updated the GenericExecParser to utilize UnsupportedExprOpRef and include expressions in the createExecInfo method. [1] [2] [3] [4] [5]
Code Formatting and Readability:
- core/src/main/scala/com/nvidia/spark/rapids/tool/planparser/SQLPlanParser.scala: Refactored the ExecInfo case class to use OpRef and UnsupportedExprOpRef, and added methods to improve readability and consistency. [1] [2] [3] [4] [5] [6]
These changes collectively enhance the robustness and clarity of the code, making it easier to maintain and extend in the future.
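The OpRef change is only described at a high level above. One plausible reading is that OpRef interns operator references so that repeated operators across many plan nodes share one instance. A minimal sketch of that interning idea, with a hypothetical `SimpleOpRef` class (an assumption for illustration, not the actual OpRef implementation):

```scala
import java.util.concurrent.ConcurrentHashMap

// Hypothetical simplified interned operator reference: one shared
// instance per (name, opType) pair, so repeated operators do not
// duplicate strings or allocations across plan nodes.
final class SimpleOpRef private (val name: String, val opType: String) {
  override def toString: String = s"$opType($name)"
}

object SimpleOpRef {
  private val cache = new ConcurrentHashMap[(String, String), SimpleOpRef]()

  // Returns the cached instance for this key, creating it at most once.
  def apply(name: String, opType: String): SimpleOpRef =
    cache.computeIfAbsent((name, opType),
      key => new SimpleOpRef(key._1, key._2))
}
```

With this scheme, equality checks on references become cheap identity comparisons, which helps when the same operator name appears in thousands of nodes.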