-
Notifications
You must be signed in to change notification settings - Fork 39
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Qualification tool: Enhance mapping of Execs to stages #634
Conversation
Signed-off-by: Niranjan Artal <[email protected]>
In description what does this do if stages ids wrong it are different? ie. stage 4 -> no stage id -> stage 5. Would be good to clarify in the description |
core/src/main/scala/org/apache/spark/sql/rapids/tool/qualification/QualificationAppInfo.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/org/apache/spark/sql/rapids/tool/qualification/QualificationAppInfo.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/org/apache/spark/sql/rapids/tool/qualification/QualificationAppInfo.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/org/apache/spark/sql/rapids/tool/qualification/QualificationAppInfo.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/org/apache/spark/sql/rapids/tool/qualification/QualificationAppInfo.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/org/apache/spark/sql/rapids/tool/qualification/QualificationAppInfo.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/org/apache/spark/sql/rapids/tool/qualification/QualificationAppInfo.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/org/apache/spark/sql/rapids/tool/qualification/QualificationAppInfo.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/org/apache/spark/sql/rapids/tool/qualification/QualificationAppInfo.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/org/apache/spark/sql/rapids/tool/qualification/QualificationAppInfo.scala
Outdated
Show resolved
Hide resolved
core/src/test/resources/QualificationExpectations/complex_dec_expectation.csv
Outdated
Show resolved
Hide resolved
Signed-off-by: Niranjan Artal <[email protected]>
core/src/main/scala/org/apache/spark/sql/rapids/tool/qualification/QualificationAppInfo.scala
Show resolved
Hide resolved
core/src/main/scala/org/apache/spark/sql/rapids/tool/qualification/QualificationAppInfo.scala
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor documentation otherwise looks good
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @nartal1
This fixes #615 .
This PR improves mapping of Execs to stageID for each stage.
Currently, Execs are mapped to stages based on metrics generated by an Exec. i.e if an Exec has metrics such as "number of output rows" which will have an accumulatorID associated with it. If any nodes didn't have metrics generated then we would put those Execs in a separate bucket - execsNoStage.
This would cause to miss some of the Execs to stages mapping. This PR improves mapping by looking at the neighbors of the Exec i.e If an adjacent Exec of a given Exec has a stageId associated with it, then it's most likely that current Exec is also in the same stage and hence we can assign the same stageId.
If there is an Exec without stageId but it's adjacent Exec's have stageId's associated with them, then the current Exec will be tagged with it's previous(left) stageId.
Example: Exec1(stageId:10) -> Exec2(no StageId) -> Exec3(stageId:20)
Exec2 is associated with stageId:10 in above case.
This change improves the qualification scores of some of the customer eventlogs. Updated the current unit test results.