-
Notifications
You must be signed in to change notification settings - Fork 39
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Profiling tool: Add support for driver log as input to generate unsupported operators report #654
Conversation
Signed-off-by: Niranjan Artal <[email protected]>
core/src/main/scala/com/nvidia/spark/rapids/tool/profiling/ProfileArgs.scala
Show resolved
Hide resolved
core/src/main/scala/com/nvidia/spark/rapids/tool/profiling/Profiler.scala
Outdated
Show resolved
Hide resolved
try { | ||
// Process each line in the file | ||
for (line <- source.getLines()) { | ||
if (line.contains("cannot run on GPU") && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should protect against case sensitivity. this is also very sensitive to us changing the wording here. Can you check with Bobby on ideas to make this more resilient and to make sure this covers all the cases
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That code was not written with the goal of parsing the output. "cannot run on GPU" is a clue that something went bad, but there is no guarantee that it will show up. Each expression has the ability to override how it shows up in the output. That was to allow attribute references and literals to not show up unless they are falling back to the CPU even in the case when you say to show ALL
. If it does not show up I would consider it a bug, but just grepping through for that phrase I see it in multiple places. Especially with deltalake, which makes me wonder if it could show up in other places, or if it is guaranteed to show up.
makes me think it will always show up, but then it is possible that it will not.
But "not all expressions can be replaced" is a free form message. And there can be multiple reasons why something is not on the GPU. We could have a project that is not on the GPU because a child expression cannot be replaces AND a datatype that we do not support AND.... That is not robust at all.
kind of explains the format, but it is not perfect. it does not say the explanation is a ;
separated list of explanations of why this didn't work, and we don't actually restrict ;
from showing up in an explanation, or in the operator?
section. I don't know how to make parsing this robust without changes to the format to make it less ambiguous, which if we are going to do that, then lets put it into the history file instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should change the format for the future if we are planning on having tools about it. Maybe something more official tags [NOT ON GPU] that is easily parseable and unique. Obviously will still have to keep old logic for anything up til that get implemented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Filed an issue in spark-rapids - NVIDIA/spark-rapids#9728
core/src/main/scala/com/nvidia/spark/rapids/tool/profiling/ProfileArgs.scala
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how large of an eventlog have you tested this with? I'm curious how long it takes in an app that takes a really long time.
core/src/main/scala/com/nvidia/spark/rapids/tool/profiling/ProfileMain.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/com/nvidia/spark/rapids/tool/profiling/ProfileMain.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/com/nvidia/spark/rapids/tool/profiling/ProfileMain.scala
Outdated
Show resolved
Hide resolved
I tested it with eventlog of size 734MB and driver log 88MB. It took about a minute to complete.
|
Co-authored-by: Thomas Graves <[email protected]>
Co-authored-by: Thomas Graves <[email protected]>
This fixes #580
This adds support for reading driver log as input and generating unsupported operators in a csv file.
New argument
driverlog
is added. Eventlog is not mandatory anymore. Either driverlog or eventlogs can be passed in command line. Tool throws an error if both driverlog and eventlogs is not provided.Output of driverlog is posted in a different directory -
rapids_4_spark_profile/driver/unsupported_operators_in_driver_log.csv
Sample csv output:
Added a unit test to read driver log file with driverlog argument.