Add support for dumping write data to try and reproduce error cases #11864
Conversation
Signed-off-by: Robert (Bobby) Evans <[email protected]>
Suggestion for handling double-runs better but otherwise lgtm.
      f"-c$fileCounter%03d" + ".debug"
  } else {
    base + "/" + partDir.mkString("/") + s"/DEBUG_" +
      taskAttemptContext.getTaskAttemptID.toString + f"-c$fileCounter%03d" + ".debug"
Do we want to leverage the application ID to help make this more unique? Otherwise, if someone runs this twice in a row without updating the base path, we will get errors because the files already exist. The reader dump path uses random IDs and retries to handle this, so it would be nice if that were also supported here.
I was thinking about that, but I didn't know how to get the application ID, and I thought you had issues getting it consistently too.
I'll put in a timestamp to disambiguate.
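A minimal sketch of the timestamp approach, assuming a hypothetical helper (`debugPath` is illustrative, not the actual patch): embedding a UTC timestamp in the debug file name means two runs against the same base path produce distinct files, without needing the application ID.

```scala
import java.time.format.DateTimeFormatter
import java.time.{ZoneOffset, ZonedDateTime}

// Hypothetical sketch: build a debug dump path that embeds a timestamp so
// back-to-back runs with the same base path do not collide on file names.
object DebugPathSketch {
  def debugPath(
      base: String,
      partDir: Seq[String],
      taskAttemptId: String,
      fileCounter: Int,
      timestamp: String): String = {
    // Partitioned writes nest the dump under the partition directories.
    val dir = if (partDir.isEmpty) base else base + "/" + partDir.mkString("/")
    f"$dir/DEBUG_${timestamp}_$taskAttemptId-c$fileCounter%03d.debug"
  }

  def main(args: Array[String]): Unit = {
    val ts = ZonedDateTime.now(ZoneOffset.UTC)
      .format(DateTimeFormatter.ofPattern("yyyyMMdd_HHmmss"))
    println(debugPath("/tmp/dump", Seq("a=1"), "attempt_0001_m_000000_0", 5, ts))
  }
}
```

The task attempt ID already keeps concurrent tasks in one run apart; the timestamp only has to distinguish separate runs, so second-level granularity is enough here.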
I am getting an error in Databricks for this. I'll try to understand why the HiveFileFormat is no longer serializable.
Signed-off-by: Robert (Bobby) Evans <[email protected]>
@jlowe please take another look. I had a test failure related to the logging changes that happened in CUDF recently. I upmerged, so it should be fixed now.
Signed-off-by: Robert (Bobby) Evans <[email protected]>
Sorry @jlowe, a merge conflict means I need your approval yet again.
CI timed out on one job. This is important for debugging, so I am just going to merge it.
This fixes #11853
This does not include any C++ code that can interpret the JCudf serialization format, which would make reproducing issues offline simpler. But I will start working on some example code in spark-rapids-jni to help with this.
I also have not added any documentation yet. I am not 100% sure what we want to do here, but we can try it out and see.