[FEA] Improve profiling tool error message for corrupted worker_info file #608

Closed
cindyyuanjiang opened this issue Oct 4, 2023 · 0 comments · Fixed by #623
Assignees
Labels
core_tools Scope the core module (scala) feature request New feature or request

Comments

@cindyyuanjiang
Collaborator

Is your feature request related to a problem? Please describe.
When running the user profiling tool with the --worker_info argument to get AutoTuner results, if the softwareProperties section of the file is empty or missing, for example:

system:
  numCores: 32
  memory: 212992MiB
  numWorkers: 5
gpu:
  memory: 15109MiB
  count: 4
  name: T4
softwareProperties:

the profiler throws a NullPointerException:

23/10/04 15:02:50 ERROR Profiler: Exception thrown while writing
	| java.lang.NullPointerException
	| 	at com.nvidia.spark.rapids.tool.profiling.AutoTuner.getPropertyValue(AutoTuner.scala:360)
	| 	at com.nvidia.spark.rapids.tool.profiling.AutoTuner.$anonfun$initRecommendations$1(AutoTuner.scala:368)
	| 	at com.nvidia.spark.rapids.tool.profiling.AutoTuner.$anonfun$initRecommendations$1$adapted(AutoTuner.scala:366)
	| 	at scala.collection.immutable.List.foreach(List.scala:431)
	| 	at com.nvidia.spark.rapids.tool.profiling.AutoTuner.initRecommendations(AutoTuner.scala:366)
	| 	at com.nvidia.spark.rapids.tool.profiling.AutoTuner.getRecommendedProperties(AutoTuner.scala:909)
	| 	at com.nvidia.spark.rapids.tool.profiling.Profiler.$anonfun$writeOutput$50(Profiler.scala:506)
	| 	at com.nvidia.spark.rapids.tool.profiling.Profiler.$anonfun$writeOutput$50$adapted(Profiler.scala:453)
	| 	at scala.collection.immutable.List.foreach(List.scala:431)
	| 	at com.nvidia.spark.rapids.tool.profiling.Profiler.writeOutput(Profiler.scala:453)
	| 	at com.nvidia.spark.rapids.tool.profiling.Profiler.com$nvidia$spark$rapids$tool$profiling$Profiler$$writeSafelyToOutput(Profiler.scala:522)
	| 	at com.nvidia.spark.rapids.tool.profiling.Profiler$ProfileProcessThread$1.run(Profiler.scala:251)
	| 	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
	| 	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	| 	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	| 	at java.base/java.util.concurrent.ThreadPoolExecutor$W

This error message is confusing: nothing in it points to the worker_info file or to softwareProperties as the cause.

Describe the solution you'd like
The error message should say something like "corrupted worker_info file, missing softwareProperties", so that users can debug the issue more easily.
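
One possible shape for such a check is sketched below. This is only an illustration, not the actual AutoTuner code or the fix that landed: `WorkerInfo`, `WorkerInfoValidation`, `validate`, and the message text are hypothetical names, and the sketch assumes that an empty `softwareProperties:` key ends up as a null Java map after the file is loaded.

```scala
import java.util.{LinkedHashMap => JLinkedHashMap}

// Hypothetical stand-in for the parsed worker_info contents.
// Assumption: an empty `softwareProperties:` key is loaded as null.
final case class WorkerInfo(softwareProperties: JLinkedHashMap[String, String])

object WorkerInfoValidation {
  // Hypothetical message text, modeled on the wording suggested in this issue.
  val MissingSoftwareProps: String =
    "Corrupted worker_info file: missing softwareProperties"

  // Returns Left(message) when softwareProperties is null,
  // otherwise Right with the loaded properties.
  def validate(info: WorkerInfo): Either[String, JLinkedHashMap[String, String]] =
    Option(info.softwareProperties) match {
      case Some(props) => Right(props)
      case None        => Left(MissingSoftwareProps)
    }

  // Null-safe lookup: None when the section or the key is absent,
  // instead of throwing a NullPointerException.
  def getPropertyValue(info: WorkerInfo, key: String): Option[String] =
    Option(info.softwareProperties).flatMap(props => Option(props.get(key)))
}
```

With a check like this, the caller could log the `Left` message and skip the recommendations rather than surfacing a bare stack trace.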
