[BUG] Profiling logs against eventlogs on Databricks CPU cluster FAILED on --- NullPointerException: com.nvidia.spark.rapids.tool.profiling.CollectInformation.... #552

Closed
NvTimLiu opened this issue Sep 9, 2023 · 1 comment
Labels
bug Something isn't working core_tools Scope the core module (scala)

Comments

@NvTimLiu
Collaborator

NvTimLiu commented Sep 9, 2023

Describe the bug

Profiling NDS event logs collected on Databricks CPU clusters fails with the NullPointerException shown below.

NOTE: The failure occurs only with the full CPU event logs (dbfs:/cicd/azure2-cpu/eventlog and dbfs:/cicd/aws-cpu/eventlog),
collected on either AWS or Azure Databricks CPU clusters.

1. Profiling passes with a subset of the CPU event logs, e.g. dbfs:/cicd/cpu/eventlog-2023-09-07--11-00.gz

2. Profiling passes with GPU event logs, e.g. dbfs:/cicd/eventlog

3. The event logs are available on Azure DBFS (dbfs:/) for the host 763784504165494.14

# Databricks credentials (placeholders)
export DATABRICKS_HOST=YOUR_HOST
export DATABRICKS_TOKEN=YOUR_TOKEN
export SPARK_HOME=${PWD}/spark-3.2.0-bin-hadoop3.2
RAPIDS_TOOLS_JAR=$PWD/rapids-4-spark-tools_2.12-23.08.1-SNAPSHOT.jar
CLASS=com.nvidia.spark.rapids.tool.profiling.ProfileMain

OUTPUT_DIR=$PWD/output/cpu
EVENT_LOGS=dbfs:/cicd/azure2-cpu/eventlog
# Quote the classpath so the shell passes the jars/* wildcard to java unexpanded
java -Xmx20g -cp "${RAPIDS_TOOLS_JAR}:${SPARK_HOME}/jars/*" ${CLASS} \
        --csv \
        --output-directory file://${OUTPUT_DIR} \
        ${EVENT_LOGS}


+ java -Xmx20g -cp 'rapids-4-spark-tools_2.12-23.08.1-SNAPSHOT.jar:/databricks/jars/*' com.nvidia.spark.rapids.tool.profiling.ProfileMain --csv --output-directory file:///tmp/cpu /dbfs/cicd/cpu/aws/eventlog
log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.Shell).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
23/09/09 12:47:36 INFO Profiler: Threadpool size is 1
23/09/09 12:47:36 INFO ApplicationInfo: Parsing Event Log: file:/dbfs/cicd/cpu/aws/eventlog
23/09/09 12:47:37 WARN ToolUtils: ClassNotFoundException while parsing an event: DBCEventLoggingListenerMetadata
Profile Tool Progress 0% [>                                                       ] (0 succeeded + 0 failed + 0 N/A) / 1
23/09/09 12:47:38 WARN Utils: Your hostname,xxxxx resolves to a loopback address: 127.0.1.1; using xxx.xxx.xxx.xxx instead (on interface eth0)
23/09/09 12:47:38 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
23/09/09 12:48:27 INFO ApplicationInfo: Total number of events parsed: 208205 for file:/dbfs/cicd/cpu/aws/eventlog
23/09/09 12:48:30 INFO EventLogPathProcessor: ==============   (index=1)  ==============
23/09/09 12:48:30 INFO Profiler: Took 54072ms to process file:/dbfs/cicd/cpu/aws/eventlog
23/09/09 12:48:30 WARN Profiler: Exception occurred processing file: eventlog
java.lang.NullPointerException
        at com.nvidia.spark.rapids.tool.profiling.CollectInformation.$anonfun$getAppInfo$1(CollectInformation.scala:39)
        at scala.collection.immutable.List.map(List.scala:293)
        at com.nvidia.spark.rapids.tool.profiling.CollectInformation.getAppInfo(CollectInformation.scala:37)
        at com.nvidia.spark.rapids.tool.profiling.Profiler.com$nvidia$spark$rapids$tool$profiling$Profiler$$processApps(Profiler.scala:315)
        at com.nvidia.spark.rapids.tool.profiling.Profiler$ProfileProcessThread$1.run(Profiler.scala:249)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)
23/09/09 12:48:30 INFO ToolTextFileWriter: Profile summary output location: file:/tmp/cpu/rapids_4_spark_profile/profile.log
Profile Tool Progress 100% [======================================================] (0 succeeded + 1 failed + 0 N/A) / 1
Profile Tool execution time: 54122ms
        process.success.count = 0
        process.failure.count = 1
        process.NA.count = 0
        execution.total.count = 1
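
Since profiling passes on a single rolled file but fails on the full event log directory, one way to narrow down which file triggers the exception is to profile each file separately. The following is only a sketch under some assumptions: `profile_each_log` is a hypothetical helper (not part of the tools jar), the event log directory is assumed to be reachable as a local path such as /dbfs/..., and a run is treated as passing when its output contains the `process.failure.count = 0` line seen in the summary above.

```shell
# Hypothetical helper: run the given profiler command once per file in a
# directory and report PASS/FAIL for each file.
# Assumption: a passing run prints "process.failure.count = 0", matching the
# summary format in the log above.
profile_each_log() {
    local log_dir="$1"
    shift
    local log output
    for log in "${log_dir}"/*; do
        # Skip sub-directories and the unmatched-glob case
        [ -f "${log}" ] || continue
        # The remaining arguments form the command; the file is appended last
        output="$("$@" "${log}" 2>&1)"
        if printf '%s\n' "${output}" | grep -q 'process\.failure\.count = 0'; then
            echo "PASS ${log}"
        else
            echo "FAIL ${log}"
        fi
    done
}

# Example usage (variables as defined in the reproduction script):
#   profile_each_log /dbfs/cicd/cpu/aws/eventlog \
#       java -Xmx20g -cp "${RAPIDS_TOOLS_JAR}:${SPARK_HOME}/jars/*" "${CLASS}" \
#       --csv --output-directory "file://${OUTPUT_DIR}"
```

Checking the summary line rather than the exit code is a deliberate choice here, because the log above does not show whether the tool exits non-zero when an application fails to process.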

@NvTimLiu NvTimLiu added bug Something isn't working ? - Needs Triage labels Sep 9, 2023
@mattahrens mattahrens added core_tools Scope the core module (scala) and removed ? - Needs Triage labels Sep 12, 2023
@amahussein
Collaborator

Closing this as a duplicate of #639, which was fixed in #640.
