The s3_fsck_p0 job script has been running at a customer site for about 48 hours. We have had 3 failures in that time, with error messages like the ones below.
```
21/11/09 14:32:54 INFO ContextCleaner: Cleaned accumulator 74
21/11/10 14:02:45 WARN TaskSetManager: Lost task 6.0 in stage 5.0 (TID 280, 10.60.190.21, executor 15): org.apache.spark.SparkException: Task failed while writing rows.
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:257)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:170)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:169)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:121)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/spark/python/lib/pyspark.zip/pyspark/worker.py", line 377, in main
    process()
  File "/spark/python/lib/pyspark.zip/pyspark/worker.py", line 372, in process
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 393, in dump_stream
    vs = list(itertools.islice(iterator, batch))
  File "/spark/python/lib/pyspark.zip/pyspark/util.py", line 99, in wrapper
    return f(*args, **kwargs)
  File "/root/spark/scripts/./S3_FSCK/s3_fsck_p0.py", line 164, in <lambda>
  File "/root/spark/scripts/./S3_FSCK/s3_fsck_p0.py", line 131, in blob
  File "/root/spark/scripts/./S3_FSCK/s3_fsck_p0.py", line 121, in check_split
AttributeError: 'NoneType' object has no attribute 'zfill'

    at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:452)
    at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRunner.scala:588)
    at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRunner.scala:571)
    at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:406)
    at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
    at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:244)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:242)
    at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1394)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:248)
    ... 10 more
21/11/10 14:02:45 INFO TaskSetManager: Starting task 6.1 in stage 5.0 (TID 284, 10.60.190.21, executor 12, partition 6, PROCESS_LOCAL, 7771 bytes)
21/11/10 14:02:45 INFO BlockManagerInfo: Added broadcast_7_piece0 in memory on 10.60.190.21:32857 (size: 63.6 KB, free: 4.9 GB)
21/11/10 14:02:45 INFO MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 0 to 10.60.190.21:45752
```
![spark_error](https://user-images.githubusercontent.com/5631642/141475657-d71f106f-be0f-45ce-8966-f95fe189e441.png)
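For reference, `zfill` is a `str` method, so the traceback above means `check_split` (s3_fsck_p0.py line 121) received `None` where it expected a string field. A minimal sketch of the failure mode and one possible guard; the names `field` and `safe_zfill` and the padding width are hypothetical, since the script's internals are not shown here:

```python
# Minimal sketch of the failure mode: zfill is a str method, so a field
# that parsed to None raises exactly the error seen in the log above.
field = None  # e.g. a malformed key row yields no value for this column
try:
    field.zfill(40)  # width is arbitrary here
except AttributeError as e:
    print(e)  # 'NoneType' object has no attribute 'zfill'

# One possible guard (an assumption, not the script's actual fix): skip or
# flag a bad row instead of letting it fail the whole Spark task.
def safe_zfill(value, width=40):
    return value.zfill(width) if value is not None else None
```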
This failure appears to have been tracked down to versioned objects.
Initially it seemed the binary/escape sequences delimiting the version from the object were the cause of the failure. However, even after cleaning out the non-printable (binary, etc.) content, the error still occurred when the p0 script was run.
Later we tried wiping out the binary content, the version, and the whitespace entirely: `cat -v keys2.txt | sed 's/\^@.*RG001 .*\;/;/' > clean_keys2.txt`
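For the record, a rough Python equivalent of that cleanup (an illustration, not the exact command we ran; note that `cat -v` renders a raw NUL byte as the two characters `^@`, while Python sees the byte `\x00` directly):

```python
import re

# Strip everything from the NUL byte through the "RG001 ...;" version
# marker, keeping the ";" field separator -- mirrors the sed substitution
# above: first match per line only, operating on raw bytes.
pattern = re.compile(rb"\x00.*RG001 .*;")

with open("keys2.txt", "rb") as src, open("clean_keys2.txt", "wb") as dst:
    for line in src:
        dst.write(pattern.sub(b";", line, count=1))
```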
This resolved the NoneType errors. I suspect the extra whitespace may have been the issue, since we were not using a quoted CSV format.
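To illustrate that suspicion with a made-up row (not customer data), assuming a `;`-delimited, unquoted keys file as the sed replacement above suggests: any delimiter that survives inside a key shifts the columns, so the field the script later pads can come back empty or missing:

```python
import csv
import io

# Hypothetical key row: the unquoted ";" inside the object name splits one
# logical field in two, shifting every column after it.
row = "my-bucket/odd;name;RG001 v2\n"
print(next(csv.reader(io.StringIO(row), delimiter=";")))
# ['my-bucket/odd', 'name', 'RG001 v2']  <- 3 fields where 2 were expected

# With quoting, the embedded delimiter survives intact:
quoted = '"my-bucket/odd;name";RG001 v2\n'
print(next(csv.reader(io.StringIO(quoted), delimiter=";")))
# ['my-bucket/odd;name', 'RG001 v2']
```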