
The spark driver has stopped unexpectedly and is restarting AFTER using excel lib #689

The maxRowsInMemory option makes the library use a streaming reader.
The v1 API (the one you get with .format("com.crealytics.spark.excel")) still reads all rows into memory on the driver and only then calls parallelize to distribute the data to the workers.
The v2 API (.format("excel")) reads directly on the workers.
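To make the difference concrete, here is a minimal PySpark sketch of switching between the two format strings mentioned above. The helper name `read_excel`, its parameters, and the example path are hypothetical. It assumes an existing SparkSession with the spark-excel package on the classpath and that the file has a header row.

```python
def read_excel(spark, path, use_v2=True, max_rows_in_memory=None):
    """Read an Excel file with spark-excel (hypothetical helper, not part of the library).

    `spark` is assumed to be an existing SparkSession.
    """
    # "excel" selects the v2 source, which reads on the workers;
    # the long name selects v1, which reads on the driver first.
    fmt = "excel" if use_v2 else "com.crealytics.spark.excel"
    # Assumption: the first row of the sheet holds column names.
    reader = spark.read.format(fmt).option("header", "true")
    if max_rows_in_memory is not None:
        # Turns on the streaming reader.
        reader = reader.option("maxRowsInMemory", max_rows_in_memory)
    return reader.load(path)
```

A call such as `read_excel(spark, "/path/to/file.xlsx", max_rows_in_memory=20)` would then go through the v2 source. The value 20 is only an illustration, not a recommendation.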

Answer selected by marcoaafernandes
Category
Q&A