According to some articles I found, it seems Spark can only parallelize Parquet reads across row groups. Is this a known limitation?
Is there any way to split the read at the row or page level instead?
If a file has a single row group, does that mean every task except one would sit idle while that one task reads the entire file?
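To make the scenario concrete, here is a rough toy model of how I understand the behaviour. It is plain Python, not Spark code, and it rests on assumptions: that the file is cut into fixed-size byte splits (I used 128 MiB), and that each row group is read by the single split whose byte range contains the row group's midpoint. The function name, file size, and offsets are all made up for illustration.

```python
# Toy model only: assumes a row group is read by the one split whose
# byte range contains the row group's midpoint. Not actual Spark code.

def assign_row_groups(file_size, split_size, row_groups):
    """row_groups: list of (start_offset, length) in bytes.
    Returns {split_index: [row_group_index, ...]} for every split."""
    num_splits = -(-file_size // split_size)  # ceiling division
    assignment = {i: [] for i in range(num_splits)}
    for rg_index, (start, length) in enumerate(row_groups):
        midpoint = start + length // 2
        assignment[midpoint // split_size].append(rg_index)
    return assignment

MB = 1024 * 1024

# Hypothetical 1 GiB file, 128 MiB splits, one row group covering
# almost the whole file.
single = assign_row_groups(1024 * MB, 128 * MB, [(4, 1024 * MB - 1000)])
busy = [s for s, rgs in single.items() if rgs]
print(len(single), "splits,", len(busy), "with data")

# Same file size, but eight row groups of 127 MiB each.
multi = assign_row_groups(
    1024 * MB, 128 * MB, [(4 + i * 128 * MB, 127 * MB) for i in range(8)]
)
print(sum(1 for rgs in multi.values() if rgs), "splits with data")
```

If this model is right, the single-row-group file still produces eight splits, but seven of them find no row group to read, which is the idle-task situation I am asking about.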
Thanks.