-
Notifications
You must be signed in to change notification settings - Fork 219
Open
Description
My code is working fine with Spark 3.5.2 and 0.43.1 connector. I'm trying to upgrade to Spark 4.0 and I'm hitting this error:
WARN org.apache.spark.scheduler.TaskSetManager Lost task 0.0 in stage 0.0 (TID 0) (executor driver): java.lang.IllegalArgumentException: should have as many children as in the schema: found 0 expected 15
at com.google.cloud.spark.bigquery.repackaged.org.apache.arrow.util.Preconditions.checkArgument(Preconditions.java:255)
at com.google.cloud.spark.bigquery.repackaged.org.apache.arrow.vector.VectorLoader.loadBuffers(VectorLoader.java:153)
at com.google.cloud.spark.bigquery.repackaged.org.apache.arrow.vector.VectorLoader.load(VectorLoader.java:88)
at com.google.cloud.spark.bigquery.repackaged.org.apache.arrow.vector.ipc.ArrowReader.loadRecordBatch(ArrowReader.java:213)
at com.google.cloud.spark.bigquery.repackaged.org.apache.arrow.vector.ipc.ArrowStreamReader.loadNextBatch(ArrowStreamReader.java:161)
at com.google.cloud.spark.bigquery.v2.context.ArrowColumnBatchPartitionReaderContext$SimpleAdapter.loadNextBatch(ArrowColumnBatchPartitionReaderContext.java:77)
at com.google.cloud.spark.bigquery.v2.context.ArrowColumnBatchPartitionReaderContext.next(ArrowColumnBatchPartitionReaderContext.java:252)
at com.google.cloud.spark.bigquery.v2.BigQueryPartitionReader.next(BigQueryPartitionReader.java:32)
I've narrowed it down to this statement:
.withColumn("row_num", row_number().over(
Window
.partitionBy(concat(col("user_pseudo_id"), col("event_timestamp"), col("event_name")))
.orderBy(lit("window_ordering"))
))
Without it, the code runs fine.
Metadata
Metadata
Assignees
Labels
No labels