-
Notifications
You must be signed in to change notification settings - Fork 76
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Spark structured streaming schema evolution #319
Comments
Hey @richiesgr, the code I used in my examples is really similar to ABRiS, I think it might have some enhancements regarding Enum types but overall it should be very similar. I also have no idea how to develop a workaround but it should be around restarting/replanning the streaming job. |
Hi @richiesgr What I meant in #176 with making Spark change its execution plan even during a long-running Structured Streaming query, is based on the realization that in micro batch mode, Spark actually creates a series of query executions which all get a new instance of the execution plan, see here: https://github.com/apache/spark/blob/v3.3.2/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala#L647 So basically, you'd have to implement your own fork of Spark to support in-flight schema evolution without stopping and restarting the Spark query |
Hi @kevinwallimann
First I'm aware of the issue
I used abris to stream avro to delta table. As mentioned in the issue the schema evolution is not working neither using streaming or foreachbatch.
However using this code I'm able to update the schema for each message at least in the source the problem is that is never reflected on the sink (can be easily adapted to use abris by the way)
You said spark can't handle this out of the box I confirm but do you've any idea how to implement it ?
Thanks
The text was updated successfully, but these errors were encountered: