Hello, is it possible to use Spark Declarative Pipelines and Unity Catalog together? I am using Spark 4.1.1 and Unity Catalog 0.4.0 on my laptop as a test. When I set up a session outside of SDP with the following, I can read and write to Unity Catalog, including the use of managed tables...
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("Test")
    .config("spark.jars.packages", "io.unitycatalog:unitycatalog-spark_2.13:0.4.0,io.delta:delta-spark_2.13:4.1.0")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.databricks.delta.catalog.update.enabled", "true")
    .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .config("spark.sql.catalog.landing", "io.unitycatalog.spark.UCSingleCatalog")
    .config("spark.sql.catalog.landing.uri", "http://localhost:8080")
    .config("spark.sql.catalog.landing.token", "")
    .config("spark.sql.catalog.bronze", "io.unitycatalog.spark.UCSingleCatalog")
    .config("spark.sql.catalog.bronze.uri", "http://localhost:8080")
    .config("spark.sql.catalog.bronze.token", "")
    .config("spark.sql.catalog.silver", "io.unitycatalog.spark.UCSingleCatalog")
    .config("spark.sql.catalog.silver.uri", "http://localhost:8080")
    .config("spark.sql.catalog.silver.token", "")
    .config("spark.sql.catalog.gold", "io.unitycatalog.spark.UCSingleCatalog")
    .config("spark.sql.catalog.gold.uri", "http://localhost:8080")
    .config("spark.sql.catalog.gold.token", "")
    .config("spark.sql.defaultCatalog", "bronze")
    .getOrCreate()
)
When I try to create an SDP, I set up spark-pipeline.yml in a similar way...
configuration:
  # spark.jars.packages: io.unitycatalog:unitycatalog-spark_2.13:0.4.0,io.delta:delta-spark_2.13:4.1.0
  # spark.sql.extensions: io.delta.sql.DeltaSparkSessionExtension
  spark.databricks.delta.catalog.update.enabled: 'true'
  spark.sql.catalog.spark_catalog: org.apache.spark.sql.delta.catalog.DeltaCatalog
  spark.sql.defaultCatalog: bronze
  spark.sql.catalog.landing: io.unitycatalog.spark.UCSingleCatalog
  spark.sql.catalog.landing.uri: http://localhost:8080
  spark.sql.catalog.landing.token: ''
  spark.sql.catalog.bronze: io.unitycatalog.spark.UCSingleCatalog
  spark.sql.catalog.bronze.uri: http://localhost:8080
  spark.sql.catalog.bronze.token: ''
  spark.sql.catalog.silver: io.unitycatalog.spark.UCSingleCatalog
  spark.sql.catalog.silver.uri: http://localhost:8080
  spark.sql.catalog.silver.token: ''
  spark.sql.catalog.gold: io.unitycatalog.spark.UCSingleCatalog
  spark.sql.catalog.gold.uri: http://localhost:8080
  spark.sql.catalog.gold.token: ''
I commented out the first two lines as you cannot override these properties.
If I create a simple pipeline like the following, using some dummy data in a bronze table...
from pyspark import pipelines as dp
from pyspark.sql import DataFrame, SparkSession

spark = SparkSession.getActiveSession()

@dp.materialized_view(
    name="silver.customer.customer",
    table_properties={"delta.feature.catalogManaged": "supported"},
)
def customer() -> DataFrame:
    return spark.table("bronze.grand_sales_international.customer")
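For reference, this is roughly how I invoke it with the spark-pipelines CLI that ships with Spark 4.1 (the spec filename here is assumed to match my project layout):

```shell
# dry run: validates the dataflow graph without materializing anything
spark-pipelines dry-run --spec spark-pipeline.yml

# real run: attempts to materialize the tables, and this is where it fails
spark-pipelines run --spec spark-pipeline.yml
```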
The dry run works fine, but the real run gives the following...
2026-03-31 16:16:44: Creating dataflow graph...
2026-03-31 16:16:45: Registering graph elements...
2026-03-31 16:16:45: Loading definitions. Root directory: '/home/graemecash/projects/oos_lakehouse/src/pipelines/customer'.
2026-03-31 16:16:45: Found 1 files matching glob 'transformations/python/**/*'
2026-03-31 16:16:45: Importing /home/graemecash/projects/oos_lakehouse/src/pipelines/customer/transformations/python/customer.py...
2026-03-31 16:16:45: Starting run...
Traceback (most recent call last):
  File "/home/graemecash/spark/python/pyspark/pipelines/cli.py", line 447, in <module>
    main()
  File "/home/graemecash/spark/python/pyspark/pipelines/cli.py", line 426, in main
    run(
  File "/home/graemecash/spark/python/pyspark/pipelines/cli.py", line 350, in run
    handle_pipeline_events(result_iter)
  File "/home/graemecash/spark/python/lib/pyspark.zip/pyspark/pipelines/spark_connect_pipeline.py", line 53, in handle_pipeline_events
    for result in iter:
  File "/home/graemecash/spark/python/lib/pyspark.zip/pyspark/sql/connect/client/core.py", line 1221, in execute_command_as_iterator
  File "/home/graemecash/spark/python/lib/pyspark.zip/pyspark/sql/connect/client/core.py", line 1674, in _execute_and_fetch_as_iterator
  File "/home/graemecash/spark/python/lib/pyspark.zip/pyspark/sql/connect/client/core.py", line 1982, in _handle_error
  File "/home/graemecash/spark/python/lib/pyspark.zip/pyspark/sql/connect/client/core.py", line 2066, in _handle_rpc_error
pyspark.errors.exceptions.connect.UnknownException: (org.apache.spark.sql.pipelines.graph.DatasetManager$TableMaterializationException) java.lang.AssertionError: assertion failed
Thanks