[SPARK-56414][SQL] Per-write options should take precedence over session config in file source writes by cloud-fan · Pull Request #55280 · apache/spark

cloud-fan · 2026-04-09T13:50:57Z

What changes were proposed in this pull request?

In Parquet and Avro prepareWrite, several Hadoop configuration keys are unconditionally set from session-level SQLConf, silently overwriting any per-write options the user provided. Additionally, some write paths (FileStreamSink, InsertIntoHiveTable) create the Hadoop conf via newHadoopConf() without merging write options at all, so per-write options never reach the conf.

This PR fixes both issues:

- FileFormatWriter.write: Merges write options into the Job's Hadoop conf before calling prepareWrite. This is the central fix: it ensures per-write options are in the conf regardless of how the caller created it. Because CaseInsensitiveMap lowercases its keys and Hadoop conf keys are case-sensitive, the merge uses the original map keys.
- ParquetUtils.prepareWrite: Uses DataSourceUtils.setConfIfAbsent so that SQLConf defaults are applied only when the key is not already present in the conf (i.e., no per-write option was provided). Affected keys:
  - spark.sql.parquet.writeLegacyFormat
  - spark.sql.parquet.outputTimestampType
  - spark.sql.parquet.fieldId.write.enabled
  - spark.sql.legacy.parquet.nanosAsLong
  - spark.sql.parquet.annotateVariantLogicalType
- AvroUtils.prepareWrite: Same treatment for Avro compression settings:
  - Zstandard buffer pool (avro.output.codec.zstd.bufferpool)
  - Compression levels (avro.mapred.<codec>.level)
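
The precedence rule described above can be modeled in a few lines. This is only a sketch, not the code in this PR: plain Python dicts stand in for the Hadoop conf and the options map, every function name is hypothetical, and the option values are illustrative. It shows the two steps (merge options first, then apply session defaults only if absent) and why the merge has to use the original key casing.

```python
# Illustrative model only: dicts stand in for Hadoop's Configuration and for
# the write-options map. All function names here are hypothetical.

def set_conf_if_absent(conf: dict, key: str, value: str) -> None:
    """Apply a session default only when the key is not already in the conf."""
    if key not in conf:
        conf[key] = value


def merge_write_options(conf: dict, options: dict) -> None:
    """Copy per-write options into the conf, keeping the keys exactly as given."""
    for key, value in options.items():
        conf[key] = value


def prepare_write(conf: dict, session_defaults: dict) -> None:
    """Fill in session-level defaults without overwriting per-write options."""
    for key, value in session_defaults.items():
        set_conf_if_absent(conf, key, value)


# Example values chosen for illustration.
session_defaults = {
    "spark.sql.parquet.outputTimestampType": "INT96",
    "spark.sql.parquet.writeLegacyFormat": "false",
}
write_options = {"spark.sql.parquet.outputTimestampType": "TIMESTAMP_MICROS"}

# Merge with the original key casing, then apply defaults.
conf = {}
merge_write_options(conf, write_options)
prepare_write(conf, session_defaults)

# Merging lowercased keys instead: the option lands under a different key,
# so the session default is still applied to the mixed-case key.
conf_lowercased = {}
merge_write_options(conf_lowercased, {k.lower(): v for k, v in write_options.items()})
prepare_write(conf_lowercased, session_defaults)
```

In this model, `conf` keeps the per-write timestamp type and only picks up the session default for the key that had no option, while `conf_lowercased` ends up with the session value under the mixed-case key because the lowercased option never matches it.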

Why are the changes needed?

Per-write options (passed via DataFrameWriter.option() or DataStreamWriter.option()) should take precedence over session-level SQLConf defaults. This is already the case for compression codecs in both Parquet and Avro, but other write configuration keys had their per-write values silently overwritten. For example, setting spark.sql.parquet.outputTimestampType as a write option in a streaming sink had no effect because (a) FileStreamSink doesn't merge options into the Hadoop conf, and (b) prepareWrite always replaced the value with the session config.
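
To make causes (a) and (b) concrete, here is a small model of the pre-fix behavior. It is a sketch under stated assumptions, not Spark code: dicts stand in for the Hadoop conf, the function names are hypothetical, and the batch path is assumed to merge options before prepareWrite runs.

```python
# Illustrative model of the behavior before this PR. Function names are
# hypothetical and dicts stand in for Hadoop's Configuration.

KEY = "spark.sql.parquet.outputTimestampType"
SESSION_VALUE = "INT96"  # example session-level value


def conf_without_options() -> dict:
    """Cause (a): the conf is created without merging any write options."""
    return {}


def old_prepare_write(conf: dict) -> None:
    """Cause (b): the session value is set unconditionally."""
    conf[KEY] = SESSION_VALUE


def old_streaming_write(options: dict) -> dict:
    conf = conf_without_options()  # options never reach the conf
    old_prepare_write(conf)
    return conf


def old_batch_write(options: dict) -> dict:
    conf = dict(options)           # assumed: options are merged here...
    old_prepare_write(conf)        # ...and then overwritten
    return conf


options = {KEY: "TIMESTAMP_MICROS"}
streaming_conf = old_streaming_write(options)
batch_conf = old_batch_write(options)
```

Both paths end with the session value, which is why fixing only one of the two causes would not have been enough for the streaming sink.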

Does this PR introduce any user-facing change?

Yes. Per-write options for the listed keys now take effect instead of being silently ignored. Previously, only the session-level SQLConf value was used regardless of what was passed as a write option.

How was this patch tested?

- ParquetEncodingSuite: test verifying per-write outputTimestampType overrides session config for batch writes.
- FileStreamSinkV1Suite: test verifying per-write outputTimestampType overrides session config for streaming writes (exercises the FileStreamSink path that uses newHadoopConf() without options).

Both tests fail on master and pass with this PR.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (claude-opus-4-6)

[SPARK-xxxx][SQL] Per-write options should take precedence over sessi…

6538d47

…on config in Parquet and Avro Co-authored-by: Isaac

cloud-fan changed the title ~~[SPARK-xxxx][SQL] Per-write options should take precedence over session config in Parquet and Avro~~ [SPARK-56414][SQL] Per-write options should take precedence over session config in Parquet and Avro Apr 9, 2026

cloud-fan added 2 commits April 9, 2026 13:54

Remove redundant comment in ParquetUtils

4ef5b36

Co-authored-by: Isaac

Remove writerVersion test case that doesn't test the fix

11f9cf9

Co-authored-by: Isaac

cloud-fan mentioned this pull request Apr 9, 2026

[SPARK-56415][INFRA] Simplify create_spark_jira.py for LLM-driven JIRA ticket creation #55281

Open

Merge write options into Job conf in FileFormatWriter and add streami…

d00d621

…ng test Co-authored-by: Isaac

cloud-fan changed the title ~~[SPARK-56414][SQL] Per-write options should take precedence over session config in Parquet and Avro~~ [SPARK-56414][SQL] Per-write options should take precedence over session config in file source writes Apr 9, 2026

cloud-fan wants to merge 4 commits into apache:master from cloud-fan:fix-parquet-write-option-priority
