
[SPARK-57472][SQL] Make FileTable.mergedOptions merge table and relation options case-insensitively #56520

Draft
matthewbayer wants to merge 1 commit into apache:master from matthewbayer:mb/upstream-filetable-mergedoptions


What changes were proposed in this pull request?

FileTable.mergedOptions merges a FileTable's own options with the options carried by the table operation (the relation), with the operation's options taking precedence. This PR makes that merge case-insensitive so the operation value deterministically wins on a key that differs only in case.

Previously the merge used a case-sensitive ++:

val finalOptions = this.options.asCaseSensitiveMap().asScala ++ options.asCaseSensitiveMap().asScala
new CaseInsensitiveStringMap(finalOptions.asJava)

If the table and the operation set the same option with different key casing (e.g. lineSep vs linesep), both entries survive the ++. CaseInsensitiveStringMap's constructor then collapses them in HashMap iteration order, so an arbitrary entry wins and the other is dropped with nothing but a log message ("Converting duplicated key ... into CaseInsensitiveStringMap").
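To make the collision concrete, here is a minimal sketch in plain Scala. It does not use Spark: the object and method names are hypothetical, and lower-casing the keys only approximates what CaseInsensitiveStringMap does when it is constructed.

```scala
import java.util.Locale

// Illustration only: plain Scala maps stand in for the two option maps.
object CaseSensitiveMergeSketch {
  // Case-sensitive merge, analogous to the old `++` on the case-sensitive views.
  def mergeCaseSensitively(
      table: Map[String, String],
      operation: Map[String, String]): Map[String, String] =
    table ++ operation

  def main(args: Array[String]): Unit = {
    val merged = mergeCaseSensitively(
      Map("lineSep" -> "\n"),
      Map("linesep" -> "\r\n"))

    // `++` compares keys exactly, so both spellings survive as separate entries.
    assert(merged.size == 2)

    // After lower-casing, both entries compete for the same key; which value
    // is kept then depends on the order in which the entries are visited.
    val loweredKeys = merged.keys.map(_.toLowerCase(Locale.ROOT)).toSet
    assert(loweredKeys == Set("linesep"))
  }
}
```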

The fix drops any table option the operation already sets (case-insensitively, via CaseInsensitiveStringMap.containsKey) before merging:

val tableOnly = this.options.asCaseSensitiveMap().asScala
  .filter { case (key, _) => !options.containsKey(key) }
new CaseInsensitiveStringMap((tableOnly ++ options.asCaseSensitiveMap().asScala).asJava)
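The same filter-then-merge idea can be tried outside Spark. The sketch below is an assumption-laden model, not the code in this PR: `MergedOptionsSketch`, `normalize`, and `merge` are hypothetical names, and a plain Map with lower-cased keys stands in for CaseInsensitiveStringMap.

```scala
import java.util.Locale

// Illustration only: models the fix with plain Scala maps.
object MergedOptionsSketch {
  // Stand-in for case-insensitive key handling: normalize keys to lower case.
  private def normalize(key: String): String = key.toLowerCase(Locale.ROOT)

  def merge(
      table: Map[String, String],
      operation: Map[String, String]): Map[String, String] = {
    val operationKeys = operation.keySet.map(normalize)
    // Drop every table option the operation already sets, ignoring case.
    val tableOnly = table.filterNot { case (key, _) =>
      operationKeys.contains(normalize(key))
    }
    // No case-variant duplicates remain between the two sides, so the
    // result no longer depends on iteration order.
    (tableOnly ++ operation).map { case (key, value) => normalize(key) -> value }
  }

  def main(args: Array[String]): Unit = {
    val table = Map("lineSep" -> "\n", "header" -> "true")
    val operation = Map("linesep" -> "\r\n")
    val merged = merge(table, operation)
    // The operation's value wins; the unrelated table option is kept.
    assert(merged == Map("linesep" -> "\r\n", "header" -> "true"))
  }
}
```

Filtering before merging, rather than relying on insertion order, is what makes the outcome independent of how the underlying map happens to iterate.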

Why are the changes needed?

The documented "operation options take precedence" behavior (asserted by the existing FileTableSuite test added in SPARK-49519 / SPARK-50287) is not honored when the two option maps use different casing for the same key. The winner is determined by HashMap iteration order rather than precedence, which is non-deterministic and can silently drop the intended value.

Does this PR introduce any user-facing change?

No behavior is intended to change for correctly-cased options. For options that differ only in case between the table and the operation, the operation value now deterministically wins (previously the winner was arbitrary). This only affects unreleased master.

How was this patch tested?

Extended the existing SPARK-49519 / SPARK-50287 FileTableSuite test with a case-variant scenario (table lineSep vs operation linesep) across all file-based data sources. The test asserts that the operation value wins for both read (newScanBuilder) and write (newWriteBuilder), and that the colliding table key does not survive as a separate entry.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Claude Opus 4.8)

@matthewbayer matthewbayer changed the title [WIP][SPARK-57472][SQL] Make FileTable.mergedOptions merge table and relation options case-insensitively [SPARK-57472][SQL] Make FileTable.mergedOptions merge table and relation options case-insensitively Jun 15, 2026
