
[BUG] Redundant fields required when using merge_builder in DeltaTableWriter #169

Open
zarembat opened this issue Feb 21, 2025 · 0 comments
Labels
bug Something isn't working


@zarembat
Contributor

zarembat commented Feb 21, 2025

When merging data into a Delta table with DeltaTableWriter and the merge_builder parameter, one must also pass the source DataFrame and the target table name to DeltaTableWriter itself, even though the merge builder already contains both. If they are omitted, Koheesio raises an error about missing required fields, so the same values are effectively passed twice.

Describe the bug

Consider this example, which merges data into a Delta table using merge_builder:

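```python
from delta.tables import DeltaTable
from koheesio.spark.utils import get_active_session  # import path may differ between Koheesio versions
from koheesio.spark.writers import BatchOutputMode
from koheesio.spark.writers.delta import DeltaTableWriter

# source_df (a Spark DataFrame) and target_table_name (str) are assumed to be defined already.
spark = get_active_session()
merge_condition = (
    "target.COLUMN_A = source.COLUMN_A AND target.COLUMN_B = source.COLUMN_B"
)

DeltaTableWriter(
    # Redundant: source_df is already the merge source inside merge_builder below.
    df=source_df,
    # Redundant: target_table_name is already passed to DeltaTable.forName below.
    table=target_table_name,
    output_mode=BatchOutputMode.MERGE,
    output_mode_params={
        "merge_builder": (
            DeltaTable.forName(sparkSession=spark, tableOrViewName=target_table_name)
            .alias("target")
            .merge(source=source_df.alias("source"), condition=merge_condition)
            .whenNotMatchedInsertAll()
        )
    },
).write()
```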

Even though the merge_builder already references source_df and target_table_name, the same values must still be passed to DeltaTableWriter as the df and table arguments, respectively. Otherwise, Pydantic raises a ValidationError.
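A minimal, stdlib-only sketch of why the error occurs, assuming the writer declares df and table as unconditionally required fields. CurrentWriterSketch and try_build are hypothetical names, not Koheesio API, and a dataclass raises TypeError where Koheesio's Pydantic model raises ValidationError; the point is only that the required fields are enforced regardless of what output_mode_params contains.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class CurrentWriterSketch:
    """Hypothetical stand-in for the reported behavior (not Koheesio's class).

    df and table have no defaults, so they are required even when a
    merge_builder that already carries both is passed in output_mode_params.
    """

    df: Any
    table: str
    output_mode: str = "merge"
    output_mode_params: Dict[str, Any] = field(default_factory=dict)


def try_build(**kwargs: Any) -> Optional[str]:
    """Return None if construction succeeds, else the error message."""
    try:
        CurrentWriterSketch(**kwargs)
    except TypeError as exc:
        # The dataclass-generated __init__ rejects the missing required fields.
        return str(exc)
    return None


# Passing only a merge builder fails: df and table are still required.
print(try_build(output_mode_params={"merge_builder": object()}))
```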

Steps to Reproduce

Run the snippet above with the df and table arguments removed from the DeltaTableWriter call; constructing the writer then fails with the validation error.

Expected behavior

When a merge builder is passed via output_mode_params, DeltaTableWriter should not also require the df and table parameters, since the merge builder already carries the source DataFrame and the target table.
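One possible shape for the requested behavior, sketched with the standard library only: df and table default to None and are only required when no merge builder is supplied for a merge. MergeAwareWriterSketch is a hypothetical name, and the plain string "merge" stands in for BatchOutputMode.MERGE; in a Pydantic v2 model the same check would typically live in a model validator. This is an illustration of the idea, not Koheesio's implementation.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class MergeAwareWriterSketch:
    """Hypothetical sketch: df/table are optional when a merge builder is given."""

    df: Optional[Any] = None
    table: Optional[str] = None
    output_mode: str = "append"  # stand-in for BatchOutputMode values
    output_mode_params: Dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        uses_merge_builder = (
            self.output_mode == "merge"
            and self.output_mode_params.get("merge_builder") is not None
        )
        if uses_merge_builder:
            # The merge builder already carries the source DataFrame and target table.
            return
        missing = [name for name in ("df", "table") if getattr(self, name) is None]
        if missing:
            raise ValueError(f"missing required fields: {', '.join(missing)}")


# Accepted: the merge builder supplies both source and target.
MergeAwareWriterSketch(output_mode="merge", output_mode_params={"merge_builder": object()})

# Rejected: without a merge builder, df and table are still required.
try:
    MergeAwareWriterSketch(output_mode="append")
except ValueError as exc:
    print(exc)  # missing required fields: df, table
```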

Environment

Koheesio 0.9.0

@zarembat zarembat added the bug Something isn't working label Feb 21, 2025
@dannymeijer dannymeijer added this to the 0.11 milestone Feb 26, 2025