Skip to content

Conversation

@dqkqd
Copy link
Contributor

@dqkqd dqkqd commented Oct 22, 2025

Which issue does this PR close?

Rationale for this change

DataFrame::drop_columns only considers one field for each name,
it fails to drop columns from dataframe containing duplicated names
from different relations. Such as mark columns created by multiples
Join::LeftMark.

DataFrame::select_columns has the same issue, it fails to select columns
with the same name from different relations.

What changes are included in this PR?

Allow DataFrame::drop_columns and DataFrame::select_columns
work with duplicated names from different relations.

Are these changes tested?

Yes.

Are there any user-facing changes?

No.

@github-actions github-actions bot added the core Core DataFusion crate label Oct 22, 2025
@dqkqd dqkqd changed the title fix: DataFrame::drop_columns for qualified duplicated field names fix: select_columns and drop_columns for qualified duplicated field names Oct 23, 2025
"
);

let select_res = join_res.select_columns(&["c1"])?;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line returned SchemaError without the code changes.

    Error: SchemaError(AmbiguousReference { field: Column { relation: None, name: "c1" } }, Some(""))

Comment on lines +627 to +641
assert_snapshot!(
batches_to_sort_string(&drop_res.clone().collect().await.unwrap()),
@r"
+----+
| c1 |
+----+
| b |
| c |
| d |
+----+
"
);

Ok(())
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This failed without the code changes

    running 1 test
    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Snapshot Summary ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
    Snapshot: drop_columns_duplicated_names_from_different_qualifiers-2
    Source: datafusion/core/tests/dataframe/mod.rs:627
    ────────────────────────────────────────────────────────────────────────────────
    Expression: batches_to_sort_string(&drop_res.clone().collect().await.unwrap())
    ────────────────────────────────────────────────────────────────────────────────
    -old snapshot
    +new results
    ────────────┬───────────────────────────────────────────────────────────────────
        1       │-+----+
        2       │-| c1 |
        3       │-+----+
        4       │-| b  |
        5       │-| c  |
        6       │-| d  |
        7       │-+----+
              1 │++----+-------+-------+
              2 │+| c1 | mark  | mark  |
              3 │++----+-------+-------+
              4 │+| b  | true  | false |
              5 │+| c  | false | false |
              6 │+| d  | false | true  |
              7 │++----+-------+-------+
    ────────────┴───────────────────────────────────────────────────────────────────
    To update snapshots run `cargo insta review`
    Stopped on the first failure. Run `cargo insta test` to run all snapshots.
    test dataframe::drop_columns_duplicated_names_from_different_qualifiers ... FAILED

    failures:

    failures:
        dataframe::drop_columns_duplicated_names_from_different_qualifiers

    test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 769 filtered out; finished in 0.17s

@dqkqd dqkqd marked this pull request as ready for review October 23, 2025 20:49
Copy link

@albertlockett albertlockett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for taking this @dqkqd

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Core DataFusion crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Unable to drop the mark column added by JoinType::LeftMark if there are multiple joins

2 participants