-
Notifications
You must be signed in to change notification settings - Fork 1.7k
fix: select_columns and drop_columns for qualified duplicated field names
#18236
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
DataFrame::drop_columns for qualified duplicated field namesselect_columns and drop_columns for qualified duplicated field names
| " | ||
| ); | ||
|
|
||
| let select_res = join_res.select_columns(&["c1"])?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line returned SchemaError without the code changes.
Error: SchemaError(AmbiguousReference { field: Column { relation: None, name: "c1" } }, Some(""))
| assert_snapshot!( | ||
| batches_to_sort_string(&drop_res.clone().collect().await.unwrap()), | ||
| @r" | ||
| +----+ | ||
| | c1 | | ||
| +----+ | ||
| | b | | ||
| | c | | ||
| | d | | ||
| +----+ | ||
| " | ||
| ); | ||
|
|
||
| Ok(()) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This failed without the code changes
running 1 test
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Snapshot Summary ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Snapshot: drop_columns_duplicated_names_from_different_qualifiers-2
Source: datafusion/core/tests/dataframe/mod.rs:627
────────────────────────────────────────────────────────────────────────────────
Expression: batches_to_sort_string(&drop_res.clone().collect().await.unwrap())
────────────────────────────────────────────────────────────────────────────────
-old snapshot
+new results
────────────┬───────────────────────────────────────────────────────────────────
1 │-+----+
2 │-| c1 |
3 │-+----+
4 │-| b |
5 │-| c |
6 │-| d |
7 │-+----+
1 │++----+-------+-------+
2 │+| c1 | mark | mark |
3 │++----+-------+-------+
4 │+| b | true | false |
5 │+| c | false | false |
6 │+| d | false | true |
7 │++----+-------+-------+
────────────┴───────────────────────────────────────────────────────────────────
To update snapshots run `cargo insta review`
Stopped on the first failure. Run `cargo insta test` to run all snapshots.
test dataframe::drop_columns_duplicated_names_from_different_qualifiers ... FAILED
failures:
failures:
dataframe::drop_columns_duplicated_names_from_different_qualifiers
test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 769 filtered out; finished in 0.17s
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for taking this @dqkqd
Which issue does this PR close?
JoinType::LeftMarkif there are multiple joins #18212.Rationale for this change
DataFrame::drop_columnsonly considers one field for eachname,it fails to drop columns from dataframe containing duplicated names
from different relations. Such as
markcolumns created by multiplesJoin::LeftMark.DataFrame::select_columnshas the same issue, it fails to select columnswith the same name from different relations.
What changes are included in this PR?
Allow
DataFrame::drop_columnsandDataFrame::select_columnswork with duplicated names from different relations.
Are these changes tested?
Yes.
Are there any user-facing changes?
No.