BUG: fix DataFrame.__setitem__ with 2D object arrays
#63184
base: main
Changes from all commits
65df683
8a8c670
62f7c4b
aa707b1
c5c8953
78f8ce7
e2ad3fb
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -5502,7 +5502,30 @@ def _sanitize_column(self, value) -> tuple[ArrayLike, BlockValuesRefs | None]: | |
|
|
||
| if is_list_like(value): | ||
| com.require_length_match(value, self.index) | ||
| return sanitize_array(value, self.index, copy=True, allow_2d=True), None | ||
|
|
||
| # GH#61026: special-case 2D inputs for single-column assignment. | ||
| # - accept shape (n, 1) by flattening to 1D | ||
| # - disallow 2D *object* arrays with more than one column, since those | ||
| # correspond to a single column key and should be rejected | ||
| arr = value | ||
|
|
||
| # np.matrix is always 2D; gonna convert to regular ndarray | ||
| if isinstance(arr, np.matrix): | ||
| arr = np.asarray(arr) | ||
|
|
||
| if isinstance(arr, np.ndarray) and arr.ndim == 2: | ||
| if arr.shape[1] == 1: | ||
| # treating (n, 1) as a length-n 1D array | ||
| arr = arr[:, 0] | ||
| elif arr.dtype == object: | ||
| # single-column setitem with a 2D object array is not allowed. | ||
|
Comment on lines +5520 to +5521
Member
Why only object dtype here?
Author
The problematic behaviour (
||
| msg = ( | ||
| "Setting a DataFrame column with a 2D array requires " | ||
| f"shape (n, 1); got shape {arr.shape}." | ||
| ) | ||
| raise ValueError(msg) | ||
| subarr = sanitize_array(arr, self.index, copy=True, allow_2d=True) | ||
| return subarr, None | ||
|
|
||
| @property | ||
| def _series(self): | ||
|
|
||
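The 2D handling in the hunk above can be sketched outside pandas as a standalone function. This is only an illustration: `normalize_column_value` is a hypothetical name rather than pandas API, and the sketch leaves out the `sanitize_array` call that the real method makes afterwards.

```python
import numpy as np


def normalize_column_value(value):
    """Hypothetical helper mirroring the 2D checks in the diff (not pandas API)."""
    arr = value

    # np.matrix is always 2D, so convert it to a plain ndarray first.
    if isinstance(arr, np.matrix):
        arr = np.asarray(arr)

    if isinstance(arr, np.ndarray) and arr.ndim == 2:
        if arr.shape[1] == 1:
            # Treat shape (n, 1) as a length-n 1D array.
            arr = arr[:, 0]
        elif arr.dtype == object:
            # A 2D object array with several columns is rejected.
            msg = (
                "Setting a DataFrame column with a 2D array requires "
                f"shape (n, 1); got shape {arr.shape}."
            )
            raise ValueError(msg)

    # Anything else is returned unchanged; the PR hands it to sanitize_array.
    return arr
```

As in the diff, a non-object 2D array with more than one column is not rejected at this point and falls through to the later step.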
In what case do we get a matrix here?
`_sanitize_column(...)` can see an `np.matrix` when the user assigns one directly, for example `df["col"] = np.matrix([[1], [2], [3]])`.
Since `np.matrix` is always 2D and preserves its 2D shape under slicing, calling `arr[:, 0]` (which occurs on line 5517) on a matrix still gives shape `(n, 1)` rather than `(n,)`, so we would not actually end up with a 1D array for matrices.
Hence, converting matrices to a regular `ndarray` first ensures that the subsequent blocks behave consistently for both `np.ndarray` and `np.matrix`.
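The slicing difference described in the comment above can be checked with NumPy alone. This is a minimal sketch, assuming only that `numpy` is installed; the variable names are illustrative.

```python
import numpy as np

m = np.matrix([[1], [2], [3]])

# Slicing a matrix keeps the result 2D.
matrix_slice_shape = m[:, 0].shape

# After conversion to a plain ndarray, the same slice is 1D.
array_slice_shape = np.asarray(m)[:, 0].shape

print(matrix_slice_shape, array_slice_shape)
```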