1 change: 1 addition & 0 deletions doc/source/whatsnew/v3.0.0.rst
@@ -1150,6 +1150,7 @@ Indexing
- Bug in :meth:`Series.__setitem__` when assigning boolean series with boolean indexer will raise ``LossySetitemError`` (:issue:`57338`)
- Bug in printing :attr:`Index.names` and :attr:`MultiIndex.levels` would not escape single quotes (:issue:`60190`)
- Bug in reindexing of :class:`DataFrame` with :class:`PeriodDtype` columns in case of consolidated block (:issue:`60980`, :issue:`60273`)
- Bug in :meth:`DataFrame.__setitem__` raising ``ValueError`` when setting a column with a 2D object-dtype array of shape ``(n, 1)`` (:issue:`61026`)
- Bug in :meth:`DataFrame.loc.__getitem__` and :meth:`DataFrame.iloc.__getitem__` with a :class:`CategoricalDtype` column with integer categories raising when trying to index a row containing a ``NaN`` entry (:issue:`58954`)
- Bug in :meth:`Index.__getitem__` incorrectly raising with a 0-dim ``np.ndarray`` key (:issue:`55601`)
- Bug in :meth:`Index.get_indexer` not casting missing values correctly for new string datatype (:issue:`55833`)
25 changes: 24 additions & 1 deletion pandas/core/frame.py
@@ -5502,7 +5502,30 @@ def _sanitize_column(self, value) -> tuple[ArrayLike, BlockValuesRefs | None]:

        if is_list_like(value):
            com.require_length_match(value, self.index)
        return sanitize_array(value, self.index, copy=True, allow_2d=True), None

        # GH#61026: special-case 2D inputs for single-column assignment.
        # - accept shape (n, 1) by flattening to 1D
        # - disallow 2D *object* arrays with more than one column, since those
        #   correspond to a single column key and should be rejected
        arr = value

        # np.matrix is always 2D; convert it to a regular ndarray
        if isinstance(arr, np.matrix):
Member

In what case do we get a matrix here?

Author

_sanitize_column(...) can receive an np.matrix when the user assigns one directly, for example: df["col"] = np.matrix([[1], [2], [3]]).

Since np.matrix is always 2D and keeps its 2D shape under slicing, calling arr[:, 0] (which occurs on line 5517) on a matrix still returns shape (n, 1) rather than (n,). In other words, the slice would never produce a 1D array for a matrix.

So I thought converting matrices to a regular ndarray first would ensure that the subsequent blocks behave consistently for both np.ndarray and np.matrix.
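
The slicing behaviour described in this reply can be checked in isolation. The snippet below is a minimal sketch using only NumPy; the variable names are illustrative and not taken from the PR.

```python
import numpy as np

# np.matrix stays 2D under slicing, so a column slice is still (n, 1).
mat = np.matrix([[1], [2], [3]])
mat_col = mat[:, 0]

# np.asarray returns a plain ndarray, so the same slice drops to 1D.
arr = np.asarray(mat)
arr_col = arr[:, 0]

print(mat_col.shape)  # (3, 1)
print(arr_col.shape)  # (3,)
```

Note that np.asarray (unlike np.asanyarray) does not pass ndarray subclasses through, which is why the converted value no longer behaves like a matrix.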

            arr = np.asarray(arr)

        if isinstance(arr, np.ndarray) and arr.ndim == 2:
            if arr.shape[1] == 1:
                # treating (n, 1) as a length-n 1D array
                arr = arr[:, 0]
            elif arr.dtype == object:
                # single-column setitem with a 2D object array is not allowed.
Comment on lines +5520 to +5521
Member

Why only object dtype here?

Author

The dtype == object guard is there to keep this bugfix scoped tightly to the case that regressed in issue #61026.

The problematic behaviour (ValueError: Buffer has wrong number of dimensions (expected 1, got 2)) only arose when assigning a 2D dtype=object array to a single column. For other dtypes, assigning a 2D array either already behaves correctly or raises a clearer, existing error, so this change leaves those paths alone to avoid altering semantics outside this issue.
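
To make the scoping concrete, here is a standalone sketch of the branching in this diff. The helper name normalize_single_column_value is hypothetical and is not pandas API; the follow-up call to sanitize_array is deliberately left out, so this only illustrates which inputs are flattened, rejected, or passed through unchanged.

```python
import numpy as np


def normalize_single_column_value(value):
    """Hypothetical helper mirroring the 2D handling proposed in this diff."""
    arr = value
    if isinstance(arr, np.matrix):
        # np.matrix never drops to 1D under slicing, so convert first.
        arr = np.asarray(arr)
    if isinstance(arr, np.ndarray) and arr.ndim == 2:
        if arr.shape[1] == 1:
            # Treat (n, 1) as a length-n 1D array.
            arr = arr[:, 0]
        elif arr.dtype == object:
            # Only the object-dtype case is rejected here.
            raise ValueError(
                "Setting a DataFrame column with a 2D array requires "
                f"shape (n, 1); got shape {arr.shape}."
            )
    return arr


# (n, 1) object array: flattened to 1D.
flat = normalize_single_column_value(np.array([["A"], ["B"]], dtype=object))

# Non-object 2D array: left alone by this helper.
untouched = normalize_single_column_value(np.zeros((2, 2)))

# Multi-column object array: rejected with the new message.
try:
    normalize_single_column_value(np.empty((2, 2), dtype=object))
    message = None
except ValueError as err:
    message = str(err)
```

In the sketch, non-object 2D arrays fall through untouched, which matches the stated intent of leaving those paths to whatever the existing downstream code already does.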

                msg = (
                    "Setting a DataFrame column with a 2D array requires "
                    f"shape (n, 1); got shape {arr.shape}."
                )
                raise ValueError(msg)
        subarr = sanitize_array(arr, self.index, copy=True, allow_2d=True)
        return subarr, None

    @property
    def _series(self):
18 changes: 18 additions & 0 deletions pandas/tests/frame/indexing/test_setitem.py
@@ -816,6 +816,24 @@ def test_setitem_index_object_dtype_not_inferring(self):
        )
        tm.assert_frame_equal(df, expected)

    def test_setitem_2d_object_array(self):
        # GH#61026
        df = DataFrame(
            {
                "c1": [1, 2, 3, 4, 5],
            }
        )

        arr = np.array([["A"], ["B"], ["C"], ["D"], ["E"]], dtype=object)
        df["c1"] = arr

        expected = DataFrame(
            {
                "c1": ["A", "B", "C", "D", "E"],
            }
        )
        tm.assert_frame_equal(df, expected)


class TestSetitemTZAwareValues:
    @pytest.fixture