@akkik04 akkik04 commented Nov 24, 2025

Fixed DataFrame.__setitem__ so that assigning a 2D NumPy array with dtype=object and shape (n, 1) to a single column works the same way as the non-object case, and so that unsupported shapes raise clearer, high-level errors. More detail below:

Before this change:

  • Assigning a 2D NumPy dtype=object array with shape (n, 1) to a single DataFrame column (e.g., df["c1"] = t2) raised a low-level ValueError: Buffer has wrong number of dimensions (expected 1, got 2). This was coming from lib.maybe_convert_objects, instead of behaving like the non-object case.
  • 2D non-object arrays with shape (n, 1) already worked just fine, and assigning a 2D array with multiple columns to multiple columns (e.g., df[["c1", "c2"]] = t3) also worked, but ndim > 2 arrays could surface confusing internal errors.

After this change:

  • Assigning a 2D NumPy dtype=object array with shape (n, 1) to a single column now works by flattening (n, 1) to a 1D (n,) array, matching the behaviour of non-object arrays.
  • Assigning a 2D array with more than one column to a single column raises a clear, user-facing ValueError explaining that only (n, 1) is supported and suggesting multi-column assignment (e.g., df[["c1", "c2"]] = some_values) for wider arrays.
  • Assigning an array with ndim >= 3 to a single column now raises an explicit ValueError indicating that setting a column with an array of that dimensionality is not supported. The existing multi-column assignment with 2D arrays remains unchanged.
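
The shape rules above can be sketched as a standalone function. This is only an illustration, not the code in this PR: the helper name `normalize_single_column_value` and the error messages are hypothetical, and the sketch ignores the dtype=object guard and everything else that `_sanitize_column` does.

```python
import numpy as np


def normalize_single_column_value(arr: np.ndarray) -> np.ndarray:
    """Hypothetical helper: return a 1D view of `arr` for a single column, or raise."""
    if arr.ndim <= 1:
        # Already usable as a single column.
        return arr
    if arr.ndim == 2:
        if arr.shape[1] == 1:
            # (n, 1) -> (n,), the case this PR makes work for object dtype.
            return arr[:, 0]
        # Wider 2D arrays belong in a multi-column assignment.
        raise ValueError(
            f"Cannot set a single column with a 2D array of shape {arr.shape}; "
            "only shape (n, 1) is supported. Use multi-column assignment, "
            'e.g. df[["c1", "c2"]] = values.'
        )
    # ndim >= 3 is rejected outright.
    raise ValueError(f"Cannot set a column with an array of ndim={arr.ndim}.")


t2 = np.array([[1], [2], [3]], dtype=object)
flat = normalize_single_column_value(t2)
print(flat.shape, flat.dtype)
```

Indexing with `arr[:, 0]` keeps the object dtype and returns a view, so no data is copied for the (n, 1) case.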

Author

akkik04 commented Dec 2, 2025

can I get some eyes on this when you get a chance @rhshadrach 🙌

Member

@rhshadrach rhshadrach left a comment

Thanks for the PR!

arr = value

# np.matrix is always 2D; gonna convert to regular ndarray
if isinstance(arr, np.matrix):
Member

In what case do we get a matrix here?

Author

_sanitize_column(...) can see an np.matrix when the user assigns one directly, for example: df["col"] = np.matrix([[1], [2], [3]]).

Since np.matrix is always 2D and preserves its 2D shape under slicing, calling arr[:, 0] (which occurs on line 5517) on a matrix still gives shape (n, 1) rather than (n,). In other words, we would not actually end up with a 1D array for matrices in that case.

Hence, I thought converting matrices to a regular ndarray first would ensure that the subsequent blocks behave consistently for both np.ndarray and np.matrix.
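
A quick NumPy-only check of the slicing behaviour described above (a sketch; the variable names are just for illustration, and `np.asarray` is one possible way to do the conversion, not necessarily the call used in the diff):

```python
import numpy as np

m = np.matrix([[1], [2], [3]])

# Column slicing on a matrix stays 2D.
col_from_matrix = m[:, 0]

# Converting to a base ndarray first lets the same slice drop to 1D.
col_from_ndarray = np.asarray(m)[:, 0]

print(col_from_matrix.shape, col_from_ndarray.shape)
```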

Comment on lines +5520 to +5521
elif arr.dtype == object:
# single-column setitem with a 2D object array is not allowed.
Member

Why only object dtype here?

Author

The dtype == object guard is there to keep this bugfix scoped tightly to the case that regressed in issue #61026.

The problematic behaviour (ValueError: Buffer has wrong number of dimensions (expected 1, got 2)) only arose when assigning a 2D dtype=object array to a single column. For other dtypes, assigning a 2D array either already behaves correctly or raises a clearer, existing error, so this change leaves those paths alone to avoid altering semantics outside this issue.

@akkik04 akkik04 requested a review from rhshadrach December 11, 2025 05:19
Development

Successfully merging this pull request may close these issues.

BUG: setting column with 2D object array raises
