-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
(fix): pandas extension array repr for int64[pyarrow] #10317
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Separately, I am not sure if |
Reverted the addition of |
Is it possible these three tests are flaky? It seems to pass locally for me and on the face of it, has nothing to do with this PR> |
@ilan-gold This surfaced with the release of dask 2025.5.0. Something has changed on the dask side and/or on the xarray side, which let the error surface. Here is the stacktrace on my machine: xarray/tests/test_backends.py:827 (TestDask.test_outer_indexing_reversed)
self = <xarray.tests.test_backends.TestDask object at 0x7f2da44c2ff0>
def test_outer_indexing_reversed(self) -> None:
# regression test for GH6560
ds = xr.Dataset(
{"z": (("t", "p", "y", "x"), np.ones((1, 1, 31, 40)))},
)
with self.roundtrip(ds) as on_disk:
> subset = on_disk.isel(t=[0], p=0).z[:, ::10, ::10][:, ::-1, :]
/home/kai/python/projects/xarray/xarray/tests/test_backends.py:835:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/home/kai/python/projects/xarray/xarray/core/dataset.py:2764: in isel
var = var.isel(var_indexers)
/home/kai/python/projects/xarray/xarray/core/variable.py:1032: in isel
return self[key]
/home/kai/python/projects/xarray/xarray/core/variable.py:782: in __getitem__
return self._finalize_indexing_result(dims, data)
/home/kai/python/projects/xarray/xarray/core/variable.py:786: in _finalize_indexing_result
return self._replace(dims=dims, data=data)
/home/kai/python/projects/xarray/xarray/core/variable.py:938: in _replace
return type(self)(dims, data, attrs, encoding, fastpath=True)
/home/kai/python/projects/xarray/xarray/core/variable.py:365: in __init__
super().__init__(
/home/kai/python/projects/xarray/xarray/namedarray/core.py:264: in __init__
self._dims = self._parse_dimensions(dims)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <[AttributeError("'Variable' object has no attribute '_dims'") raised in repr()] Variable object at 0x7f2d5c6c6380>
dims = ('t', 'y', 'x')
def _parse_dimensions(self, dims: _DimsLike) -> _Dims:
dims = (dims,) if isinstance(dims, str) else tuple(dims)
if len(dims) != self.ndim:
> raise ValueError(
f"dimensions {dims} must have the same length as the "
f"number of data dimensions, ndim={self.ndim}"
)
E ValueError: dimensions ('t', 'y', 'x') must have the same length as the number of data dimensions, ndim=4
/home/kai/python/projects/xarray/xarray/namedarray/core.py:508: ValueError |
@kmuehlbauer Thanks for the heads up, sounds like I need to update my env then. Thanks for the info here! |
See new issue #10321. |
@@ -176,6 +177,8 @@ def format_timedelta(t, timedelta_format=None): | |||
|
|||
def format_item(x, timedelta_format=None, quote_strings=True): | |||
"""Returns a succinct summary of an object as a string""" | |||
if isinstance(x, PandasExtensionArray): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add __str__
to PandasExtensionArray instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you clarify what you mean? The point here is to bypass PandasExtensionArray
here because its __repr__
is PandasExtensionArray(array=[...])
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK right, can you add that statement as a comment please?
Similar to #10315, we uncover more unexpected behavior with
int64[pyarrow]
as the testing data type.We were apparently relying on a call to
np.array
(via thexp.reshape
) internally for the actual printing of the elements but this is not implemented forint64[pyarrow]
(unlike other extension array types) and therefore broke without an explicit implementation. So we add that implementationOnce this has been handled, the
PandasExtensionArray
is no longer being converted into a numpy array silently, and thus we need to ensure that the repr can handle thePandasExtensionArray
natively.Lastly we add a
KeyError
to the lookup to ensure that this sort of thing cannot happen again.From (fix): no
fill_value
on reindex #10304Tests added
User visible changes (including notable bug fixes) are documented in
whats-new.rst
New functions/methods are listed in
api.rst