Skip to content

(fix): pandas extension array repr for int64[pyarrow] #10317

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

ilan-gold
Copy link
Contributor

@ilan-gold ilan-gold commented May 13, 2025

Similar to #10315, we uncover more unexpected behavior with int64[pyarrow] as the testing data type.

  • We were apparently relying on a call to np.array (via the xp.reshape) internally for the actual printing of the elements but this is not implemented for int64[pyarrow] (unlike other extension array types) and therefore broke without an explicit implementation. So we add that implementation

  • Once this has been handled, the PandasExtensionArray is no longer being converted into a numpy array silently, and thus we need to ensure that the repr can handle the PandasExtensionArray natively.

  • Lastly we add a KeyError to the lookup to ensure that this sort of thing cannot happen again.

  • From (fix): no fill_value on reindex #10304

  • Tests added

  • User visible changes (including notable bug fixes) are documented in whats-new.rst

  • New functions/methods are listed in api.rst

@ilan-gold
Copy link
Contributor Author

Separately, I am not sure if max_width was meant to be inclusive or exclusive. The line length for the IntervalArray is exactly 100.

@ilan-gold
Copy link
Contributor Author

Reverted the addition of IntervalArray a23de74 - see failure at https://github.com/pydata/xarray/actions/runs/15000671362/job/42146523169 for test_repr where the line length is exactly 100 instead of less than 100. I'm not clear what the behavior should be but let's keep this as minimal as possible. I will follow up with a more general "more test cases" PR.

@ilan-gold
Copy link
Contributor Author

Is it possible these three tests are flaky? It seems to pass locally for me and on the face of it, has nothing to do with this PR>

@kmuehlbauer
Copy link
Contributor

Is it possible these three tests are flaky? It seems to pass locally for me and on the face of it, has nothing to do with this PR>

@ilan-gold This surfaced with the release of dask 2025.5.0. Something has changed on the dask side and/or on the xarray side, which let the error surface.

Here is the stacktrace on my machine:

xarray/tests/test_backends.py:827 (TestDask.test_outer_indexing_reversed)
self = <xarray.tests.test_backends.TestDask object at 0x7f2da44c2ff0>

    def test_outer_indexing_reversed(self) -> None:
        # regression test for GH6560
        ds = xr.Dataset(
            {"z": (("t", "p", "y", "x"), np.ones((1, 1, 31, 40)))},
        )
    
        with self.roundtrip(ds) as on_disk:
>           subset = on_disk.isel(t=[0], p=0).z[:, ::10, ::10][:, ::-1, :]

/home/kai/python/projects/xarray/xarray/tests/test_backends.py:835: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/home/kai/python/projects/xarray/xarray/core/dataset.py:2764: in isel
    var = var.isel(var_indexers)
/home/kai/python/projects/xarray/xarray/core/variable.py:1032: in isel
    return self[key]
/home/kai/python/projects/xarray/xarray/core/variable.py:782: in __getitem__
    return self._finalize_indexing_result(dims, data)
/home/kai/python/projects/xarray/xarray/core/variable.py:786: in _finalize_indexing_result
    return self._replace(dims=dims, data=data)
/home/kai/python/projects/xarray/xarray/core/variable.py:938: in _replace
    return type(self)(dims, data, attrs, encoding, fastpath=True)
/home/kai/python/projects/xarray/xarray/core/variable.py:365: in __init__
    super().__init__(
/home/kai/python/projects/xarray/xarray/namedarray/core.py:264: in __init__
    self._dims = self._parse_dimensions(dims)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <[AttributeError("'Variable' object has no attribute '_dims'") raised in repr()] Variable object at 0x7f2d5c6c6380>
dims = ('t', 'y', 'x')

    def _parse_dimensions(self, dims: _DimsLike) -> _Dims:
        dims = (dims,) if isinstance(dims, str) else tuple(dims)
        if len(dims) != self.ndim:
>           raise ValueError(
                f"dimensions {dims} must have the same length as the "
                f"number of data dimensions, ndim={self.ndim}"
            )
E           ValueError: dimensions ('t', 'y', 'x') must have the same length as the number of data dimensions, ndim=4

/home/kai/python/projects/xarray/xarray/namedarray/core.py:508: ValueError

@ilan-gold
Copy link
Contributor Author

@kmuehlbauer Thanks for the heads up, sounds like I need to update my env then. Thanks for the info here!

@kmuehlbauer
Copy link
Contributor

@kmuehlbauer Thanks for the heads up, sounds like I need to update my env then. Thanks for the info here!

See new issue #10321.

@ilan-gold ilan-gold requested a review from dcherian May 22, 2025 14:13
@@ -176,6 +177,8 @@ def format_timedelta(t, timedelta_format=None):

def format_item(x, timedelta_format=None, quote_strings=True):
"""Returns a succinct summary of an object as a string"""
if isinstance(x, PandasExtensionArray):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add __str__ to PandasExtensionArray instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you clarify what you mean? The point here is to bypass PandasExtensionArray here because its __repr__ is PandasExtensionArray(array=[...]).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK right, can you add that statement as a comment please?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants