
Why does xr.apply_ufunc support numpy/dask.arrays? #8995

TomNicholas (Member)

What is your issue?

@keewis pointed out that it's weird that xarray.apply_ufunc supports passing numpy/dask arrays directly, and I'm inclined to agree. I don't understand why we do, and think we should consider removing that feature.

Two arguments in favour of removing it:

  1. It exposes users to transposition errors

Consider this example:

In [1]: import xarray as xr

In [2]: import numpy as np

In [3]: arr = np.arange(12).reshape(3, 4)

In [4]: def mean(obj, dim):
   ...:     # note: apply always moves core dimensions to the end
   ...:     return xr.apply_ufunc(
   ...:         np.mean, obj, input_core_dims=[[dim]], kwargs={"axis": -1}
   ...:     )
   ...: 

In [5]: mean(arr, dim='time')
Out[5]: array([1.5, 5.5, 9.5])

In [6]: mean(arr.T, dim='time')
Out[6]: array([4., 5., 6., 7.])

Transposing the input leads to a different result, with the value of the dim kwarg effectively ignored. This kind of error is what xarray code is supposed to prevent by design.

  2. There is an alternative input pattern that doesn't require accepting bare arrays

Instead, any numpy/dask array can just be wrapped up into an xarray Variable/NamedArray before passing it to apply_ufunc.

In [7]: from xarray.core.variable import Variable

In [8]: var = Variable(data=arr, dims=['time', 'space'])

In [9]: mean(var, dim='time')
Out[9]: 
<xarray.Variable (space: 4)> Size: 32B
array([4., 5., 6., 7.])

In [10]: mean(var.T, dim='time')
Out[10]: 
<xarray.Variable (space: 4)> Size: 32B
array([4., 5., 6., 7.])

This now guards against the transposition error, and puts the onus on the user to be clear about which axes of their array correspond to which dimension.

With Variable/NamedArray as public API, this latter pattern can handle every case that passing bare arrays in could.
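To make the guard concrete, here is a quick check (a sketch of current behaviour, re-using the `mean` helper defined above): a core dimension name that a Variable doesn't have raises immediately, while the same name on a bare array is silently accepted, since unlabeled arrays are assumed to already have their core dimensions as trailing axes.

```python
import numpy as np
import xarray as xr

arr = np.arange(12).reshape(3, 4)
var = xr.Variable(dims=["time", "space"], data=arr)

def mean(obj, dim):
    # note: apply always moves core dimensions to the end
    return xr.apply_ufunc(np.mean, obj, input_core_dims=[[dim]], kwargs={"axis": -1})

# Bare array: the dim name is never checked, so a nonsense name
# "works" and the last axis is reduced regardless.
bare_result = mean(arr, dim="depth")

# Variable: a core dimension that doesn't exist raises immediately.
try:
    mean(var, dim="depth")
    raised = False
except ValueError:
    raised = True
```

With the Variable input the mistake surfaces at the call site instead of producing a silently wrong reduction.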

I suggest we deprecate accepting bare arrays in favour of having users wrap them in Variable/NamedArray/DataArray objects instead.

(Note 1: We also accept raw scalars, but this doesn't expose anyone to transposition errors.)

(Note 2: In a quick scan of the apply_ufunc docstring, the docs on it in computation.rst, and the extensive guide that @dcherian wrote in the xarray tutorial repository, I can't see any examples that actually pass bare arrays to apply_ufunc.)
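If deprecation is the route taken, the warning behaviour might look roughly like the following sketch. `apply_ufunc_strict` is a hypothetical name for illustration only, not xarray API; a real deprecation would live inside `apply_ufunc` itself.

```python
import warnings

import numpy as np
import xarray as xr

def apply_ufunc_strict(func, *args, **kwargs):
    # Hypothetical shim: warn when a bare array is passed.
    # Scalars stay exempt (they pose no transposition risk).
    for arg in args:
        if isinstance(arg, np.ndarray) and arg.ndim > 0:
            warnings.warn(
                "passing bare arrays to apply_ufunc is deprecated; "
                "wrap them in a Variable/DataArray with named dims instead",
                FutureWarning,
                stacklevel=2,
            )
    return xr.apply_ufunc(func, *args, **kwargs)

arr = np.arange(12).reshape(3, 4)
var = xr.Variable(dims=["time", "space"], data=arr)

# Bare array input triggers the deprecation warning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    apply_ufunc_strict(np.mean, arr, input_core_dims=[["time"]], kwargs={"axis": -1})

# Wrapped input passes through silently.
with warnings.catch_warnings(record=True) as caught_var:
    warnings.simplefilter("always")
    apply_ufunc_strict(np.mean, var, input_core_dims=[["time"]], kwargs={"axis": -1})
```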

Activity

gmoutso commented on Jun 11, 2024

I often find myself defining two versions of a function, because I need the numpy version with scipy.optimize and use the xarray version as a convenience, since it is cleaner with named dims. So this topic interests me. Dressing a numpy array in a Variable and calling the xarray function is slower than calling apply_ufunc on the bare array, and both are slower than using a numpy ufunc directly.

import numpy as np
import xarray as xr

rnd = np.random.default_rng()
x1 = rnd.normal(size=(10, 100))
f1 = lambda x: x.mean(axis=-1)
x2 = xr.Variable(dims=["a", "b"], data=x1)
f2 = lambda x: xr.apply_ufunc(np.mean, x,
                              input_core_dims=[["b"]],
                              output_core_dims=[[]],
                              output_dtypes=[float],
                              kwargs={"axis": -1})
%timeit f1(x1)
%timeit f2(x2)
%timeit f2(x1)
TomNicholas (Member, Author) commented on Jun 12, 2024

Hi @gmoutso - I'm not clear from your comment whether you are for or against this change!

Dressing a numpy array in a Variable and calling the xarray function is slower than calling apply_ufunc on the bare array, and both are slower than using a numpy ufunc directly.

Well yes, there will be some overhead from wrapping with a higher-level API. But that overhead shouldn't scale with the size of the array. Here are the results when I ran your example just now for arrays of different sizes.

import time

import matplotlib.pyplot as plt
import numpy as np
import xarray as xr

rnd = np.random.default_rng()
np_fn = lambda x: x.mean(axis=-1)
xr_fn = lambda x: xr.apply_ufunc(np.mean, x,
                                 input_core_dims=[["b"]],
                                 kwargs={"axis": -1})

sizes = [10, 100, 1000, 10000]
times_np_fn_np_arr = []
times_xr_fn_np_arr = []
times_xr_fn_xr_var = []
for size in sizes:
    np_arr = rnd.normal(size=(size, 10 * size))
    xr_var = xr.Variable(dims=["a", "b"], data=np_arr)

    start_time = time.time()
    np_fn(np_arr)
    times_np_fn_np_arr.append(time.time() - start_time)

    start_time = time.time()
    xr_fn(np_arr)
    times_xr_fn_np_arr.append(time.time() - start_time)

    start_time = time.time()
    xr_fn(xr_var)
    times_xr_fn_xr_var.append(time.time() - start_time)

# Plot the results
plt.figure(figsize=(7, 5))
plt.plot(sizes, times_np_fn_np_arr, label='np_fn_np_arr', marker='o')
plt.plot(sizes, times_xr_fn_np_arr, label='xr_fn_np_arr', marker='o')
plt.plot(sizes, times_xr_fn_xr_var, label='xr_fn_xr_var', marker='o')
plt.xlabel('Size')
plt.ylabel('Execution Time (seconds)')
plt.title('Execution Time vs. Size for Different Functions')
plt.xscale('log')
plt.yscale('log')
plt.legend()
plt.grid(True)
plt.show()
[Screenshot (2024-06-12): log-log plot of execution time vs. array size for the three functions; the curves converge at large sizes]

As expected they converge to the same execution time, so I don't think this overhead is a problem.

gmoutso commented on Jun 28, 2024

Hi @TomNicholas . Thank you for your reply and benchmarks.

I'm not clear from your comment whether you are for or against this change!

Actually I am neither for nor against. I wanted to register why this topic interests me, even if it is not clear to me either!

Currently I define two versions of each function for the same calculation. One is in numpy, for use in atomic calculations that get wrapped by a scipy.optimize routine. The other is in xarray, because it is easier to write, clearer to use, and much easier to apply to xarray arrays. Although the time difference is just overhead (as your plots suggest), that is not good enough for me, because scipy.optimize will call the function many, many times within a single call. E.g. minimising an output (here an atomic float, since all output dims are output_core_dims that are empty: the objective of minimize).

Using xarray functions with numpy arrays (the topic here) is potentially useful, but in my use case too slow compared to pure numpy. I am not suggesting a solution, only registering my interest in case it is useful to the discussion. Thank you!
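For what it's worth, the two-versions burden can sometimes be reduced to one core function plus a thin wrapper: the bare numpy function goes straight into the scipy.optimize hot loop with no xarray overhead per call, and apply_ufunc wraps it once for named-dim convenience. A minimal sketch with a made-up objective (`residual_np` and `residual_xr` are illustrative names, not anyone's API):

```python
import numpy as np
import xarray as xr

# Single numpy core function; cheap per call, so it can be handed
# straight to a scipy.optimize callback without xarray overhead.
def residual_np(x):
    return ((x - 3.0) ** 2).mean(axis=-1)

# Thin xarray wrapper for named-dim convenience outside the hot loop.
def residual_xr(obj, dim):
    return xr.apply_ufunc(residual_np, obj, input_core_dims=[[dim]])

x = np.linspace(0.0, 6.0, 5)
var = xr.Variable(dims=["b"], data=x)

# Both paths compute the same value.
assert np.isclose(residual_np(x), residual_xr(var, dim="b").values)
```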

          Why does xr.apply_ufunc support numpy/dask.arrays? · Issue #8995 · pydata/xarray