Skip to content

Conversation

@TomNicholas
Copy link
Member

@TomNicholas TomNicholas commented May 16, 2025

Adds an .async_load() method to Variable, which works by plumbing async get_duck_array all the way down until it finally gets to the async methods zarr v3 exposes.

Needs a lot of refactoring before it could be merged, but it works.

API:

  • Variable.load_async
  • DataArray.load_async
  • Dataset.load_async
  • DataTree.load_async
  • load_dataset?
  • load_dataarray?

not has_zarr_v3, reason="zarr-python <3 did not support async loading"
)
@pytest.mark.asyncio
async def test_load_async(self) -> None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this parallels test_load then lets keep this on the base class and use pytest.skip() on the netCDF subclasses. That way it's easy to keep the two in sync. Let's add a comment requesting future contributors to keep the two in sync

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I did think about doing this but skipping an inherited method seemed messy.

I made it work in cf1d127, though. Note that the MRO becomes important to get the correctly overridden test to be inherited (so it can be skipped). I think the cleaner solution here would be if we could simply ask the backends whether or not they support async indexing, which is an idea we also discussed for #10579 (comment).

@pytest.mark.asyncio
async def test_lazy_async_indexing(self) -> None:
v = Variable(dims=("x", "y"), data=LazilyIndexedArray(self.d))
await self.check_orthogonal_async_indexing(v)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine, but we could combine the sync and async checks in one async function and just use that everywhere in this file.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 4f40792, though now I'm a little worried that this pattern

    async def check_orthogonal_indexing(self, v):
        expected = self.d[[8, 3]][:, [2, 1]]

        result = v.isel(x=[8, 3], y=[2, 1])
        assert np.allclose(result, expected)

        result = await v.isel(x=[8, 3], y=[2, 1]).load_async()
        assert np.allclose(result, expected)

might be automatically passing the second assertion by still being in-memory after the first assert?

Copy link
Contributor

@rabernat rabernat Aug 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could parametrize the test over async and non async calls

@pytest.mark.parametrize("use_async", [True, False])
async def check_orthogonal_indexing(self, v, use_async):
        expected = self.d[[8, 3]][:, [2, 1]]

        if use_async:
            result = await v.isel(x=[8, 3], y=[2, 1]).load_async()
        else:
            result = v.isel(x=[8, 3], y=[2, 1])
        assert np.allclose(result, expected)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could use assert not v._in_memory to be really sure.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went with the parametrization idea, though the syntax has to be a little messier because you can't parametrize normal functions in pytest. a074a25

@pytest.mark.parametrize("cls_name", ["Variable", "DataArray", "Dataset"])
@pytest.mark.parametrize(
"indexer, method, target_zarr_class",
[
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👏🏾 👏🏾 👏🏾

Copy link
Contributor

@dcherian dcherian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing work!

I left some minor comments that should be easy to address.

@TomNicholas TomNicholas added the plan to merge Final call for comments label Aug 13, 2025
@TomNicholas TomNicholas merged commit 9b1d570 into pydata:main Aug 14, 2025
43 of 47 checks passed
@TomNicholas TomNicholas deleted the async.load branch August 14, 2025 16:03
dcherian added a commit to dhruvak001/xarray that referenced this pull request Aug 24, 2025
* main: (46 commits)
  use the new syntax of ignoring bots (pydata#10668)
  modification methods on `Coordinates` (pydata#10318)
  Silence warnings from test_tutorial.py (pydata#10661)
  test: update write_empty test for zarr 3.1.2 (pydata#10665)
  Bump actions/checkout from 4 to 5 in the actions group (pydata#10652)
  Add load_datatree function (pydata#10649)
  Support compute=False from DataTree.to_netcdf (pydata#10625)
  Fix typos (pydata#10655)
  In case of misconfiguration of dataset.encoding `unlimited_dims` warn instead of raise (pydata#10648)
  fix ``auto_complex`` for ``open_datatree`` (pydata#10632)
  Fix bug indexing with boolean scalars (pydata#10635)
  Improve DataTree typing (pydata#10644)
  Update Cartopy and Iris references (pydata#10645)
  Empty release notes (pydata#10642)
  release notes for v2025.08.0 (pydata#10641)
  Fix `ds.merge` to prevent altering original object depending on join value (pydata#10596)
  Add asynchronous load method (pydata#10327)
  Add DataTree.prune() method              … (pydata#10598)
  Avoid refining parent dimensions in NetCDF files (pydata#10623)
  clarify lazy behaviour and eager loading chunks=None in open_*-functions (pydata#10627)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CI Continuous Integration tools dependencies Pull requests that update a dependency file enhancement io plan to merge Final call for comments run-upstream Run upstream CI topic-backends topic-indexing topic-NamedArray Lightweight version of Variable topic-zarr Related to zarr storage library

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add an asynchronous load method?

5 participants