Remote access patterns using xarray. #237

Open
@betolink

Description

I'm not sure if this will fit in the upcoming (potential) SciPy tutorial or somewhere else, but I think it would be helpful to include a mini-guide on access patterns for remote storage. One of xarray's key strengths is, in a way, also a weakness: its abstractions for opening multi-file datasets are so powerful that they can hide the nuances of the different back-end storage types.

When a new user sees this and gets a data cube back, it's like magic!

ds = xr.open_dataset(reference, engine="zarr")

and although this is the cloud-native way, a considerable amount of data is still in archival formats or only available through a service like OPeNDAP. In an ideal world, users shouldn't have to care what format their data is in or where it lives, but I've run into multiple instances where it's not that xarray isn't doing its job; the data is simply in HDF on a slow server on another continent.
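To make the format point concrete, here's a minimal, illustrative sketch (not tied to any real server): xarray can open plain file-like objects, and that is essentially the code path fsspec uses when it hands xarray an open handle to a remote NetCDF/HDF5 file. The in-memory round-trip below assumes the scipy backend; with a genuinely remote handle, every one of these byte-range reads would be a network round trip, which is where the slowness comes from.

```python
import io

import numpy as np
import xarray as xr

# A small dataset standing in for a real granule.
ds = xr.Dataset({"t": ("x", np.arange(3.0))})

# With no target path, to_netcdf() returns the file contents as bytes
# (this uses the scipy backend, i.e. netCDF3).
payload = ds.to_netcdf()

# xarray accepts file-like objects; this is the same code path used when
# fsspec hands xarray an open handle to a remote file, except that here
# every byte-range read is local and therefore fast.
reopened = xr.open_dataset(io.BytesIO(payload))
print(reopened["t"].values)  # [0. 1. 2.]
```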

Sometimes there are workarounds, from using different sources (e.g. Planetary Computer, GEE) that serve the same data in a cloud-optimized format, to using Kerchunk or clever caching strategies. I feel that some of these topics are buried in GitHub threads and not necessarily exposed in the documentation.
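As one runnable sketch of such a caching strategy: fsspec's `simplecache::` protocol copies the target to a local cache on first open, so repeated byte-range reads never go back to the original store. The example uses a local stand-in file so it runs anywhere; for a real remote source, only the URL scheme would change.

```python
import os
import tempfile

import fsspec
import numpy as np
import xarray as xr

# Write a small netCDF file locally to stand in for a slow remote archive.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "demo.nc")
xr.Dataset({"t": ("x", np.arange(4.0))}).to_netcdf(path, engine="scipy")

# "simplecache::" makes fsspec copy the target to a local cache on first
# open, so subsequent byte-range reads never hit the original store again.
# For a real remote source you would swap file:// for https:// or s3://.
with fsspec.open(f"simplecache::file://{path}", mode="rb") as f:
    ds = xr.open_dataset(f)
    print(ds["t"].values)  # [0. 1. 2. 3.]
```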

The idea would be to quickly illustrate what xarray does if I have files of type X and this access pattern:

file_set = [fsspec.open(f) for f in files]
ds = xr.open_mfdataset(file_set) 

What happens if my files are HDF4, NetCDF, or HDF5? What are steps 1, 2, 3...? Can we make it faster, and how?
What if the data is behind OPeNDAP? etc.
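To ground the "steps 1, 2, 3" question, here is a rough, eagerly evaluated sketch of what `open_mfdataset` amounts to: open each file, then combine along the shared coordinate. This is only an approximation; the real implementation additionally wraps each variable in a lazy dask array (which is where chunking and parallel reads come in) and supports preprocessing hooks, which this omits.

```python
import os
import tempfile

import numpy as np
import xarray as xr

# Two small "granules" standing in for a set of remote files.
tmpdir = tempfile.mkdtemp()
paths = []
for i in range(2):
    p = os.path.join(tmpdir, f"part{i}.nc")
    xr.Dataset(
        {"t": ("time", np.arange(2.0) + 2 * i)},
        coords={"time": np.arange(2) + 2 * i},
    ).to_netcdf(p, engine="scipy")
    paths.append(p)

# Roughly the eager equivalent of open_mfdataset(paths):
# open each file, then combine along the shared "time" coordinate.
parts = [xr.open_dataset(p) for p in paths]
combined = xr.combine_by_coords(parts)
print(combined["t"].values)  # [0. 1. 2. 3.]
```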

I also wonder if this information is already out there in the docs and perhaps just needs to be compiled into a single notebook; I volunteer to start one if it's not.
