Right now, xarray's open_zarr and open_dataset are significantly slower when coordinates are chunked, because every coordinate chunk results in a separate request to S3.
Is it possible to either:
- create a datastore where coordinates are not chunked, or
- open a dataset that has chunked coordinates without fetching all the coordinate chunks?
Note: I tried decode_coords=False and the same issue occurs.
Related:
- pydata/xarray#6633
- pydata/xarray#7368
- https://discourse.pangeo.io/t/puzzling-s3-xarray-open-zarr-latency/1074/11
From @maxrjones: a case in which the data are chunked along a dimension but the coordinates are not chunked. This is what we did for the CMIP6-downscaling pyramids, to fetch the coordinates with one request but only fetch specific chunks of the data, e.g.:
```python
import zarr

store = zarr.open("s3://carbonplan-cmip6/flow-outputs/results/0.1.9/pyramid/01df7816c64b3999/0/", mode="r")
print(f'tasmin chunks: {store["tasmin"].chunks}')
print(f'time chunks: {store["time"].chunks}')
# tasmin chunks: (25, 128, 128)
# time chunks: (1020,)
```