Investigate if it is possible to avoid reading all coordinate chunks when opening a dataset with xarray #34

Closed
Description

@abarciauskas-bgse

Right now, xarray's open_zarr and open_dataset are significantly slower when coordinates are chunked, because every coordinate chunk triggers a separate request to S3.

Is it possible to either

  1. create a datastore where coordinates are not chunked, or
  2. open a dataset that has chunked coordinates without fetching all the chunks?

Note: I tried decode_coords=False and the same issue occurs.

Related:
pydata/xarray#6633
pydata/xarray#7368
https://discourse.pangeo.io/t/puzzling-s3-xarray-open-zarr-latency/1074/11

From @maxrjones:

There is a case in which the data are chunked along a dimension but the coordinates are not chunked. This is what we did for the CMIP6-downscaling pyramids, so the coordinates are fetched with one request while only specific chunks of the data are fetched, e.g.,

```python
import zarr

store = zarr.open("s3://carbonplan-cmip6/flow-outputs/results/0.1.9/pyramid/01df7816c64b3999/0/", mode="r")
print(f'tasmin chunks: {store["tasmin"].chunks}')
print(f'time chunks: {store["time"].chunks}')
```

which prints:

```
tasmin chunks: (25, 128, 128)
time chunks: (1020,)
```
