Description
What happened?
The to_zarr and to_netcdf methods behave inconsistently for a chunked dataset. to_netcdf does not preserve the existing chunk information; the chunks must be specified explicitly in the encoding dictionary.
What did you expect to happen?
I expected the behaviour to be consistent across all to_XXX() methods.
Minimal Complete Verifiable Example
import xarray as xr
import dask.array as da
rng = da.random.RandomState()
shape = (20, 20)
chunks = [10, 10]
dims = ["x", "y"]
z = rng.standard_normal(shape, chunks=chunks)
ds = xr.DataArray(z, dims=dims, name="z").to_dataset()
ds.chunks
# This one is rechunked
ds.to_netcdf("/tmp/test1.nc", encoding={"z": {"chunksizes": (5, 5)}})
# This one is not rechunked, also original chunks are lost
ds.chunk({"x": 5, "y": 5}).to_netcdf("/tmp/test2.nc")
# This one is rechunked
ds.chunk({"x": 5, "y": 5}).to_zarr("/tmp/test2", mode="w")
# Output of ds.chunks and of the to_zarr call above:
# Frozen({'x': (10, 10), 'y': (10, 10)})
# <xarray.backends.zarr.ZarrStore at 0x7f3669f1af80>
xr.open_mfdataset("/tmp/test1.nc").chunks
xr.open_mfdataset("/tmp/test2.nc").chunks
xr.open_mfdataset("/tmp/test2", engine="zarr").chunks
# Output of the three calls above, in order:
# Frozen({'x': (5, 5, 5, 5), 'y': (5, 5, 5, 5)})
# Frozen({'x': (20,), 'y': (20,)})
# Frozen({'x': (5, 5, 5, 5), 'y': (5, 5, 5, 5)})
MVCE confirmation
- Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- Complete example — the example is self-contained, including all data and the text of any traceback.
- Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
- New issue — a search of GitHub Issues suggests this is not a duplicate.
- Recent environment — the issue occurs with the latest version of xarray and its dependencies.
Relevant log output
No response
Anything else we need to know?
I got the same results with the h5netcdf and scipy backends, so I am not sure whether this is a bug.
The above code is a modified version of #2198.
A suggestion: the documentation provides only examples of encoding styles. It would be helpful to provide links to a full specification.
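For reference, a possible workaround until the behaviour is clarified: a minimal sketch that derives the chunksizes encoding from the dask chunks already attached to each variable, so that to_netcdf writes them to disk. The helper names chunksizes_from_dask_chunks and encoding_from_dask_chunks are hypothetical and not part of xarray. It assumes regular chunking (it takes the largest block along each dimension) and a backend that supports on-disk chunking, such as netCDF4 or h5netcdf; the netCDF3 format written by the scipy backend has no chunking at all, so it would not apply there.

```python
def chunksizes_from_dask_chunks(chunks):
    """Return one on-disk chunk size per dimension.

    `chunks` is a dask-style tuple of tuples, e.g. ((10, 10), (10, 10)).
    The largest block along each dimension is used, so a smaller trailing
    block such as (8, 8, 4) still yields 8.
    """
    return tuple(max(dim_chunks) for dim_chunks in chunks)


def encoding_from_dask_chunks(ds):
    """Build an `encoding` dict that mirrors each data variable's dask chunks.

    Hypothetical helper: `ds` is expected to behave like an xarray.Dataset,
    i.e. expose `data_vars` whose values have a `.chunks` attribute.
    """
    encoding = {}
    for name, var in ds.data_vars.items():
        if var.chunks is None:
            # Not dask-backed: leave the backend's default chunking alone.
            continue
        encoding[name] = {"chunksizes": chunksizes_from_dask_chunks(var.chunks)}
    return encoding


# Usage with the dataset from the example above (requires xarray and dask):
#   ds5 = ds.chunk({"x": 5, "y": 5})
#   ds5.to_netcdf("/tmp/test2.nc", encoding=encoding_from_dask_chunks(ds5))
```

This only passes along what the first example already does by hand with encoding={"z": {"chunksizes": (5, 5)}}; it does not change how to_netcdf treats dask chunks by default.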
Environment
Details
INSTALLED VERSIONS
------------------
commit: None
python: 3.11.6 | packaged by conda-forge | (main, Oct 3 2023, 10:40:35) [GCC 12.3.0]
python-bits: 64
OS: Linux
OS-release: 6.5.5-1-MANJARO
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.2
libnetcdf: 4.9.2

xarray: 2023.10.1
pandas: 2.1.1
numpy: 1.24.4
scipy: 1.11.3
netCDF4: 1.6.4
pydap: None
h5netcdf: 1.2.0
h5py: 3.10.0
Nio: None
zarr: 2.16.1
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
iris: None
bottleneck: 1.3.7
dask: 2023.10.0
distributed: 2023.10.0
matplotlib: 3.8.0
cartopy: 0.22.0
seaborn: None
numbagg: 0.5.1
fsspec: 2023.10.0
cupy: None
pint: None
sparse: 0.14.0
flox: 0.8.1
numpy_groupies: 0.10.2
setuptools: 68.2.2
pip: 23.3.1
conda: None
pytest: None
mypy: None
IPython: 8.16.1
sphinx: None