If "chunks=None" is set in open_mfdataset, it is changed to "chunks={}" before being passed to "_dataset_from_backend_dataset" #7792

@timcera

What happened?

I am using the grib2io engine on a system where dask currently cannot be installed. Looking through the code, I thought that setting "chunks=None" would avoid dask, but in open_mfdataset, on the line

open_kwargs = dict(engine=engine, chunks=chunks or {}, **kwargs)
"chunks=None" is converted to "chunks={}".

This means that the later test

if chunks is None:
can never be true, so the dask code path always runs.
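
The coercion can be reproduced without xarray. The sketch below is a stand-in for the two quoted lines, not xarray's actual code: `build_open_kwargs` and `chooses_dask_path` are hypothetical names introduced only for illustration.

```python
# Stand-in for the keyword handling quoted above; not xarray's actual code.
def build_open_kwargs(engine, chunks=None, **kwargs):
    # `or` returns its right operand when the left one is falsy,
    # and None is falsy, so chunks=None becomes chunks={}.
    return dict(engine=engine, chunks=chunks or {}, **kwargs)


def chooses_dask_path(chunks):
    # Mirrors the `if chunks is None:` test: only None skips chunking.
    return chunks is not None


open_kwargs = build_open_kwargs("rasterio", chunks=None)
print(open_kwargs["chunks"])                     # {}
print(open_kwargs["chunks"] is None)             # False
print(chooses_dask_path(open_kwargs["chunks"]))  # True
```

Because an empty dict is not None, the value that reaches the test always selects the chunking branch, whatever the caller passed.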

The example below uses the rasterio engine because I could open publicly available files from S3. The rasterio engine gives the same error as the grib2io engine.

What did you expect to happen?

Expected open_mfdataset to work without dask installed.
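
For illustration only, one way to let an explicit "chunks=None" survive is to separate "argument not passed" from "None passed" with a sentinel. This is a sketch under that assumption, not a proposed patch: `_UNSET` and `resolve_chunks` are hypothetical names and do not exist in xarray.

```python
# Hypothetical sentinel meaning "the caller did not pass chunks at all".
_UNSET = object()


def resolve_chunks(chunks=_UNSET):
    # Fall back to {} (the dask-backed default) only when nothing was passed.
    if chunks is _UNSET:
        return {}
    # An explicit None, or any user-supplied chunk spec, is passed through.
    return chunks


print(resolve_chunks())           # {}
print(resolve_chunks(None))       # None
print(resolve_chunks({"x": 10}))  # {'x': 10}
```

With this shape, the default behavior is unchanged, while an explicit None would still be None when it reaches the "if chunks is None:" test.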

Minimal Complete Verifiable Example

# Have to create an environment that doesn't include dask.  For example:
#     conda create -n xarrayenv -c conda-forge xarray rioxarray
#     conda activate xarrayenv

import xarray as xr
import os

os.environ["AWS_NO_SIGN_REQUEST"] = "YES"

ds = xr.open_mfdataset(
    [
        "/vsis3/noaa-nbm-grib2-pds/blend.20230401/02/core/blend.t02z.core.f003.co.grib2",
        "/vsis3/noaa-nbm-grib2-pds/blend.20230401/02/core/blend.t02z.core.f004.co.grib2",
    ],
    engine="rasterio",
    chunks=None,
)

# Traceback (most recent call last):
#   File "/home/tim/test.py", line 6, in <module>
#     ds = xr.open_mfdataset(
#          ^^^^^^^^^^^^^^^^^^
#   File "/home/tim/anaconda3/envs/xarray/lib/python3.11/site-packages/xarray/backends/api.py", line 982, in open_mfdataset
#     datasets = [open_(p, **open_kwargs) for p in paths]
#                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
#   File "/home/tim/anaconda3/envs/xarray/lib/python3.11/site-packages/xarray/backends/api.py", line 982, in <listcomp>
#     datasets = [open_(p, **open_kwargs) for p in paths]
#                 ^^^^^^^^^^^^^^^^^^^^^^^
#   File "/home/tim/anaconda3/envs/xarray/lib/python3.11/site-packages/xarray/backends/api.py", line 531, in open_dataset
#     ds = _dataset_from_backend_dataset(
#          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
#   File "/home/tim/anaconda3/envs/xarray/lib/python3.11/site-packages/xarray/backends/api.py", line 342, in _dataset_from_backend_dataset
#     ds = _chunk_ds(
#          ^^^^^^^^^^
#   File "/home/tim/anaconda3/envs/xarray/lib/python3.11/site-packages/xarray/backends/api.py", line 302, in _chunk_ds
#     from dask.base import tokenize
# ModuleNotFoundError: No module named 'dask'

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

No response

Anything else we need to know?

No response

Environment

/home/tim/anaconda3/envs/xarrayenv/lib/python3.11/site-packages/_distutils_hack/__init__.py:33: UserWarning: Setuptools is replacing distutils.
warnings.warn("Setuptools is replacing distutils.")

INSTALLED VERSIONS

commit: None
python: 3.11.3 | packaged by conda-forge | (main, Apr 6 2023, 08:57:19) [GCC 11.3.0]
python-bits: 64
OS: Linux
OS-release: 5.15.0-70-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: None
libnetcdf: None

xarray: 2023.4.2
pandas: 2.0.1
numpy: 1.24.3
scipy: 1.10.1
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 67.7.2
pip: 23.1.2
conda: None
pytest: None
mypy: None
IPython: None
sphinx: None
