Inconsistent use of xarray's open methods #374

Open
malmans2 opened this issue Apr 30, 2024 · 2 comments
@malmans2
Contributor

malmans2 commented Apr 30, 2024

What happened?

Some backends use xr.open_dataset whereas others use xr.open_mfdataset.

Because of that, our code does not work seamlessly with all datasets.
As xr.open_mfdataset is more general and offers more functionality, would it be possible to use it everywhere?

There's another important downside: the behaviour of xr.open_dataset and xr.open_mfdataset is not identical for single files. For example, xr.open_mfdataset uses dask by default whereas xr.open_dataset does not (you'd have to explicitly pass the argument chunks={}).

What are the steps to reproduce the bug?

import earthkit.data

collection_id = "reanalysis-era5-single-levels"
request = {
    "variable": "2t",
    "product_type": "reanalysis",
    "date": "2012-12-01",
    "time": "12:00",
}
kwargs = {"preprocess": lambda ds: ds**2}

nc = earthkit.data.from_source("cds", collection_id, **request, format="netcdf")
nc.to_xarray(xarray_open_mfdataset_kwargs=kwargs)  # OK

grib = earthkit.data.from_source("cds", collection_id, **request, format="grib")
grib.to_xarray(xarray_open_mfdataset_kwargs=kwargs)
# TypeError: CfGribBackend.open_dataset() got an unexpected keyword argument 'preprocess'
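
Until the backends are unified, one possible workaround is to emulate the single-file behaviour of xr.open_mfdataset on top of an open_dataset-style call. The sketch below is only an illustration, not earthkit-data's implementation: `split_open_kwargs`, `open_single` and `fake_opener` are hypothetical names, and `fake_opener` stands in for xr.open_dataset so that the example runs without any data. The set of keywords is taken from the xr.open_mfdataset signature and may change between xarray versions.

```python
# Keywords accepted by xr.open_mfdataset but not by xr.open_dataset
# (assumption: based on the open_mfdataset signature; check your xarray version).
MFDATASET_ONLY = {
    "concat_dim", "compat", "preprocess", "data_vars", "coords",
    "combine", "parallel", "join", "attrs_file", "combine_attrs",
}


def split_open_kwargs(kwargs):
    """Return (open_dataset kwargs, open_mfdataset-only kwargs)."""
    single = {k: v for k, v in kwargs.items() if k not in MFDATASET_ONLY}
    multi = {k: v for k, v in kwargs.items() if k in MFDATASET_ONLY}
    return single, multi


def open_single(opener, source, **kwargs):
    """Open one source with an open_dataset-like callable (hypothetical helper).

    Emulates two single-file behaviours of open_mfdataset: lazy chunks by
    default and the `preprocess` hook. Other mfdataset-only keywords only
    matter when combining several files and are dropped here.
    """
    single, multi = split_open_kwargs(kwargs)
    if single.get("chunks") is None:
        single["chunks"] = {}  # open_mfdataset falls back to chunks={}
    ds = opener(source, **single)
    preprocess = multi.get("preprocess")
    return preprocess(ds) if preprocess is not None else ds


def fake_opener(source, **kwargs):
    """Stand-in for xr.open_dataset that rejects unknown keywords."""
    unexpected = set(kwargs) - {"chunks", "engine"}
    if unexpected:
        raise TypeError(f"unexpected keyword argument(s): {sorted(unexpected)}")
    return 3  # a number plays the role of the dataset


# `preprocess` no longer reaches the backend, so no TypeError is raised.
result = open_single(fake_opener, "dummy", preprocess=lambda ds: ds**2)
```

With a real backend, `opener` would be xr.open_dataset and `source` whatever object that backend accepts.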

Version

0.7.0

Platform (OS and architecture)

Linux eqc-quality-tools.eqc.compute.cci1.ecmwf.int 5.14.0-362.8.1.el9_3.x86_64 #1 SMP PREEMPT_DYNAMIC Wed Nov 8 17:36:32 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

Relevant log output

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[15], line 16
     13 nc.to_xarray(xarray_open_mfdataset_kwargs=kwargs)  # OK
     15 grib = earthkit.data.from_source("cds", collection_id, **request, format="grib")
---> 16 grib.to_xarray(xarray_open_mfdataset_kwargs=kwargs)
     17 # TypeError: CfGribBackend.open_dataset() got an unexpected keyword argument 'preprocess'

File /data/common/miniforge3/envs/wp3/lib/python3.11/site-packages/earthkit/data/readers/grib/xarray.py:138, in XarrayMixIn.to_xarray(self, **kwargs)
    125 default.update(self.xarray_open_dataset_kwargs())
    127 xarray_open_dataset_kwargs.update(
    128     Kwargs(
    129         user=user_xarray_open_dataset_kwargs,
   (...)
    135     )
    136 )
--> 138 result = xr.open_dataset(
    139     IndexWrapperForCfGrib(self, ignore_keys=ignore_keys),
    140     **xarray_open_dataset_kwargs,
    141 )
    143 return result

File /data/common/miniforge3/envs/wp3/lib/python3.11/site-packages/xarray/backends/api.py:573, in open_dataset(filename_or_obj, engine, chunks, cache, decode_cf, mask_and_scale, decode_times, decode_timedelta, use_cftime, concat_characters, decode_coords, drop_variables, inline_array, chunked_array_type, from_array_kwargs, backend_kwargs, **kwargs)
    561 decoders = _resolve_decoders_kwargs(
    562     decode_cf,
    563     open_backend_dataset_parameters=backend.open_dataset_parameters,
   (...)
    569     decode_coords=decode_coords,
    570 )
    572 overwrite_encoded_chunks = kwargs.pop("overwrite_encoded_chunks", None)
--> 573 backend_ds = backend.open_dataset(
    574     filename_or_obj,
    575     drop_variables=drop_variables,
    576     **decoders,
    577     **kwargs,
    578 )
    579 ds = _dataset_from_backend_dataset(
    580     backend_ds,
    581     filename_or_obj,
   (...)
    591     **kwargs,
    592 )
    593 return ds

TypeError: CfGribBackend.open_dataset() got an unexpected keyword argument 'preprocess'

Accompanying data

No response

Organisation

B-Open / CADS-EQC

@malmans2 malmans2 added the bug Something isn't working label Apr 30, 2024
@sandorkertesz
Collaborator

@malmans2, thank you for reporting this issue. I agree that using xarray_open_mfdataset consistently would be a good idea. This will be fixed in the next release.

@sandorkertesz sandorkertesz self-assigned this May 7, 2024
@sandorkertesz
Collaborator

Also related to this issue is the following comment from @malmans2 in #375:

Just wanted to provide more details about how we use this, since you mentioned that we should not import the reader class and that a new method will be added:

if isinstance(earthkit_ds, GRIBReader):
    xr_ds = earthkit_ds.to_xarray(xarray_open_dataset_kwargs={"squeeze": False, "chunks": {}})
elif isinstance(earthkit_ds, CSVReader):
    xr_ds = earthkit_ds.to_xarray(pandas_read_csv_kwargs=...)
elif ...:
    ...
else:
    xr_ds = earthkit_ds.to_xarray(xarray_open_mfdataset_kwargs=...)
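
The branching above could be gathered into a single lookup, sketched below under stated assumptions. The reader classes here are empty hypothetical stand-ins defined only so that the example runs, and `to_xarray_kwargs` is an illustrative helper, not part of earthkit-data. Only the keyword names (`xarray_open_dataset_kwargs`, `pandas_read_csv_kwargs`, `xarray_open_mfdataset_kwargs`) come from the snippet above. The sketch still depends on reader classes, so it tidies the dispatch without removing the need for a uniform API.

```python
# Hypothetical stand-ins for the real reader classes.
class GRIBReader:
    pass


class CSVReader:
    pass


class NetCDFReader:
    pass


# Which to_xarray keyword each reader type expects (names from the issue).
KWARG_NAME_BY_READER = {
    GRIBReader: "xarray_open_dataset_kwargs",
    CSVReader: "pandas_read_csv_kwargs",
}
DEFAULT_KWARG_NAME = "xarray_open_mfdataset_kwargs"


def to_xarray_kwargs(reader, options):
    """Pick the keyword that matches `reader` and look up its options."""
    for cls, name in KWARG_NAME_BY_READER.items():
        if isinstance(reader, cls):
            return {name: options.get(name, {})}
    return {DEFAULT_KWARG_NAME: options.get(DEFAULT_KWARG_NAME, {})}


options = {
    "xarray_open_dataset_kwargs": {"squeeze": False, "chunks": {}},
    "xarray_open_mfdataset_kwargs": {"chunks": {}},
}
grib_kwargs = to_xarray_kwargs(GRIBReader(), options)
other_kwargs = to_xarray_kwargs(NetCDFReader(), options)
```

A call site would then read `earthkit_ds.to_xarray(**to_xarray_kwargs(earthkit_ds, options))`.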
