misleading error message for attempting to load non existing file #7435

Opened by @kathoef

Description

What happened?

While trying to load an .h5 file using load_dataset, I accidentally specified the wrong path. Instead of getting a "no such file or directory" error, however, I got a "did not find a match in any of xarray's currently installed IO backends ['netcdf4']" error. It took some time to figure out that the problem was actually the path, and not my installed software libraries.

What did you expect to happen?

I would expect load_dataset to report a "no such file or directory" error, rather than something referring to the IO backends, when I attempt to open a file that clearly does not exist. For .nc files this already works, see below.

Minimal Complete Verifiable Example

import xarray
xarray.load_dataset('not-existing-file.h5')

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[12], line 1
----> 1 xarray.load_dataset('not-existing-file.h5')

File ~/my-pykernel/lib/python3.11/site-packages/xarray/backends/api.py:279, in load_dataset(filename_or_obj, **kwargs)
    276 if "cache" in kwargs:
    277     raise TypeError("cache has no effect in this context")
--> 279 with open_dataset(filename_or_obj, **kwargs) as ds:
    280     return ds.load()

File ~/my-pykernel/lib/python3.11/site-packages/xarray/backends/api.py:524, in open_dataset(filename_or_obj, engine, chunks, cache, decode_cf, mask_and_scale, decode_times, decode_timedelta, use_cftime, concat_characters, decode_coords, drop_variables, inline_array, backend_kwargs, **kwargs)
    521     kwargs.update(backend_kwargs)
    523 if engine is None:
--> 524     engine = plugins.guess_engine(filename_or_obj)
    526 backend = plugins.get_backend(engine)
    528 decoders = _resolve_decoders_kwargs(
    529     decode_cf,
    530     open_backend_dataset_parameters=backend.open_dataset_parameters,
   (...)
    536     decode_coords=decode_coords,
    537 )

File ~/my-pykernel/lib/python3.11/site-packages/xarray/backends/plugins.py:177, in guess_engine(store_spec)
    169 else:
    170     error_msg = (
    171         "found the following matches with the input file in xarray's IO "
    172         f"backends: {compatible_engines}. But their dependencies may not be installed, see:\n"
    173         "https://docs.xarray.dev/en/stable/user-guide/io.html \n"
    174         "https://docs.xarray.dev/en/stable/getting-started-guide/installing.html"
    175     )
--> 177 raise ValueError(error_msg)

ValueError: did not find a match in any of xarray's currently installed IO backends ['netcdf4']. Consider explicitly selecting one of the installed engines via the ``engine`` parameter, or installing additional IO dependencies, see:
https://docs.xarray.dev/en/stable/getting-started-guide/installing.html
https://docs.xarray.dev/en/stable/user-guide/io.html

Anything else we need to know?

It should be noted that the .h5 file is a working netCDF file that can be loaded and used without installing further libraries if the path is specified correctly. Interestingly, when attempting to load a non-existing .nc file, the load_dataset error message correctly says "FileNotFoundError: [Errno 2] No such file or directory: b'/home/jovyan/my_materials/not-existing-file.nc'".

Example code:

import xarray
xarray.load_dataset('not-existing-file.nc')

Error message:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
File ~/my-pykernel/lib/python3.11/site-packages/xarray/backends/file_manager.py:209, in CachingFileManager._acquire_with_cache_info(self, needs_lock)
    208 try:
--> 209     file = self._cache[self._key]
    210 except KeyError:

File ~/my-pykernel/lib/python3.11/site-packages/xarray/backends/lru_cache.py:55, in LRUCache.__getitem__(self, key)
     54 with self._lock:
---> 55     value = self._cache[key]
     56     self._cache.move_to_end(key)

KeyError: [<class 'netCDF4._netCDF4.Dataset'>, ('/home/jovyan/my_materials/not-existing-file.nc',), 'r', (('clobber', True), ('diskless', False), ('format', 'NETCDF4'), ('persist', False)), 'ae3bbd85-042b-46e1-97ae-f8d523bb578a']

During handling of the above exception, another exception occurred:

FileNotFoundError                         Traceback (most recent call last)
Cell In[11], line 1
----> 1 xarray.load_dataset('not-existing-file.nc')

File ~/my-pykernel/lib/python3.11/site-packages/xarray/backends/api.py:279, in load_dataset(filename_or_obj, **kwargs)
    276 if "cache" in kwargs:
    277     raise TypeError("cache has no effect in this context")
--> 279 with open_dataset(filename_or_obj, **kwargs) as ds:
    280     return ds.load()

File ~/my-pykernel/lib/python3.11/site-packages/xarray/backends/api.py:540, in open_dataset(filename_or_obj, engine, chunks, cache, decode_cf, mask_and_scale, decode_times, decode_timedelta, use_cftime, concat_characters, decode_coords, drop_variables, inline_array, backend_kwargs, **kwargs)
    528 decoders = _resolve_decoders_kwargs(
    529     decode_cf,
    530     open_backend_dataset_parameters=backend.open_dataset_parameters,
   (...)
    536     decode_coords=decode_coords,
    537 )
    539 overwrite_encoded_chunks = kwargs.pop("overwrite_encoded_chunks", None)
--> 540 backend_ds = backend.open_dataset(
    541     filename_or_obj,
    542     drop_variables=drop_variables,
    543     **decoders,
    544     **kwargs,
    545 )
    546 ds = _dataset_from_backend_dataset(
    547     backend_ds,
    548     filename_or_obj,
   (...)
    556     **kwargs,
    557 )
    558 return ds

File ~/my-pykernel/lib/python3.11/site-packages/xarray/backends/netCDF4_.py:572, in NetCDF4BackendEntrypoint.open_dataset(self, filename_or_obj, mask_and_scale, decode_times, concat_characters, decode_coords, drop_variables, use_cftime, decode_timedelta, group, mode, format, clobber, diskless, persist, lock, autoclose)
    551 def open_dataset(
    552     self,
    553     filename_or_obj,
   (...)
    568     autoclose=False,
    569 ):
    571     filename_or_obj = _normalize_path(filename_or_obj)
--> 572     store = NetCDF4DataStore.open(
    573         filename_or_obj,
    574         mode=mode,
    575         format=format,
    576         group=group,
    577         clobber=clobber,
    578         diskless=diskless,
    579         persist=persist,
    580         lock=lock,
    581         autoclose=autoclose,
    582     )
    584     store_entrypoint = StoreBackendEntrypoint()
    585     with close_on_error(store):

File ~/my-pykernel/lib/python3.11/site-packages/xarray/backends/netCDF4_.py:376, in NetCDF4DataStore.open(cls, filename, mode, format, group, clobber, diskless, persist, lock, lock_maker, autoclose)
    370 kwargs = dict(
    371     clobber=clobber, diskless=diskless, persist=persist, format=format
    372 )
    373 manager = CachingFileManager(
    374     netCDF4.Dataset, filename, mode=mode, kwargs=kwargs
    375 )
--> 376 return cls(manager, group=group, mode=mode, lock=lock, autoclose=autoclose)

File ~/my-pykernel/lib/python3.11/site-packages/xarray/backends/netCDF4_.py:323, in NetCDF4DataStore.__init__(self, manager, group, mode, lock, autoclose)
    321 self._group = group
    322 self._mode = mode
--> 323 self.format = self.ds.data_model
    324 self._filename = self.ds.filepath()
    325 self.is_remote = is_remote_uri(self._filename)

File ~/my-pykernel/lib/python3.11/site-packages/xarray/backends/netCDF4_.py:385, in NetCDF4DataStore.ds(self)
    383 @property
    384 def ds(self):
--> 385     return self._acquire()

File ~/my-pykernel/lib/python3.11/site-packages/xarray/backends/netCDF4_.py:379, in NetCDF4DataStore._acquire(self, needs_lock)
    378 def _acquire(self, needs_lock=True):
--> 379     with self._manager.acquire_context(needs_lock) as root:
    380         ds = _nc4_require_group(root, self._group, self._mode)
    381     return ds

File ~/my-pykernel/lib/python3.11/contextlib.py:137, in _GeneratorContextManager.__enter__(self)
    135 del self.args, self.kwds, self.func
    136 try:
--> 137     return next(self.gen)
    138 except StopIteration:
    139     raise RuntimeError("generator didn't yield") from None

File ~/my-pykernel/lib/python3.11/site-packages/xarray/backends/file_manager.py:197, in CachingFileManager.acquire_context(self, needs_lock)
    194 @contextlib.contextmanager
    195 def acquire_context(self, needs_lock=True):
    196     """Context manager for acquiring a file."""
--> 197     file, cached = self._acquire_with_cache_info(needs_lock)
    198     try:
    199         yield file

File ~/my-pykernel/lib/python3.11/site-packages/xarray/backends/file_manager.py:215, in CachingFileManager._acquire_with_cache_info(self, needs_lock)
    213     kwargs = kwargs.copy()
    214     kwargs["mode"] = self._mode
--> 215 file = self._opener(*self._args, **kwargs)
    216 if self._mode == "w":
    217     # ensure file doesn't get overridden when opened again
    218     self._mode = "a"

File src/netCDF4/_netCDF4.pyx:2463, in netCDF4._netCDF4.Dataset.__init__()

File src/netCDF4/_netCDF4.pyx:2026, in netCDF4._netCDF4._ensure_nc_success()

FileNotFoundError: [Errno 2] No such file or directory: b'/home/jovyan/my_materials/not-existing-file.nc'

Environment

INSTALLED VERSIONS

commit: None
python: 3.11.0 | packaged by conda-forge | (main, Oct 25 2022, 06:24:40) [GCC 10.4.0]
python-bits: 64
OS: Linux
OS-release: 5.4.0-136-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.8.1

xarray: 2022.12.0
pandas: 1.5.2
numpy: 1.24.1
scipy: None
netCDF4: 1.6.2
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 65.6.3
pip: 22.3.1
conda: None
pytest: None
mypy: None
IPython: 8.8.0
sphinx: None

Activity

The "needs triage" label (issue not yet reviewed by an xarray team member) was added on Jan 13, 2023.

slevang (Contributor) commented on Jan 23, 2023:

You do get a FileNotFoundError if you explicitly specify an engine with xarray.load_dataset('not-existing-file.h5', engine='h5netcdf').

It looks like neither NetCDF4BackendEntrypoint nor H5netcdfBackendEntrypoint includes .h5 in the set of openable extensions handled by guess_can_open, although both will work if they can peer into the file and detect valid HDF5 content. There is probably some good reason for this.
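To illustrate why the extension matters here, the following is a minimal sketch of extension-based engine guessing in the style of guess_can_open. The set of extensions and the function body are illustrative assumptions, not xarray's actual implementation; the point is only that a path whose suffix is absent from the set is rejected before the file is ever opened, regardless of whether the path exists.

```python
from pathlib import Path

# Hypothetical extension set claimed by a backend; note ".h5" is absent,
# mirroring the behavior described in the comment above.
OPENABLE_EXTENSIONS = {".nc", ".nc4", ".cdf"}

def guess_can_open(filename_or_obj):
    """Return True if the path's extension is one this backend claims."""
    try:
        ext = Path(filename_or_obj).suffix.lower()
    except TypeError:
        # Not a path-like object (e.g. an open file handle).
        return False
    return ext in OPENABLE_EXTENSIONS

print(guess_can_open("data.nc"))  # extension matches
print(guess_can_open("data.h5"))  # extension does not match
```

Because the check never touches the filesystem, a mistyped path to an .h5 file and a missing .h5 file fail identically, which is why the backend-mismatch error appears instead of FileNotFoundError.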

dcherian (Contributor) commented on Jan 23, 2023:

I think we should update the error to suggest trying with an explicit engine kwarg.
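Until the message is updated, a user-side workaround is to check for the file's existence before letting xarray guess an engine. The wrapper below is my own sketch, not part of xarray's API; it simply front-loads a FileNotFoundError for missing local paths so the error points at the path rather than at the installed backends.

```python
import os

def load_with_existence_check(loader, path, **kwargs):
    # Hypothetical helper (not xarray API): fail fast for missing local
    # paths before the loader tries to guess an IO backend.
    if isinstance(path, (str, os.PathLike)) and not os.path.exists(path):
        raise FileNotFoundError(
            f"[Errno 2] No such file or directory: {path!r}"
        )
    return loader(path, **kwargs)
```

Used as, e.g., `load_with_existence_check(xarray.load_dataset, 'not-existing-file.h5')`, this raises FileNotFoundError immediately instead of the backend-mismatch ValueError.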
