IndexError when calling load() on netCDF file accessed via opendap #4353

Open
@spencerahill

What happened:

When I try to load a file over OPeNDAP that isn't obviously malformed or corrupted, I get an esoteric IndexError. Judging by the traceback below, it is raised in xarray's netCDF4 backend.

What you expected to happen:

For the file to load successfully.

Minimal Complete Verifiable Example:

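import xarray as xr

url = "http://iridl.ldeo.columbia.edu/SOURCES/.Models/.NMME/.NCEP-CFSv2/.HINDCAST/.MONTHLY/.tref/dods"
ds_test = xr.open_dataset(url, decode_times=False)  # lazy open; time coordinates are left undecoded
print(ds_test)
ds_test.load()  # reading all data into memory raises the IndexError shown below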

This yields the following:

<xarray.Dataset>
Dimensions:  (L: 10, M: 24, S: 348, X: 360, Y: 181)
Coordinates:
  * M        (M) float32 1.0 2.0 3.0 4.0 5.0 6.0 ... 20.0 21.0 22.0 23.0 24.0
  * X        (X) float32 0.0 1.0 2.0 3.0 4.0 ... 355.0 356.0 357.0 358.0 359.0
  * L        (L) float32 0.5 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5 9.5
  * S        (S) float32 264.0 265.0 266.0 267.0 ... 608.0 609.0 610.0 611.0
  * Y        (Y) float32 -90.0 -89.0 -88.0 -87.0 -86.0 ... 87.0 88.0 89.0 90.0
Data variables:
    tref     (S, L, M, Y, X) float32 ...
Attributes:
    Conventions:  IRIDL

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
~/miniconda3/envs/ensinoo/lib/python3.7/site-packages/xarray/backends/netCDF4_.py in _getitem(self, key)
     84                 original_array = self.get_array(needs_lock=False)
---> 85                 array = getitem(original_array, key)
     86         except IndexError:

~/miniconda3/envs/ensinoo/lib/python3.7/site-packages/xarray/backends/common.py in robust_getitem(array, key, catch, max_retries, initial_delay)
     53         try:
---> 54             return array[key]
     55         except catch:

netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Variable.__getitem__()

netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Variable._get()

IndexError: index exceeds dimension bounds

During handling of the above exception, another exception occurred:

IndexError                                Traceback (most recent call last)
<ipython-input-9-23bd037d898e> in <module>
      2 ds_test = xr.open_dataset(url, decode_times=False)
      3 print(ds_test)
----> 4 ds_test.load()

~/miniconda3/envs/ensinoo/lib/python3.7/site-packages/xarray/core/dataset.py in load(self, **kwargs)
    664         for k, v in self.variables.items():
    665             if k not in lazy_data:
--> 666                 v.load()
    667 
    668         return self

~/miniconda3/envs/ensinoo/lib/python3.7/site-packages/xarray/core/variable.py in load(self, **kwargs)
    379             self._data = as_compatible_data(self._data.compute(**kwargs))
    380         elif not hasattr(self._data, "__array_function__"):
--> 381             self._data = np.asarray(self._data)
    382         return self
    383 

~/miniconda3/envs/ensinoo/lib/python3.7/site-packages/numpy/core/_asarray.py in asarray(a, dtype, order)
     81 
     82     """
---> 83     return array(a, dtype, copy=False, order=order)
     84 
     85 

~/miniconda3/envs/ensinoo/lib/python3.7/site-packages/xarray/core/indexing.py in __array__(self, dtype)
    675 
    676     def __array__(self, dtype=None):
--> 677         self._ensure_cached()
    678         return np.asarray(self.array, dtype=dtype)
    679 

~/miniconda3/envs/ensinoo/lib/python3.7/site-packages/xarray/core/indexing.py in _ensure_cached(self)
    672     def _ensure_cached(self):
    673         if not isinstance(self.array, NumpyIndexingAdapter):
--> 674             self.array = NumpyIndexingAdapter(np.asarray(self.array))
    675 
    676     def __array__(self, dtype=None):

~/miniconda3/envs/ensinoo/lib/python3.7/site-packages/numpy/core/_asarray.py in asarray(a, dtype, order)
     81 
     82     """
---> 83     return array(a, dtype, copy=False, order=order)
     84 
     85 

~/miniconda3/envs/ensinoo/lib/python3.7/site-packages/xarray/core/indexing.py in __array__(self, dtype)
    651 
    652     def __array__(self, dtype=None):
--> 653         return np.asarray(self.array, dtype=dtype)
    654 
    655     def __getitem__(self, key):

~/miniconda3/envs/ensinoo/lib/python3.7/site-packages/numpy/core/_asarray.py in asarray(a, dtype, order)
     81 
     82     """
---> 83     return array(a, dtype, copy=False, order=order)
     84 
     85 

~/miniconda3/envs/ensinoo/lib/python3.7/site-packages/xarray/core/indexing.py in __array__(self, dtype)
    555     def __array__(self, dtype=None):
    556         array = as_indexable(self.array)
--> 557         return np.asarray(array[self.key], dtype=None)
    558 
    559     def transpose(self, order):

~/miniconda3/envs/ensinoo/lib/python3.7/site-packages/xarray/backends/netCDF4_.py in __getitem__(self, key)
     71     def __getitem__(self, key):
     72         return indexing.explicit_indexing_adapter(
---> 73             key, self.shape, indexing.IndexingSupport.OUTER, self._getitem
     74         )
     75 

~/miniconda3/envs/ensinoo/lib/python3.7/site-packages/xarray/core/indexing.py in explicit_indexing_adapter(key, shape, indexing_support, raw_indexing_method)
    835     """
    836     raw_key, numpy_indices = decompose_indexer(key, shape, indexing_support)
--> 837     result = raw_indexing_method(raw_key.tuple)
    838     if numpy_indices.tuple:
    839         # index the loaded np.ndarray

~/miniconda3/envs/ensinoo/lib/python3.7/site-packages/xarray/backends/netCDF4_.py in _getitem(self, key)
     93                 "your data into memory first by calling .load()."
     94             )
---> 95             raise IndexError(msg)
     96         return array
     97 

IndexError: The indexing operation you are attempting to perform is not valid on netCDF4.Variable object. Try loading your data into memory first by calling .load().
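
For context, the size of the request that ds_test.load() makes can be estimated from the dimension sizes printed above. This is only a back-of-the-envelope sketch: the variable names are made up for illustration, and it does not establish that size is the cause of the error.

```python
# Dimension sizes copied from the printed Dataset above.
dims = {"S": 348, "L": 10, "M": 24, "Y": 181, "X": 360}
bytes_per_value = 4  # tref is float32

# Multiply the dimension sizes to get the total number of values.
n_values = 1
for size in dims.values():
    n_values *= size

n_bytes = n_values * bytes_per_value
print(f"{n_values:,} values, {n_bytes / 1e9:.1f} GB")
# prints "5,442,163,200 values, 21.8 GB"
```

So a full load asks for roughly 21.8 GB of tref in one go.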

Anything else we need to know?:

Something specific to this file must be causing this, because I can successfully load the example file from the OPeNDAP section of the xarray docs (a subset of it, for expediency):

url = "http://iridl.ldeo.columbia.edu/SOURCES/.OSU/.PRISM/.monthly/.tdmean/[X+]average/dods"
ds_test = xr.open_dataset(url, decode_times=False)
print(ds_test)

<xarray.Dataset>
Dimensions:  (T: 1420, Y: 621)
Coordinates:
  * Y        (Y) float32 49.916668 49.875 49.833336 ... 24.125 24.083334
  * T        (T) float32 -779.5 -778.5 -777.5 -776.5 ... 636.5 637.5 638.5 639.5
Data variables:
    tdmean   (T, Y) float64 ...
Attributes:
    Conventions:  IRIDL

And then calling ds_test.load() works just fine.
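
One visible difference is that the PRISM example is small (1420 × 621 float64 values, about 7 MB), while tref has five dimensions. A possible way to check whether request size matters is to load tref in small slabs along one dimension instead of all at once. The sketch below is an assumption-laden illustration, not a confirmed fix: slab_slices and load_slabs are hypothetical helper names, and load_slabs relies only on the sizes, isel and load members that xarray objects provide.

```python
def slab_slices(dim_size, slab_len):
    """Yield consecutive slices that together cover range(dim_size)."""
    if slab_len < 1:
        raise ValueError("slab_len must be at least 1")
    for start in range(0, dim_size, slab_len):
        yield slice(start, min(start + slab_len, dim_size))


def load_slabs(data_array, dim, slab_len):
    """Load `data_array` one slab at a time along `dim`, yielding each piece.

    Works with any object exposing xarray's `sizes`, `isel` and `load`.
    """
    for slab in slab_slices(data_array.sizes[dim], slab_len):
        yield data_array.isel({dim: slab}).load()


# Hypothetical usage against the failing dataset from the example above:
#     for piece in load_slabs(ds_test["tref"], dim="S", slab_len=1):
#         print(piece.sizes)
#         break  # one slab is enough to see whether a small request succeeds

# The slicing helper on its own, for a dimension of length 10:
print(list(slab_slices(10, 4)))
# prints "[slice(0, 4, None), slice(4, 8, None), slice(8, 10, None)]"
```

Because load_slabs is a generator, the caller can stop after the first slab rather than pulling the whole variable into memory.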

Environment:

Output of xr.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.7.3 | packaged by conda-forge | (default, Jul 1 2019, 21:52:21)
[GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-862.14.4.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.6
libnetcdf: 4.7.4

xarray: 0.16.0
pandas: 1.1.0
numpy: 1.19.1
scipy: 1.5.2
netCDF4: 1.5.4
pydap: installed
h5netcdf: 0.8.1
h5py: 2.10.0
Nio: None
zarr: 2.4.0
cftime: 1.2.1
nc_time_axis: 1.2.0
PseudoNetCDF: None
rasterio: None
cfgrib: 0.9.8.2
iris: None
bottleneck: None
dask: 2.20.0
distributed: 2.20.0
matplotlib: 3.2.1
cartopy: 0.18.0
seaborn: 0.10.1
numbagg: None
pint: None
setuptools: 49.6.0.post20200814
pip: 20.2.2
conda: None
pytest: None
IPython: 7.8.0
sphinx: None
None
