Description
What happened:
A file that doesn't look malformed or corrupted raises an esoteric IndexError when I try to load it via OPeNDAP. The traceback below goes through xarray's netCDF4 backend; no dask frames appear in it.
What you expected to happen:
For the file to load successfully.
Minimal Complete Verifiable Example:
import xarray as xr
url = "http://iridl.ldeo.columbia.edu/SOURCES/.Models/.NMME/.NCEP-CFSv2/.HINDCAST/.MONTHLY/.tref/dods"
ds_test = xr.open_dataset(url, decode_times=False)
print(ds_test)
ds_test.load()
This yields the following:
<xarray.Dataset>
Dimensions: (L: 10, M: 24, S: 348, X: 360, Y: 181)
Coordinates:
* M (M) float32 1.0 2.0 3.0 4.0 5.0 6.0 ... 20.0 21.0 22.0 23.0 24.0
* X (X) float32 0.0 1.0 2.0 3.0 4.0 ... 355.0 356.0 357.0 358.0 359.0
* L (L) float32 0.5 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5 9.5
* S (S) float32 264.0 265.0 266.0 267.0 ... 608.0 609.0 610.0 611.0
* Y (Y) float32 -90.0 -89.0 -88.0 -87.0 -86.0 ... 87.0 88.0 89.0 90.0
Data variables:
tref (S, L, M, Y, X) float32 ...
Attributes:
Conventions: IRIDL
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
~/miniconda3/envs/ensinoo/lib/python3.7/site-packages/xarray/backends/netCDF4_.py in _getitem(self, key)
84 original_array = self.get_array(needs_lock=False)
---> 85 array = getitem(original_array, key)
86 except IndexError:
~/miniconda3/envs/ensinoo/lib/python3.7/site-packages/xarray/backends/common.py in robust_getitem(array, key, catch, max_retries, initial_delay)
53 try:
---> 54 return array[key]
55 except catch:
netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Variable.__getitem__()
netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Variable._get()
IndexError: index exceeds dimension bounds
During handling of the above exception, another exception occurred:
IndexError Traceback (most recent call last)
<ipython-input-9-23bd037d898e> in <module>
2 ds_test = xr.open_dataset(url, decode_times=False)
3 print(ds_test)
----> 4 ds_test.load()
~/miniconda3/envs/ensinoo/lib/python3.7/site-packages/xarray/core/dataset.py in load(self, **kwargs)
664 for k, v in self.variables.items():
665 if k not in lazy_data:
--> 666 v.load()
667
668 return self
~/miniconda3/envs/ensinoo/lib/python3.7/site-packages/xarray/core/variable.py in load(self, **kwargs)
379 self._data = as_compatible_data(self._data.compute(**kwargs))
380 elif not hasattr(self._data, "__array_function__"):
--> 381 self._data = np.asarray(self._data)
382 return self
383
~/miniconda3/envs/ensinoo/lib/python3.7/site-packages/numpy/core/_asarray.py in asarray(a, dtype, order)
81
82 """
---> 83 return array(a, dtype, copy=False, order=order)
84
85
~/miniconda3/envs/ensinoo/lib/python3.7/site-packages/xarray/core/indexing.py in __array__(self, dtype)
675
676 def __array__(self, dtype=None):
--> 677 self._ensure_cached()
678 return np.asarray(self.array, dtype=dtype)
679
~/miniconda3/envs/ensinoo/lib/python3.7/site-packages/xarray/core/indexing.py in _ensure_cached(self)
672 def _ensure_cached(self):
673 if not isinstance(self.array, NumpyIndexingAdapter):
--> 674 self.array = NumpyIndexingAdapter(np.asarray(self.array))
675
676 def __array__(self, dtype=None):
~/miniconda3/envs/ensinoo/lib/python3.7/site-packages/numpy/core/_asarray.py in asarray(a, dtype, order)
81
82 """
---> 83 return array(a, dtype, copy=False, order=order)
84
85
~/miniconda3/envs/ensinoo/lib/python3.7/site-packages/xarray/core/indexing.py in __array__(self, dtype)
651
652 def __array__(self, dtype=None):
--> 653 return np.asarray(self.array, dtype=dtype)
654
655 def __getitem__(self, key):
~/miniconda3/envs/ensinoo/lib/python3.7/site-packages/numpy/core/_asarray.py in asarray(a, dtype, order)
81
82 """
---> 83 return array(a, dtype, copy=False, order=order)
84
85
~/miniconda3/envs/ensinoo/lib/python3.7/site-packages/xarray/core/indexing.py in __array__(self, dtype)
555 def __array__(self, dtype=None):
556 array = as_indexable(self.array)
--> 557 return np.asarray(array[self.key], dtype=None)
558
559 def transpose(self, order):
~/miniconda3/envs/ensinoo/lib/python3.7/site-packages/xarray/backends/netCDF4_.py in __getitem__(self, key)
71 def __getitem__(self, key):
72 return indexing.explicit_indexing_adapter(
---> 73 key, self.shape, indexing.IndexingSupport.OUTER, self._getitem
74 )
75
~/miniconda3/envs/ensinoo/lib/python3.7/site-packages/xarray/core/indexing.py in explicit_indexing_adapter(key, shape, indexing_support, raw_indexing_method)
835 """
836 raw_key, numpy_indices = decompose_indexer(key, shape, indexing_support)
--> 837 result = raw_indexing_method(raw_key.tuple)
838 if numpy_indices.tuple:
839 # index the loaded np.ndarray
~/miniconda3/envs/ensinoo/lib/python3.7/site-packages/xarray/backends/netCDF4_.py in _getitem(self, key)
93 "your data into memory first by calling .load()."
94 )
---> 95 raise IndexError(msg)
96 return array
97
IndexError: The indexing operation you are attempting to perform is not valid on netCDF4.Variable object. Try loading your data into memory first by calling .load().
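One check that might narrow this down is whether the failure depends on how much data is requested rather than on the file's contents. The sketch below requests one slab at a time along a dimension instead of the whole variable. The helper names `slab_slices` and `load_slab` are hypothetical (not part of xarray), and the guess that request size matters is an assumption, not something the traceback confirms.

```python
def slab_slices(length, step):
    """Yield consecutive slices that cover range(length) in pieces of at most `step`."""
    for start in range(0, length, step):
        yield slice(start, min(start + step, length))


def load_slab(ds, name, dim, sl):
    """Load only the part of variable `name` selected by slice `sl` along `dim`.

    `ds` is expected to be an xarray.Dataset opened lazily, e.g. with
    xr.open_dataset(url, decode_times=False).
    """
    return ds[name].isel({dim: sl}).load()


# Usage sketch (needs network access and xarray, so it is left commented out):
#
#   import xarray as xr
#   ds_test = xr.open_dataset(url, decode_times=False)
#   first = next(slab_slices(ds_test.sizes["S"], 1))
#   subset = load_slab(ds_test, "tref", "S", first)  # one start date only

# The slicing helper itself can be checked offline:
print(list(slab_slices(10, 4)))
```

If a single slab loads while the full `ds_test.load()` fails, that would point toward the size of the request rather than toward the file's structure.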
Anything else we need to know?:
Something specific to this file must be causing this, since I can successfully load the example file from the OPeNDAP section of the xarray docs (a subset of it, for expediency):
url = "http://iridl.ldeo.columbia.edu/SOURCES/.OSU/.PRISM/.monthly/.tdmean/[X+]average/dods"
ds_test = xr.open_dataset(url, decode_times=False)
print(ds_test)
<xarray.Dataset>
Dimensions: (T: 1420, Y: 621)
Coordinates:
* Y (Y) float32 49.916668 49.875 49.833336 ... 24.125 24.083334
* T (T) float32 -779.5 -778.5 -777.5 -776.5 ... 636.5 637.5 638.5 639.5
Data variables:
tdmean (T, Y) float64 ...
Attributes:
Conventions: IRIDL
Calling ds_test.load() afterwards works just fine.
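One difference between the two datasets that can be checked without downloading anything is their in-memory size. The sketch below multiplies the dimension lengths and dtype widths printed in the two reprs above (float32 is 4 bytes, float64 is 8 bytes). The helper name `nbytes` is hypothetical, and whether size is actually what triggers the IndexError is an assumption I haven't verified.

```python
def nbytes(dims, itemsize):
    """Return the uncompressed size in bytes of an array with the given dimension lengths."""
    count = 1
    for length in dims.values():
        count *= length
    return count * itemsize


# Dimension lengths and dtypes copied from the two dataset reprs.
tref_bytes = nbytes({"S": 348, "L": 10, "M": 24, "Y": 181, "X": 360}, itemsize=4)
tdmean_bytes = nbytes({"T": 1420, "Y": 621}, itemsize=8)

print(f"tref:   {tref_bytes / 1e9:.2f} GB")
print(f"tdmean: {tdmean_bytes / 1e6:.2f} MB")
```

So `ds_test.load()` on the failing dataset asks for roughly 21.8 GB in one go, while the working example is about 7 MB.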
Environment:
Output of xr.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.7.3 | packaged by conda-forge | (default, Jul 1 2019, 21:52:21)
[GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-862.14.4.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.6
libnetcdf: 4.7.4
xarray: 0.16.0
pandas: 1.1.0
numpy: 1.19.1
scipy: 1.5.2
netCDF4: 1.5.4
pydap: installed
h5netcdf: 0.8.1
h5py: 2.10.0
Nio: None
zarr: 2.4.0
cftime: 1.2.1
nc_time_axis: 1.2.0
PseudoNetCDF: None
rasterio: None
cfgrib: 0.9.8.2
iris: None
bottleneck: None
dask: 2.20.0
distributed: 2.20.0
matplotlib: 3.2.1
cartopy: 0.18.0
seaborn: 0.10.1
numbagg: None
pint: None
setuptools: 49.6.0.post20200814
pip: 20.2.2
conda: None
pytest: None
IPython: 7.8.0
sphinx: None
None