intermittent RuntimeError: NetCDF: HDF #2038

nilodna · 2025-06-17T11:15:16Z

nilodna
Jun 17, 2025

Question

Dear all,

I'm running an experiment using the from_nemo() method with multiple NetCDF files as input. For the most part, it works well. I've been able to run several shorter simulations successfully. However, when I attempt longer experiments, I occasionally encounter the error below.

To troubleshoot, I re-downloaded the hydrodynamic input files (based on #1029 ) and checked if xarray is able to open all files being listed with glob on the code below (using both xr.open_dataset and xr.open_mfdataset), but the issue persists. What's particularly puzzling is that a 20-day experiment might run fine once, then fail with the same exact setup if I rerun it.

Unfortunately, I haven't been able to reproduce this in a minimal example. That said, I'm including both my full script and the error log in the hope that they'll provide enough context for debugging or pointing me in the right direction.

Digging around, it might be something related to the computeTimeChunk from xarray (version 2025.3.1), but I haven't found what is causing it.

Thank you in advance for any help!

Supporting code/error messages

from glob import glob
from datetime import datetime, timedelta
import numpy as np
import parcels

def delete_particle(particle, fieldset, time):
    particle.delete()

def random_pset(fieldset=None, lon_range=(-48, -44), lat_range=(3,6),
                npart=100):
  """ """
  return parcels.ParticleSet.from_list(
      fieldset=fieldset,
      pclass=parcels.ScipyParticle,
      lon=np.random.uniform(*lon_range, size=(npart,)),
      lat=np.random.uniform(*lat_range, size=(npart,)),
      time=np.zeros(shape=(npart,)),
  )

# general settings
data_path = '/home/nilodna/postdoc/data/glob16'
mesh_mask = f'{data_path}/GLOB16L98_mesh_mask_atlantic.nc'

# simulation_start should match with the available time span on filenames
simulation_start = datetime(2021, 9, 10, 12, 0, 0)
random_test = True

ufiles = sorted(glob(f"{data_path}/ROMEO.01_1d_uo_2021*.nc"))
vfiles = sorted(glob(f"{data_path}/ROMEO.01_1d_vo_2021*.nc"))
wfiles = sorted(glob(f"{data_path}/ROMEO.01_1d_wo_2021*.nc"))

filenames = {'U': {'lon': mesh_mask, 'lat': mesh_mask, 'depth': ufiles[0], 'data': ufiles},
             'V': {'lon': mesh_mask, 'lat': mesh_mask, 'depth': ufiles[0], 'data': vfiles},
             'W': {'lon': mesh_mask, 'lat': mesh_mask, 'depth': ufiles[0], 'data': wfiles}
             }

variables = {'U': 'uo',
             'V': 'vo',
             'W': 'wo'
             }

dimensions = {'lon': 'glamf', 'lat': 'gphif', 'depth': 'depthu', 'time': 'time_counter'}

fieldset = parcels.FieldSet.from_nemo(filenames, variables, dimensions, 
                                      indices={
                                        'lon': [0, 1800],
                                        'lat': [1000, 3000]
                                      },
                                      chunksize=False,
                                      allow_time_extrapolation=True)

pset = random_pset(fieldset)

kernels = pset.Kernel(parcels.AdvectionRK4_3D)

output_file = pset.ParticleFile(name="Output.zarr", outputdt=timedelta(hours=3))
output_file.metadata["date_created"] = datetime.now().isoformat()

pset.execute(
    kernels,
    runtime=timedelta(days=14),
    dt=timedelta(hours=3),
    output_file=output_file
)

RuntimeError                              Traceback (most recent call last)
Cell In[1], line 68
     65 output_file = pset.ParticleFile(name="output.zarr", outputdt=timedelta(hours=3))
     66 output_file.metadata["date_created"] = datetime.now().isoformat()
---> 68 pset.execute(
     69     kernels,
     70     runtime=timedelta(days=13),
     71     dt=timedelta(hours=3),
     72     output_file=output_file
     73 )

File ~/miniconda3/envs/parcels/lib/python3.13/site-packages/parcels/particleset.py:1248, in ParticleSet.execute(self, pyfunc, pyfunc_inter, endtime, runtime, dt, output_file, verbose_progress, postIterationCallbacks, callbackdt, delete_cfiles)
   1245     next_prelease += self.repeatdt * np.sign(dt)
   1247 if time != endtime:
-> 1248     next_input = self.fieldset.computeTimeChunk(time, dt)
   1249 if verbose_progress:
   1250     pbar.update(abs(time - time_at_startofloop))

File ~/miniconda3/envs/parcels/lib/python3.13/site-packages/parcels/fieldset.py:1614, in FieldSet.computeTimeChunk(self, time, dt)
   1612         f.filebuffers[0] = None
   1613     f.filebuffers[0] = f.filebuffers[1]
-> 1614     data = f.computeTimeChunk(data, 1)
   1615 else:
   1616     f._loaded_time_indices = [0]

File ~/miniconda3/envs/parcels/lib/python3.13/site-packages/parcels/field.py:1481, in Field.computeTimeChunk(self, data, tindex)
   1479 if self.netcdf_engine != "xarray":
   1480     filebuffer.name = filebuffer.parse_name(self.filebuffername)
-> 1481 buffer_data = filebuffer.data
   1482 lib = np if isinstance(buffer_data, np.ndarray) else da
   1483 if len(buffer_data.shape) == 2:

File ~/miniconda3/envs/parcels/lib/python3.13/site-packages/parcels/fieldfilebuffer.py:230, in NetcdfFileBuffer.data(self)
    228 @property
    229 def data(self):
--> 230     return self.data_access()

File ~/miniconda3/envs/parcels/lib/python3.13/site-packages/parcels/fieldfilebuffer.py:236, in NetcdfFileBuffer.data_access(self)
    234 ti = range(data.shape[0]) if self.ti is None else self.ti
    235 data = self._apply_indices(data, ti)
--> 236 return np.array(data, dtype=self.cast_data_dtype)

File ~/miniconda3/envs/parcels/lib/python3.13/site-packages/xarray/core/common.py:181, in AbstractArray.__array__(self, dtype, copy)
    179         except TypeError:
    180             copy = False
--> 181 return np.array(self.values, dtype=dtype, copy=copy)

File ~/miniconda3/envs/parcels/lib/python3.13/site-packages/xarray/core/dataarray.py:823, in DataArray.values(self)
    810 @property
    811 def values(self) -> np.ndarray:
    812     """
    813     The array's data converted to numpy.ndarray.
    814 
   (...)    821     to this array may be reflected in the DataArray as well.
    822     """
--> 823     return self.variable.values

File ~/miniconda3/envs/parcels/lib/python3.13/site-packages/xarray/core/variable.py:508, in Variable.values(self)
    505 @property
    506 def values(self) -> np.ndarray:
    507     """The variable's data as a numpy.ndarray"""
--> 508     return _as_array_or_item(self._data)

File ~/miniconda3/envs/parcels/lib/python3.13/site-packages/xarray/core/variable.py:302, in _as_array_or_item(data)
    288 def _as_array_or_item(data):
    289     """Return the given values as a numpy array, or as an individual item if
    290     it's a 0d datetime64 or timedelta64 array.
    291 
   (...)    300     TODO: remove this (replace with np.asarray) once these issues are fixed
    301     """
--> 302     data = np.asarray(data)
    303     if data.ndim == 0:
    304         kind = data.dtype.kind

File ~/miniconda3/envs/parcels/lib/python3.13/site-packages/xarray/core/indexing.py:510, in ExplicitlyIndexed.__array__(self, dtype, copy)
    505 def __array__(
    506     self, dtype: np.typing.DTypeLike = None, /, *, copy: bool | None = None
    507 ) -> np.ndarray:
    508     # Leave casting to an array up to the underlying array type.
    509     if Version(np.__version__) >= Version("2.0.0"):
--> 510         return np.asarray(self.get_duck_array(), dtype=dtype, copy=copy)
    511     else:
    512         return np.asarray(self.get_duck_array(), dtype=dtype)

File ~/miniconda3/envs/parcels/lib/python3.13/site-packages/xarray/core/indexing.py:836, in MemoryCachedArray.get_duck_array(self)
    835 def get_duck_array(self):
--> 836     self._ensure_cached()
    837     return self.array.get_duck_array()

File ~/miniconda3/envs/parcels/lib/python3.13/site-packages/xarray/core/indexing.py:833, in MemoryCachedArray._ensure_cached(self)
    832 def _ensure_cached(self):
--> 833     self.array = as_indexable(self.array.get_duck_array())

File ~/miniconda3/envs/parcels/lib/python3.13/site-packages/xarray/core/indexing.py:790, in CopyOnWriteArray.get_duck_array(self)
    789 def get_duck_array(self):
--> 790     return self.array.get_duck_array()

File ~/miniconda3/envs/parcels/lib/python3.13/site-packages/xarray/core/indexing.py:660, in LazilyIndexedArray.get_duck_array(self)
    655 # self.array[self.key] is now a numpy array when
    656 # self.array is a BackendArray subclass
    657 # and self.key is BasicIndexer((slice(None, None, None),))
    658 # so we need the explicit check for ExplicitlyIndexed
    659 if isinstance(array, ExplicitlyIndexed):
--> 660     array = array.get_duck_array()
    661 return _wrap_numpy_scalars(array)

File ~/miniconda3/envs/parcels/lib/python3.13/site-packages/xarray/coding/common.py:76, in _ElementwiseFunctionArray.get_duck_array(self)
     75 def get_duck_array(self):
---> 76     return self.func(self.array.get_duck_array())

File ~/miniconda3/envs/parcels/lib/python3.13/site-packages/xarray/core/indexing.py:653, in LazilyIndexedArray.get_duck_array(self)
    649     array = apply_indexer(self.array, self.key)
    650 else:
    651     # If the array is not an ExplicitlyIndexedNDArrayMixin,
    652     # it may wrap a BackendArray so use its __getitem__
--> 653     array = self.array[self.key]
    655 # self.array[self.key] is now a numpy array when
    656 # self.array is a BackendArray subclass
    657 # and self.key is BasicIndexer((slice(None, None, None),))
    658 # so we need the explicit check for ExplicitlyIndexed
    659 if isinstance(array, ExplicitlyIndexed):

File ~/miniconda3/envs/parcels/lib/python3.13/site-packages/xarray/backends/netCDF4_.py:103, in NetCDF4ArrayWrapper.__getitem__(self, key)
    102 def __getitem__(self, key):
--> 103     return indexing.explicit_indexing_adapter(
    104         key, self.shape, indexing.IndexingSupport.OUTER, self._getitem
    105     )

File ~/miniconda3/envs/parcels/lib/python3.13/site-packages/xarray/core/indexing.py:1014, in explicit_indexing_adapter(key, shape, indexing_support, raw_indexing_method)
    992 """Support explicit indexing by delegating to a raw indexing method.
    993 
    994 Outer and/or vectorized indexers are supported by indexing a second time
   (...)   1011 Indexing result, in the form of a duck numpy-array.
   1012 """
   1013 raw_key, numpy_indices = decompose_indexer(key, shape, indexing_support)
-> 1014 result = raw_indexing_method(raw_key.tuple)
   1015 if numpy_indices.tuple:
   1016     # index the loaded duck array
   1017     indexable = as_indexable(result)

File ~/miniconda3/envs/parcels/lib/python3.13/site-packages/xarray/backends/netCDF4_.py:116, in NetCDF4ArrayWrapper._getitem(self, key)
    114     with self.datastore.lock:
    115         original_array = self.get_array(needs_lock=False)
--> 116         array = getitem(original_array, key)
    117 except IndexError as err:
    118     # Catch IndexError in netCDF4 and return a more informative
    119     # error message.  This is most often called when an unsorted
    120     # indexer is used before the data is loaded from disk.
    121     msg = (
    122         "The indexing operation you are attempting to perform "
    123         "is not valid on netCDF4.Variable object. Try loading "
    124         "your data into memory first by calling .load()."
    125     )

File src/netCDF4/_netCDF4.pyx:5079, in netCDF4._netCDF4.Variable.__getitem__()

File src/netCDF4/_netCDF4.pyx:6051, in netCDF4._netCDF4.Variable._get()

File src/netCDF4/_netCDF4.pyx:2164, in netCDF4._netCDF4._ensure_nc_success()

RuntimeError: NetCDF: HDF error

Answered by VeckoTheGecko

Jul 1, 2025

was very helpful in identifying a few corrupt files (even though it took me a while to adjust a few things on my end)

How did you deal with the corrupt files? Did you remove them from the simulation/otherwise fix them/re-download them?

the HDF error came back to haunt me. If I restart the computer, it will probably run again, but the problem will likely occur the next time I run the model.

I'm not sure why the problem is intermittent, but it sounds like the problem is with corrupt data- this is something that your data provider would be able to help you with... Have you tried re-downloading the data? Discussing with the provider to check if they're hosting corrupt data?

This problem i…

View full answer

VeckoTheGecko · 2025-06-17T11:51:14Z

VeckoTheGecko
Jun 17, 2025
Maintainer

This error looks very similar to pydata/xarray#4050 . Perhaps your file is corrupted? When you were loading your datasets in xarray, were you loading them into memory? (by default with large datasets xarray load Dask arrays, which are lazily executed). The following code should be close to what you want to try.

# I haven't tested this code, but should work...
import xarray as xr
from itertools import pairwise

data_path = '/home/nilodna/postdoc/data/glob16'
mesh_mask = f'{data_path}/GLOB16L98_mesh_mask_atlantic.nc'

ds_mesh = xr.open_dataset(f'{data_path}/GLOB16L98_mesh_mask_atlantic.nc')
ds_u = xr.open_mfdataset(f"{data_path}/ROMEO.01_1d_uo_2021*.nc")
ds_v = xr.open_mfdataset(f"{data_path}/ROMEO.01_1d_vo_2021*.nc")
ds_w = xr.open_mfdataset(f"{data_path}/ROMEO.01_1d_wo_2021*.nc")

_ = ds_mesh.load()  # load and discard from memory

# load arrays into memory
try:
    for ds_full in [ds_u, ds_v, ds_w]:
        
        # Load individual slices along the time dimension
        for start, end in pairwise(range(ds_full.time_counter.size, step=3)):
            ds = ds_full.isel(time = slice(start, end))
            _ = ds.load() # load and discard from memory
except RuntimeError as e:
    e.add_note(f"Error encountered on:\n{ds}")
    raise e

Hopefully that helps. My only idea at the moment is it being a data issue - I haven't seen this before.

What's particularly puzzling is that a 20-day experiment might run fine once, then fail with the same exact setup if I rerun it.

Really not sure why this is, or why there is flakiness here. Hopefully the code above sheds some light on what the problem is.

Thanks for the error log, code, and consideration for a minimal reproducer! Helps quite a bit to help debug

2 replies

nilodna Jul 1, 2025
Author

Hi @VeckoTheGecko, thanks for your suggestion.

I didn't remember the default mode of open_mfdataset, so your code was very helpful in identifying a few corrupt files (even though it took me a while to adjust a few things on my end).

Now, I can run a 55-day-long experiment. However, the intermittent error still occurs. For instance, right now my experiment runs just fine, but when I tried to run it a second time, the HDF error came back to haunt me. If I restart the computer, it will probably run again, but the problem will likely occur the next time I run the model.

VeckoTheGecko Jul 1, 2025
Maintainer

was very helpful in identifying a few corrupt files (even though it took me a while to adjust a few things on my end)

How did you deal with the corrupt files? Did you remove them from the simulation/otherwise fix them/re-download them?

the HDF error came back to haunt me. If I restart the computer, it will probably run again, but the problem will likely occur the next time I run the model.

I'm not sure why the problem is intermittent, but it sounds like the problem is with corrupt data- this is something that your data provider would be able to help you with... Have you tried re-downloading the data? Discussing with the provider to check if they're hosting corrupt data?

This problem isn't really something we can solve on the Parcels side - I won't be able to be much more help.

Answer selected by VeckoTheGecko

nilodna · 2025-08-08T09:59:54Z

nilodna
Aug 8, 2025
Author

Hi @VeckoTheGecko,

I'm coming back to this just to update with what I've found regarding the intermittent problem. The problem indeed wasn't with Parcels and it took me a few months to figure out what was going on.

I am not completely sure about this and I don't know how to be sure about it, but here it goes: I found this comment from Deepak Cherian, mentioning the "bad disk" possibility and dug a bit on this. What I've found was that HDF has issues saving netcdf files on SSD storages (here), which sometimes corrupt the file during the writing or reading.

I managed that by downloading my entire dataset on a HDD (using your code to double check whether the file was corrupted or not) and run the model, with the same setup from before. Surprisingly no RuntimeError poped up! I then rsync these files from the HDD to a SDD (everything on the same machine), and run the model: the error came back. I then tried to run the model with the files from the HDD, and the error was there. We then started to believe on a second problem: the cache management of xarray.

The thing is the xarray saves files on cache to speed up new reading of the same file, perhaps using checksum to identify the files but I'm not sure about this. This was happening even when using different files from different partitions. By forcing a clean up of the cache in the unix system, we were able to run the model again without having to reboot the machine.

So, I'm not entirely sure about all this, but the thing is that I am running the model on my old laptop (with hdd and unix system) without any RuntimeError for a while now. On my new laptop, with SSD, the model does not run, similarly to the server I was working on. So I'm pretty confident that it might be something to do with this sdd/hdd thing.

My problem was solved, and I hope this information might be helpful for future users.

Thank you very much for your help.

Best,

1 reply

VeckoTheGecko Aug 8, 2025
Maintainer

What a deep dive ! Thanks for updating the discussion here with your findings. I think indeed there isn't anything that we can do on Parcels'/Xarray's end to help with this - and it is just a HDF corruption issue that users and data providers will need to be aware of.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

intermittent RuntimeError: NetCDF: HDF #2038

Uh oh!

{{title}}

Uh oh!

Replies: 2 comments 3 replies

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{editor}}'s edit

{{editor}}'s edit

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{editor}}'s edit

{{editor}}'s edit

Uh oh!

Select a reply

Uh oh!

intermittent RuntimeError: NetCDF: HDF #2038

Uh oh!

nilodna Jun 17, 2025

Question

Question

Supporting code/error messages

Replies: 2 comments · 3 replies

Uh oh!

Uh oh!

VeckoTheGecko Jun 17, 2025 Maintainer

Uh oh!

nilodna Jul 1, 2025 Author

Uh oh!

VeckoTheGecko Jul 1, 2025 Maintainer

Uh oh!

nilodna Aug 8, 2025 Author

Uh oh!

Uh oh!

VeckoTheGecko Aug 8, 2025 Maintainer

nilodna
Jun 17, 2025

Replies: 2 comments 3 replies

VeckoTheGecko
Jun 17, 2025
Maintainer

nilodna Jul 1, 2025
Author

VeckoTheGecko Jul 1, 2025
Maintainer

nilodna
Aug 8, 2025
Author

VeckoTheGecko Aug 8, 2025
Maintainer