Threading Lock issue with to_netcdf and Dask arrays #4406

Open
@bilelomrani1

I have multiple netCDF files that I process and write to a single output file.

from pathlib import Path

import xarray as xr

DS = xr.open_mfdataset(Path('L3_data/').glob('**/*.nc'),
                       combine='nested',
                       concat_dim='time',
                       decode_times=False,
                       chunks={'time': 200})

DS.to_netcdf('test.nc')

I was not able to reproduce the issue with a simpler example, so here are the files: L3_data.zip. The issue does not occur on every run, but sometimes .to_netcdf blocks indefinitely. When I interrupt it manually, the traceback is the following:

Traceback (most recent call last):
  File "test_xarray.py", line 11, in <module>
    DS.to_netcdf('test.nc')
  File "/Users/bilelomrani/opt/miniconda3/envs/s5p/lib/python3.7/site-packages/xarray/core/dataset.py", line 1568, in to_netcdf
    invalid_netcdf=invalid_netcdf,
  File "/Users/bilelomrani/opt/miniconda3/envs/s5p/lib/python3.7/site-packages/xarray/backends/api.py", line 1090, in to_netcdf
    writes = writer.sync(compute=compute)
  File "/Users/bilelomrani/opt/miniconda3/envs/s5p/lib/python3.7/site-packages/xarray/backends/common.py", line 204, in sync
    regions=self.regions,
  File "/Users/bilelomrani/opt/miniconda3/envs/s5p/lib/python3.7/site-packages/dask/array/core.py", line 945, in store
    result.compute(**kwargs)
  File "/Users/bilelomrani/opt/miniconda3/envs/s5p/lib/python3.7/site-packages/dask/base.py", line 167, in compute
    (result,) = compute(self, traverse=False, **kwargs)
  File "/Users/bilelomrani/opt/miniconda3/envs/s5p/lib/python3.7/site-packages/dask/base.py", line 452, in compute
    results = schedule(dsk, keys, **kwargs)
  File "/Users/bilelomrani/opt/miniconda3/envs/s5p/lib/python3.7/site-packages/dask/threaded.py", line 84, in get
    **kwargs
  File "/Users/bilelomrani/opt/miniconda3/envs/s5p/lib/python3.7/site-packages/dask/local.py", line 475, in get_async
    key, res_info, failed = queue_get(queue)
  File "/Users/bilelomrani/opt/miniconda3/envs/s5p/lib/python3.7/site-packages/dask/local.py", line 133, in queue_get
    return q.get()
  File "/Users/bilelomrani/opt/miniconda3/envs/s5p/lib/python3.7/queue.py", line 170, in get
    self.not_empty.wait()
  File "/Users/bilelomrani/opt/miniconda3/envs/s5p/lib/python3.7/threading.py", line 296, in wait
    waiter.acquire()
KeyboardInterrupt
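
The traceback above only shows the main thread, which is waiting on a queue for a result from a worker; it does not show where the worker threads themselves are stuck. One possible way to capture that is a watchdog built on the standard-library faulthandler module, which can dump the stack of every thread if a call runs longer than expected. This is a minimal sketch, not something xarray or dask provides: run_with_stack_dump is a hypothetical helper name, and the timeout value is an arbitrary assumption.

```python
import faulthandler
import sys


def run_with_stack_dump(func, timeout, dump_file=sys.stderr):
    """Call func() and return its result.

    If func() is still running after `timeout` seconds, the stacks of
    all threads are written to `dump_file`. The process is not killed
    (exit=False), so a hung call keeps hanging but leaves a record of
    where every thread was at that moment.

    `run_with_stack_dump` is a hypothetical helper for illustration.
    """
    faulthandler.dump_traceback_later(timeout, exit=False, file=dump_file)
    try:
        return func()
    finally:
        # Disarm the watchdog if func() returned or raised.
        faulthandler.cancel_dump_traceback_later()
```

With the script from this report, the call would look like run_with_stack_dump(lambda: DS.to_netcdf('test.nc'), timeout=300), where 300 seconds is an assumed upper bound for a normal write. If the write hangs, the dump should list the dask worker threads alongside the main thread.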

Environment:

I'm on macOS 10.15.6.

Output of xr.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.7.8 | packaged by conda-forge | (default, Jul 31 2020, 02:37:09)
[Clang 10.0.1 ]
python-bits: 64
OS: Darwin
OS-release: 19.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: fr_FR.UTF-8
LOCALE: fr_FR.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.6.2

xarray: 0.16.0
pandas: 1.1.1
numpy: 1.19.1
scipy: 1.3.1
netCDF4: 1.5.1.2
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.2.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.0.28
cfgrib: None
iris: None
bottleneck: None
dask: 2.25.0
distributed: 2.25.0
matplotlib: 3.3.1
cartopy: None
seaborn: None
numbagg: None
pint: None
setuptools: 49.6.0.post20200814
pip: 20.2.2
conda: None
pytest: None
IPython: 5.8.0
sphinx: None
