Iris save taking very long (hanging) with some netcdf files #3362
An ESMValTool core developer suggested tagging @bjlittle on this; not sure if this does it.
@rswamina Thanks for taking the time to raise this issue, much appreciated. I'll see if I can recreate the issue on my side and investigate further 👍
cheers @bjlittle man - one thing to mention is that I tried with both
@rswamina and @valeriupredoi This is pretty interesting, and also very relevant at the moment; see issue #3357. The behaviour that you're witnessing is the direct result of bad/unfavourable chunking at the file-format level of the associated netCDF variables. The netCDF chunking of each data variable in the provided files (thanks @rswamina for providing the data btw 👍) is as follows:

>>> import netCDF4 as nc
>>> for i in range(3):
...     fname = 'GPP.ANN.CRUNCEPv6.monthly.198{}.nc'.format(i)
...     ds = nc.Dataset(fname)
...     var = ds.variables['GPP']
...     print(fname)
...     print(var)
...     print('chunking = {}\n'.format(var.chunking()))
...
GPP.ANN.CRUNCEPv6.monthly.1980.nc
<class 'netCDF4._netCDF4.Variable'>
float32 GPP(time, lat, lon)
long_name: GPP
scale_factor: 1.0
add_offset: 0.0
missing_value: -9999.0
_FillValue: -9999.0
units: gC m-2 day-1
unlimited dimensions:
current shape = (12, 360, 720)
filling on
chunking = [1, 10, 10]
GPP.ANN.CRUNCEPv6.monthly.1981.nc
<class 'netCDF4._netCDF4.Variable'>
float32 GPP(time, lat, lon)
long_name: GPP
scale_factor: 1.0
add_offset: 0.0
missing_value: -9999.0
_FillValue: -9999.0
units: gC m-2 day-1
unlimited dimensions:
current shape = (12, 360, 720)
filling on
chunking = [1, 10, 10]
GPP.ANN.CRUNCEPv6.monthly.1982.nc
<class 'netCDF4._netCDF4.Variable'>
float32 GPP(time, lat, lon)
long_name: GPP
scale_factor: 1.0
add_offset: 0.0
missing_value: -9999.0
_FillValue: -9999.0
units: gC m-2 day-1
unlimited dimensions:
current shape = (12, 360, 720)
filling on
chunking = [1, 10, 10]

As you see, each GPP variable has a shape of (12, 360, 720) but a file-level chunking of [1, 10, 10] - that is, 12 × 36 × 72 = 31,104 tiny chunks per variable.
More importantly, at the moment Iris carries this file-level chunking through to the dask array that wraps each netCDF variable as lazy data, so realising the data means scheduling tens of thousands of tiny read tasks. To see the effect, note that each netCDF variable can be indexed directly, so first let's time the raw netCDF4 access:

>>> from timeit import timeit
>>> timeit('var[:]', number=3, globals=dict(var=var))
1.6068386339993594
>>> timeit('var[:]', number=1, globals=dict(var=var))
0.5159940029998324

This seems quite reasonable. Now let's try the same thing with dask, wrapping the variable as a single chunk equal to its full shape:

>>> import dask.array as da
>>> dvar = da.from_array(var, chunks=var.shape)
>>> timeit('dvar.compute()', number=3, globals=dict(dvar=dvar))
1.6165974420000566
>>> timeit('dvar.compute()', number=1, globals=dict(dvar=dvar))
0.543516056000044

This also seems pretty reasonable to me. Now, let's do the same again, but this time set the dask chunks to the variable's netCDF file chunking:

>>> dvar = da.from_array(var, chunks=var.chunking())
>>> dvar.compute()
^CTraceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/bill/miniconda3/envs/iris/lib/python3.7/site-packages/dask/base.py", line 156, in compute
(result,) = compute(self, traverse=False, **kwargs)
File "/home/bill/miniconda3/envs/iris/lib/python3.7/site-packages/dask/base.py", line 398, in compute
results = schedule(dsk, keys, **kwargs)
File "/home/bill/miniconda3/envs/iris/lib/python3.7/site-packages/dask/threaded.py", line 76, in get
pack_exception=pack_exception, **kwargs)
File "/home/bill/miniconda3/envs/iris/lib/python3.7/site-packages/dask/local.py", line 453, in get_async
key, res_info, failed = queue_get(queue)
File "/home/bill/miniconda3/envs/iris/lib/python3.7/site-packages/dask/local.py", line 141, in queue_get
return q.get()
File "/home/bill/miniconda3/envs/iris/lib/python3.7/queue.py", line 170, in get
self.not_empty.wait()
File "/home/bill/miniconda3/envs/iris/lib/python3.7/threading.py", line 296, in wait
waiter.acquire()
KeyboardInterrupt

Yeah, well, I pretty much gave up... it hung, as I expected, and as you've been witnessing 😢
So in a nutshell, the issue (to me) is directly related to the netCDF file-level chunking of these variables and the fact that it is carried through to the dask chunking of the lazy data. For this data and chunking, you hit the issue at the very point you attempt to realise the data, i.e. in this case when you attempt to save the cube.

This issue is related to #3333 and #3357, and will hopefully be resolved by #3361.
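To see this from the Iris side, here is a minimal sketch that inspects the chunking of the lazy data without realising it, assuming the file and variable names used above:

import numpy as np
import iris

# Loading is lazy; no data is read at this point.
cube = iris.load_cube('GPP.ANN.CRUNCEPv6.monthly.1980.nc', 'GPP')

lazy = cube.core_data()           # the underlying dask array
print(lazy.chunks)                # chunk lengths along each dimension
print(np.prod(lazy.numblocks))    # total number of chunks (dask tasks)

If the chunk count here runs into the tens of thousands, you are in the pathological case described above.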
Hi, @bjlittle - Thanks for looking into this. Is there any other information I can provide towards debugging? Just that I am working towards a deadline and need the file saved after processing. Do let me know.
Ha... just noticed my page did not get refreshed... will check the messages on the thread and your responses... Thanks!
If I understand this correctly, then the change is made to Iris v2.3.0? Does that mean I should update my version of iris with ESMValTool (from 2.2.1, which is what it is currently, to 2.3.0) for the fix to work? Sorry, I am not very familiar with how these changes propagate.
@rswamina We've yet to incorporate the change that fixes your issue into iris, and even so, we're not due to release Iris 2.3.0 for a while yet. So it seems that you're going to have to work around this issue in the meantime, sorry. You could do this in a couple of ways: either re-chunk the netCDF files themselves so that each variable has a more sensible chunk shape, or install the development branch of iris that contains the fix (see the sketch below for the first option).
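As a rough, untested sketch of the first option (re-chunking the files themselves with netCDF4-python; the chunk shape (12, 180, 360) is only illustrative):

import netCDF4 as nc

def rechunk_file(src_path, dst_path, var_name='GPP', chunks=(12, 180, 360)):
    """Copy a netCDF file, rewriting one variable with a larger chunk shape."""
    with nc.Dataset(src_path) as src, nc.Dataset(dst_path, 'w') as dst:
        # Copy global attributes and dimensions.
        dst.setncatts({key: src.getncattr(key) for key in src.ncattrs()})
        for name, dim in src.dimensions.items():
            dst.createDimension(name, None if dim.isunlimited() else len(dim))
        # Copy each variable, applying the new chunking to the target variable.
        for name, var in src.variables.items():
            kwargs = {'chunksizes': chunks} if name == var_name else {}
            fill = getattr(var, '_FillValue', None)
            out = dst.createVariable(name, var.dtype, var.dimensions,
                                     fill_value=fill, **kwargs)
            out.setncatts({key: var.getncattr(key) for key in var.ncattrs()
                           if key != '_FillValue'})
            out[:] = var[:]

rechunk_file('GPP.ANN.CRUNCEPv6.monthly.1980.nc', 'GPP.1980.rechunked.nc')

The idea is simply that a handful of large chunks per variable replaces the ~31,000 tiny ones, so both dask and the HDF5 layer do far fewer, bigger reads.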
Does this help?
Right. Let me try the rechunking. This is particularly a problem when I try concatenating all the data into one cube. I will report back with problems/status.
@rswamina Are you using conda to install ESMValTool? If so, then I could talk you through installing the PR branch of iris.
I am installing ESMValTool from source. This is because I am adding code for new data sets. Is it still possible to install the PR branch of iris? Rechunking seems to take considerable time. I followed these instructions https://github.com/ESMValGroup/ESMValTool/blob/version2_development/CONTRIBUTING.md for installation from source. Is there any way to install and use a different branch of iris in my local environment for now?
@rswamina I also tried using the chunk_control branch of iris. So if you've got ESMValTool in a conda environment, simply do the following:
> mkdir iris-dev
> cd iris-dev
> git clone git@github.com:pp-mo/iris.git
> cd iris
> git fetch origin
> git checkout chunk_control
> pip install -e .

Or you could simply try:

> pip install git+https://github.com/pp-mo/iris.git@chunk_control

I'm writing this via my mobile, so I can't test that what I'm suggesting is perfectly correct. Once installed, fire up:

> python
>>> import iris
>>> iris.__version__

and this should give you the development version of iris. Give that a try...
@bjlittle - Quick update: I figured out an optimum size to rechunk to, (12, 180, 360), and that seemed to do it. Chunking time was short and processing was fine. I will try out the iris cloning option and update this thread.
Yup, dask rechunking would also work 👍
Re-chunking the files will work, but I don't think dask re-chunking will??
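For reference, a minimal sketch of the dask-level re-chunking being discussed here, using the variable from the earlier examples; whether it avoids the slow reads in practice depends on how each dask block ends up touching the underlying file chunks:

import netCDF4 as nc
import dask.array as da

ds = nc.Dataset('GPP.ANN.CRUNCEPv6.monthly.1980.nc')
var = ds.variables['GPP']

# Wrap the variable with a few large dask chunks instead of the file's
# own [1, 10, 10] chunking; lock=True keeps the netCDF reads single-threaded.
dvar = da.from_array(var, chunks=(12, 180, 360), lock=True)
data = dvar.compute()   # 4 block reads instead of ~31,000 tiny tasks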
In order to maintain a backlog of relevant issues, we automatically label them as stale after 500 days of inactivity. If this issue is still important to you, then please comment on this issue and the stale label will be removed. Otherwise this issue will be automatically closed in 28 days time.
This stale issue has been automatically closed due to a lack of community activity.
I am trying to read in a few netCDF files, make a simple change to the data, and concatenate them using the concatenate_cube function. It looks like iris takes a long time or hangs when saving or accessing this data. I am using this data for some development work with the ESMValTool (https://github.com/ESMValGroup/ESMValTool), and the ultimate goal is to make the dataset CMOR compliant. The version of Python being used is 3.7.3 and Iris is 2.2.1. Sample files are at
https://drive.google.com/drive/folders/1ZZQ6jRP3MsY4WPku-dj9wc2H1jKrNOZF?usp=sharing
A suggestion made by @valeriupredoi, a core developer with ESMValTool, is:
--
it fails to realize the lazy data - for a smaller array it's still crazy slow:
time c[0, :, :].data
CPU times: user 16.4 s, sys: 6.13 s, total: 22.6 s
Wall time: 23.1 s
where c is the cube from one of the files only (the 0th)
--
Would it be possible to identify and fix this issue?