Iris save taking very long (hanging) with some netcdf files #3362

Closed
rswamina opened this issue Jul 29, 2019 · 19 comments

@rswamina

I am trying to read in a few netcdf files, make a simple change to the data and concatenate them using the concatenate_cube function. It looks like iris takes a very long time, or hangs, when saving or accessing this data. I am using this data for some development work with the ESMValTool (https://github.com/ESMValGroup/ESMValTool) and the ultimate goal is to make the dataset CMOR compliant. The version of Python being used is 3.7.3 and Iris is 2.2.1. Sample files are at
https://drive.google.com/drive/folders/1ZZQ6jRP3MsY4WPku-dj9wc2H1jKrNOZF?usp=sharing
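
Roughly, the processing looks like this (the file pattern and the "simple change" here are illustrative stand-ins, not the exact ESMValTool code):

import iris
from iris.cube import CubeList

# load the yearly files as separate cubes (lazy data)
cubes = iris.load_raw('GPP.ANN.CRUNCEPv6.monthly.198*.nc', 'GPP')

# any simple arithmetic stands in for the data change; it stays lazy
cubes = CubeList([cube * 1.0 for cube in cubes])

# join the yearly cubes along the time dimension
cube = cubes.concatenate_cube()

# the hang shows up when the lazy data finally has to be realised,
# e.g. here at save time
iris.save(cube, 'GPP_concatenated.nc')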
A suggestion made by @valeriupredoi, a core developer of ESMValTool, is:

--
it fails to realize the lazy data - for a smaller array it's still crazy slow:

time c[0, :, :].data
CPU times: user 16.4 s, sys: 6.13 s, total: 22.6 s
Wall time: 23.1 s
where c is the cube from one of the files only (the 0th)

--
Would it be possible to identify and fix this issue?

@rswamina
Author

An ESMValTool core development team member suggested tagging @bjlittle on this, not sure if this does it.

@bjlittle
Member

@rswamina Thanks for taking the time to raise this issue, much appreciated.

I'll see if I can recreate the issue on my side and investigate further 👍

@valeriupredoi
Contributor

cheers @bjlittle man - one thing to mention is that I tried with both dask=1.1.4 and dask=2.1.0 and I get the same weird behaviour - the cube loads fine (and all lazy operations work) but when realizing the data, simply by print(cube.data), it hangs. Accessing individual members of the data works fine but gets slower the larger the data slice to be accessed gets, as the code snippet @rswamina posted above shows 🍺 I am going on hols today till Monday so won't be able to help much in the meantime, I'm afraid

@bjlittle
Member

@rswamina and @valeriupredoi This is pretty interesting, and also very relevant at the moment, see issue #3357

The behaviour that you're witnessing is the direct result of bad/unfavourable chunking at the file-format level of the associated netCDF variables.

The netCDF chunking of each data variable in the provided files (thanks @rswamina for providing the data btw 👍) is as follows:

>>> import netCDF4 as nc
>>> for i in range(3):
...     fname = 'GPP.ANN.CRUNCEPv6.monthly.198{}.nc'.format(i)
...     ds = nc.Dataset(fname)
...     var = ds.variables['GPP']
...     print(fname)
...     print(var)
...     print('chunking = {}\n'.format(var.chunking()))
... 
GPP.ANN.CRUNCEPv6.monthly.1980.nc
<class 'netCDF4._netCDF4.Variable'>
float32 GPP(time, lat, lon)
    long_name: GPP
    scale_factor: 1.0
    add_offset: 0.0
    missing_value: -9999.0
    _FillValue: -9999.0
    units: gC m-2 day-1
unlimited dimensions: 
current shape = (12, 360, 720)
filling on
chunking = [1, 10, 10]

GPP.ANN.CRUNCEPv6.monthly.1981.nc
<class 'netCDF4._netCDF4.Variable'>
float32 GPP(time, lat, lon)
    long_name: GPP
    scale_factor: 1.0
    add_offset: 0.0
    missing_value: -9999.0
    _FillValue: -9999.0
    units: gC m-2 day-1
unlimited dimensions: 
current shape = (12, 360, 720)
filling on
chunking = [1, 10, 10]

GPP.ANN.CRUNCEPv6.monthly.1982.nc
<class 'netCDF4._netCDF4.Variable'>
float32 GPP(time, lat, lon)
    long_name: GPP
    scale_factor: 1.0
    add_offset: 0.0
    missing_value: -9999.0
    _FillValue: -9999.0
    units: gC m-2 day-1
unlimited dimensions: 
current shape = (12, 360, 720)
filling on
chunking = [1, 10, 10]

As you can see, each GPP data variable has been given a netCDF chunk shape of [1, 10, 10], which is:

  1. odd (to me, anyways)
  2. very small relative to the data variable shape of (12, 360, 720)

More importantly, at the moment iris will honour this netCDF data variable chunking and instruct dask that this is the chunk size for the associated lazy data of its cube.

Note that each netCDF variable is a float32, so 4 x 12 x 360 x 720 ~= 12 MB in size, which easily fits in memory. So let's see netCDF4 load one of these variables:

>>> from timeit import timeit
>>> timeit('var[:]', number=3, globals=dict(var=var))
1.6068386339993594
>>> timeit('var[:]', number=1, globals=dict(var=var))
0.5159940029998324

This seems quite reasonable. Now let's try this with dask, but with sensible chunks:

>>> import dask.array as da
>>> dvar = da.from_array(var, chunks=var.shape)
>>> timeit('dvar.compute()', number=3, globals=dict(dvar=dvar))
1.6165974420000566
>>> timeit('dvar.compute()', number=1, globals=dict(dvar=dvar))
0.543516056000044

This seems pretty reasonable to me too.

Now, let's do the same again, but this time set the chunks with the associated netCDF chunking value:

>>> dvar = da.from_array(var, chunks=var.chunking())
>>> dvar.compute()
^CTraceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/bill/miniconda3/envs/iris/lib/python3.7/site-packages/dask/base.py", line 156, in compute
    (result,) = compute(self, traverse=False, **kwargs)
  File "/home/bill/miniconda3/envs/iris/lib/python3.7/site-packages/dask/base.py", line 398, in compute
    results = schedule(dsk, keys, **kwargs)
  File "/home/bill/miniconda3/envs/iris/lib/python3.7/site-packages/dask/threaded.py", line 76, in get
    pack_exception=pack_exception, **kwargs)
  File "/home/bill/miniconda3/envs/iris/lib/python3.7/site-packages/dask/local.py", line 453, in get_async
    key, res_info, failed = queue_get(queue)
  File "/home/bill/miniconda3/envs/iris/lib/python3.7/site-packages/dask/local.py", line 141, in queue_get
    return q.get()
  File "/home/bill/miniconda3/envs/iris/lib/python3.7/queue.py", line 170, in get
    self.not_empty.wait()
  File "/home/bill/miniconda3/envs/iris/lib/python3.7/threading.py", line 296, in wait
    waiter.acquire()
KeyboardInterrupt

Yeah, well, I pretty much gave up... it hung, as I expected, and as you've been witnessing 😢

@bjlittle
Member

@rswamina and @valeriupredoi

So in a nutshell, the issue (to me) is directly related to the netCDF chunking size specified for the data variables - it's just not appropriate for dask.

For this data and chunking, you hit the issue at the very point you attempt to realise the data, i.e. in this case when you attempt to save the cube. This issue is related to #3333 and #3357, and will hopefully be resolved by #3361.

@bjlittle
Member

Yup, just checked, #3361 does the job, nice one @pp-mo 👍

@rswamina
Author

Hi, @bjlittle - Thanks for looking into this. Is there any other information I can provide towards debugging? Just that I am working towards a deadline and need the file saved after processing. Do let me know.

@rswamina
Author

Ha...just noticed my page did not get refreshed...will check the messages on the thread and your responses...Thanks!

@rswamina
Author

If I understand this correctly, the change will be made in Iris v2.3.0? Does that mean I should update my version of iris within ESMValTool (from 2.2.1, which is what it is currently, to 2.3.0) for the fix to work? Sorry, I am not very familiar with how these changes propagate.

@bjlittle
Member

@rswamina We've yet to incorporate the change that fixes your issue into iris. Even so, we're not due to release iris version 2.3.0 until 30 August, see https://github.com/SciTools/iris/milestones

So it seems that you're going to have to work around this issue in the meantime, sorry. You could do this in a couple of ways, either:

  • re-chunk the netCDF files themselves before loading them (e.g. with nccopy, as sketched below), or
  • install the iris development branch from #3361 that fixes the chunking behaviour.
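
For the re-chunking route, a single nccopy command along these lines should rewrite a file with much bigger chunks (the chunk sizes and file names here are only an illustration, pick whatever suits your data):

> nccopy -c time/12,lat/360,lon/720 GPP.ANN.CRUNCEPv6.monthly.1980.nc GPP.1980.rechunked.nc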

Does this help?

@rswamina
Author

Right. Let me try the rechunking. This is particularly a problem when I try concatenating all the data into one cube. I will report back with problems/status.

@bjlittle
Member

bjlittle commented Jul 31, 2019

@rswamina Are you using conda to install ESMValTool? If so, then I could talk you through installing the PR branch of iris, if you're unable to rechunk using nccopy (or something similar).

@rswamina
Author

I am installing ESMValTool from source. This is because I am adding code for new data sets. Is it still possible to install the PR branch of iris? Rechunking seems to take considerable time.

I followed these instructions https://github.com/ESMValGroup/ESMValTool/blob/version2_development/CONTRIBUTING.md

for installation from source. Is there any way to install and use a different branch of iris in my local environment for now?

@bjlittle
Member

@rswamina I also tried using the nccopy approach last night, and kinda gave up after no response for 20 minutes, but I simply thought it was more of an issue with my laptop.

So if you've got ESMValTool in a conda environment, simply do the following:

  • activate the conda environment containing ESMValTool
  • clone the iris development branch somewhere e.g.
> mkdir iris-dev
> cd iris-dev
> git clone git@github.com:pp-mo/iris.git
> cd iris
> git fetch origin
> git checkout chunk_control
> pip install -e .

Or you could simply try

> pip install git+https://github.com/pp-mo/iris.git@chunk_control

I'm writing this on my mobile, so I can't test that what I'm suggesting is perfectly correct.

Once installed, fire up python to check that the development iris version is installed and being used, e.g.

> python
>>> import iris
>>> iris.__version__

and this should give you the development version of iris.

Give that a try...

@rswamina
Author

@bjlittle - Quick update: I figured out an optimum chunk size to rechunk to, (12, 180, 360), and that seemed to do it. Chunking time was short and processing was fine. I will try out the iris cloning option and update this thread.

@bjlittle
Member

Yup, dask Rechunking would also work 👍

@pp-mo
Member

pp-mo commented Aug 1, 2019

dask Rechunking would also work

Re-chunking the files will work, but I don't think dask re-chunking will??
Re-chunking some existing lazy arrays only adds another "layer" to the dask graph: the original small chunks still exist "underneath". So if they are too small, and file data fetching (not calculation) is the slow part, then dask rechunking can't fix that: you need it to take bigger bites of the file. Unfortunately, that decision can't (easily) be changed after the initial load is done. That's true for Iris, but also for Dask itself.
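
To illustrate the point, a sketch using the same file and variable as in the examples above:

>>> import dask.array as da
>>> import netCDF4 as nc
>>> var = nc.Dataset('GPP.ANN.CRUNCEPv6.monthly.1980.nc').variables['GPP']
>>> # rechunking afterwards only adds a layer on top of the tiny file chunks,
>>> # so every (1, 10, 10) block is still read from the file individually:
>>> slow = da.from_array(var, chunks=var.chunking()).rechunk(var.shape)
>>> # asking for big chunks up front is what actually avoids all the small reads:
>>> fast = da.from_array(var, chunks=var.shape)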

@github-actions
Contributor

github-actions bot commented Jan 8, 2022

In order to maintain a backlog of relevant issues, we automatically label them as stale after 500 days of inactivity.

If this issue is still important to you, then please comment on this issue and the stale label will be removed.

Otherwise this issue will be automatically closed in 28 days time.

github-actions bot added the Stale (a stale issue/pull-request) label Jan 8, 2022
@github-actions
Contributor

github-actions bot commented Feb 6, 2022

This stale issue has been automatically closed due to a lack of community activity.

If you still care about this issue, then please either:

  • Re-open this issue, if you have sufficient permissions, or
  • Add a comment pinging @SciTools/iris-devs who will re-open on your behalf.
