Uncontrolled memory growth in custom backend #9576
Comments
That does look like a well-formed issue, thanks @sappjw. I had a look through the code for 10 minutes as I'm not familiar with it. It looks like we maintain a […]. To the extent you want to explore more — it would be interesting to see whether that is incrementing-and-not-decrementing as you open the file on each loop. And then why it's not decrementing — does the file need to have […]?
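One way to act on that suggestion is to watch xarray's internal file-handle bookkeeping while opening files in a loop. The sketch below is an assumption about which counters the comment refers to: `FILE_CACHE` and `REF_COUNTS` are private internals of `xarray.backends.file_manager`, not public API, and may change between versions.

```python
# Sketch: observe xarray's internal file-handle cache and reference
# counts across loop iterations. FILE_CACHE and REF_COUNTS are private
# internals (an assumption that these are the relevant counters).
import xarray as xr
from xarray.backends.file_manager import FILE_CACHE, REF_COUNTS

for path in ["file0.nc", "file1.nc"]:  # placeholder file names
    with xr.open_dataset(path) as ds:
        pass  # ... work with ds ...
    # If these only ever grow across iterations, something is retaining
    # a reference to the underlying file after the Dataset is dropped.
    print(len(FILE_CACHE), dict(REF_COUNTS))
```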
Thanks. Inserting a […]
OK. Would be interesting to see […]
If I put a […]
Very possibly, but I'm unfortunately a long way from the expert on this code, hence my rather basic debugging so far. Others will hopefully know more. If that suggestion helps and you can put a small PR + test together, that will very likely get some good traction.
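For background on the close question raised above: xarray's backend documentation has custom backends register a cleanup callback on the returned dataset via `Dataset.set_close`. The sketch below shows that shape; the class name, file handling, and dummy data are illustrative assumptions, not the reporter's actual backend, and whether a missing close hook explains this leak is speculation.

```python
# Illustrative shape of a custom backend that releases its file handle
# when the Dataset is closed; names and data here are placeholders.
import numpy as np
import xarray as xr
from xarray.backends import BackendEntrypoint


class MyBackendEntrypoint(BackendEntrypoint):
    def open_dataset(self, filename_or_obj, *, drop_variables=None):
        f = open(filename_or_obj, "rb")  # hypothetical file handle
        data = np.zeros(10)  # stand-in for data actually read from f
        ds = xr.Dataset({"var": ("x", data)})
        # Register cleanup so ds.close() (and context-manager exit)
        # actually releases the underlying resource.
        ds.set_close(f.close)
        return ds
```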
What happened?

I wrote a custom backend. I'm using it to open a file, operate on the data, remove most of it from the `Dataset` using `.isel`, open the next, concatenate, and repeat. I noticed the memory used by the system grew significantly over time even though the size of the `Dataset` was approximately the same. I was able to reproduce the problem without most of this complexity.

I repeatedly created a dummy `Dataset` with 25 `Variable`s and observed the number of objects with objgraph after each object creation. I see `Variable` instances continually increasing, even though I have `del`'d the `Dataset` after creating it. I think this suggests that something in `xarray` is not releasing the `Dataset`.

I picked a random `Variable` that was not released and printed the reference chain graph.
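The graph itself is not preserved in this excerpt. For reference, a back-reference chain like the one described can be produced with objgraph roughly as follows; picking the first surviving object and the output filename are illustrative choices.

```python
# Sketch: render a back-reference graph for one leaked Variable with
# objgraph (image output requires graphviz to be installed).
import objgraph

leaked = objgraph.by_type("Variable")[0]  # pick one surviving Variable
objgraph.show_backrefs(leaked, max_depth=10, filename="variable-refs.png")
```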
What did you expect to happen?

I expected the memory used for the `Dataset` to be released and garbage-collected. I expected the memory in use to plateau instead of grow.
Minimal Complete Verifiable Example
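The exact reproduction code is not preserved here; the following is a minimal sketch consistent with the description above. The variable shapes, loop count, and use of `objgraph.show_growth` are illustrative assumptions.

```python
# Sketch: repeatedly build a dummy Dataset with 25 variables, delete it,
# and watch object counts. Per the report, Variable counts keep growing
# even after del and a forced garbage collection.
import gc

import numpy as np
import objgraph
import xarray as xr


def make_dataset(n_vars: int = 25) -> xr.Dataset:
    # Throwaway Dataset; in the original report this came from a custom
    # backend via xr.open_dataset(..., engine=...).
    return xr.Dataset(
        {f"var{i}": ("x", np.zeros(1000)) for i in range(n_vars)}
    )


for i in range(10):
    ds = make_dataset()
    del ds
    gc.collect()
    # Prints object types whose counts increased since the last call;
    # Variable should not appear here iteration after iteration.
    objgraph.show_growth(limit=5)
```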
MVCE confirmation
Relevant log output
Anything else we need to know?
This crashes the Binder notebook instance since it uses so much memory.
Environment
INSTALLED VERSIONS
commit: None
python: 3.12.6 | packaged by conda-forge | (main, Sep 30 2024, 18:08:52) [GCC 13.3.0]
python-bits: 64
OS: Linux
OS-release: 5.14.0-505.el9.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.3
libnetcdf: 4.9.2
xarray: 2024.9.0
pandas: 2.2.3
numpy: 1.26.4
scipy: None
netCDF4: 1.7.1
pydap: None
h5netcdf: 1.2.0
h5py: 3.11.0
zarr: None
cftime: 1.6.4
nc_time_axis: None
iris: None
bottleneck: None
dask: 2023.9.3
distributed: 2023.9.3
matplotlib: 3.9.1
cartopy: None
seaborn: None
numbagg: None
fsspec: 2023.9.2
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 68.2.2
pip: 23.2.1
conda: None
pytest: None
mypy: None
IPython: 8.16.1
sphinx: None