Skip to content

Uncontrolled memory growth in custom backend #9576

Open
@sappjw

Description

@sappjw

What happened?

I wrote a custom backend. I'm using it to open a file, operate on the data, remove most of it from the Dataset using .isel, open the next, concatenate, and repeat. I noticed the memory used by the system grew significantly over time even though the size of the Dataset was approximately the same. I was able to reproduce the problem without most of this complexity.

I repeatedly created a dummy Dataset with 25 Variables and observed the number of objects with objgraph after each object creation. I see Variable instances continually increasing, even though I have del'd the Dataset after creating it. I think this suggests that something in xarray is not releasing the Dataset.

Variable               75       +25
ReferenceType        8511        +4
lock                   46        +4
dict                21283        +2
KeyedRef               22        +2
SerializableLock        9        +2
list                23256        +1
set                  1873        +1
method               1893        +1
SeedSequence            4        +1
---
Variable              100       +25
ReferenceType        8515        +4
lock                   50        +4
dict                21285        +2
KeyedRef               24        +2
SerializableLock       11        +2
list                23257        +1
set                  1874        +1
method               1894        +1
SeedSequence            5        +1

I picked a random Variable that was not released and printed the reference chain graph.

import random
objgraph.show_chain(objgraph.find_backref_chain(random.choice(objgraph.by_type('Variable')), objgraph.is_proper_module), filename='graph.png')

graph_variable

What did you expect to happen?

I expected the memory used for the Dataset to be released and garbage-collected. I expected the memory in use to plateau instead of grow.

Minimal Complete Verifiable Example

import os
os.system("mamba install --yes objgraph")

import io

import numpy as np
import xarray as xr
from xarray.backends import BackendArray, BackendEntrypoint
from xarray.backends.common import AbstractDataStore
from xarray.backends.file_manager import CachingFileManager
from xarray.backends.locks import SerializableLock, ensure_lock
from xarray.backends.store import StoreBackendEntrypoint
from xarray.core.indexing import (
    IndexingSupport,
    explicit_indexing_adapter,
)
from xarray.core.utils import Frozen, close_on_error


# This backend creates a random 500MB variable and stores it into a Dataset 25 times
class memtest_DataStore(AbstractDataStore):
    """Store for binary data"""

    def __init__(
        self,
        filename,
        lock=None,
        num_bytes=500_000_000,
        **kwargs,
    ):
        if lock is None:
            lock = SerializableLock()
        self.lock = ensure_lock(lock)
        # Create a lock for the file manager
        self._fm_lock = SerializableLock()
        self._manager = CachingFileManager(
            self._open_file,
            filename,
            lock=self._fm_lock,
            kwargs=kwargs,
        )

        self.num_bytes = num_bytes
        self._num_vars = 25
        self._rng = np.random.default_rng()
        self._xr_obj = None

    def _open_file(self, filename, **kwargs):
        b = self._rng.random(self.num_bytes // 8, dtype=np.float64)
        file_obj = io.BytesIO(b, **kwargs)

        return file_obj

    def _create_dataset(self):
        with self._manager.acquire_context(needs_lock=True) as file_obj:
            file_obj.seek(0)
            fltarr = np.frombuffer(file_obj.getvalue(), np.float64)

        xr_obj = xr.Dataset()
        # Create coordinates
        xr_obj["index"] = (
            ("index",),
            range(len(fltarr)),
        )
        # Create several variables so we can see the change
        for i in range(self._num_vars):
            xr_obj[f"var{i}"] = (
                ("index",),
                fltarr,
            )

        return xr_obj

    @property
    def ds(self):
        if not self._xr_obj:
            self._xr_obj = self._create_dataset()

        return self._xr_obj

    def open_store_variable(self, name, var):
        return self.ds[name].variable

    def get_variables(self):
        return Frozen(self.ds.variables)

    def get_attrs(self):
        return Frozen(self.ds.attrs)

    def get_dimensions(self):
        return Frozen(self.ds.dims)

    def close(self):
        self._manager.close()


class memtest_backend(BackendEntrypoint):
    description = "Memory test backend"

    def open_dataset(
        self,
        filename_or_obj,
        *,
        drop_variables=None,
        lock=None,
        num_bytes=500_000_000,
    ):
        store = memtest_DataStore(
            filename_or_obj,
            lock=lock,
            num_bytes=num_bytes,
        )

        store_entrypoint = StoreBackendEntrypoint()
        with close_on_error(store):
            ds = store_entrypoint.open_dataset(
                store,
                drop_variables=drop_variables,
            )

        return ds

# Loop to test memory growth
# This results in about 10GB of additional memory used by the end of the process
import xarray as xr
import objgraph

objgraph.show_growth(limit=1)
for i in range(10):
    ds = xr.open_dataset(i, engine=memtest_backend)
    # Just to test that something's still holding on to the data
    del ds
    objgraph.show_growth()
    print('---')

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.
  • Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

function    18610    +18610
list                 15403    +12798
function             25617     +7007
dict                 13365     +6005
tuple                13360     +5284
CompositeUnit         2954     +2954
PrefixUnit            1526     +1526
ReferenceType         4284     +1181
cell                  4966     +1112
type                  2397      +792
getset_descriptor     3374      +628
---
Variable                         50       +25
ReferenceType                  4298       +14
builtin_function_or_method     2537       +10
lock                             34        +4
dict                          13367        +2
KeyedRef                         19        +2
SerializableLock                  7        +2
list                          15404        +1
set                             952        +1
method                          369        +1
---
Variable               75       +25
ReferenceType        4302        +4
lock                   38        +4
dict                13369        +2
KeyedRef               21        +2
SerializableLock        9        +2
list                15405        +1
set                   953        +1
method                370        +1
SeedSequence            4        +1
---
Variable              100       +25
ReferenceType        4306        +4
lock                   42        +4
dict                13371        +2
KeyedRef               23        +2
SerializableLock       11        +2
list                15406        +1
set                   954        +1
method                371        +1
SeedSequence            5        +1
---
Variable              125       +25
ReferenceType        4310        +4
lock                   46        +4
dict                13373        +2
KeyedRef               25        +2
SerializableLock       13        +2
list                15407        +1
set                   955        +1
method                372        +1
SeedSequence            6        +1
---
Variable              150       +25
ReferenceType        4314        +4
lock                   50        +4
dict                13375        +2
KeyedRef               27        +2
SerializableLock       15        +2
list                15408        +1
set                   956        +1
method                373        +1
SeedSequence            7        +1
---
Variable              175       +25
ReferenceType        4318        +4
lock                   54        +4
dict                13377        +2
KeyedRef               29        +2
SerializableLock       17        +2
list                15409        +1
set                   957        +1
method                374        +1
SeedSequence            8        +1
---
Variable              200       +25
ReferenceType        4322        +4
lock                   58        +4
dict                13379        +2
KeyedRef               31        +2
SerializableLock       19        +2
list                15410        +1
set                   958        +1
method                375        +1
SeedSequence            9        +1
---
Variable              225       +25
ReferenceType        4326        +4
lock                   62        +4
dict                13381        +2
KeyedRef               33        +2
SerializableLock       21        +2
list                15411        +1
set                   959        +1
method                376        +1
SeedSequence           10        +1
---
Variable              250       +25
ReferenceType        4330        +4
lock                   66        +4
dict                13383        +2
KeyedRef               35        +2
SerializableLock       23        +2
list                15412        +1
set                   960        +1
method                377        +1
SeedSequence           11        +1
---

Anything else we need to know?

This crashes the Binder notebook instance since it uses so much memory.

Environment

INSTALLED VERSIONS

commit: None
python: 3.12.6 | packaged by conda-forge | (main, Sep 30 2024, 18:08:52) [GCC 13.3.0]
python-bits: 64
OS: Linux
OS-release: 5.14.0-505.el9.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.3
libnetcdf: 4.9.2

xarray: 2024.9.0
pandas: 2.2.3
numpy: 1.26.4
scipy: None
netCDF4: 1.7.1
pydap: None
h5netcdf: 1.2.0
h5py: 3.11.0
zarr: None
cftime: 1.6.4
nc_time_axis: None
iris: None
bottleneck: None
dask: 2023.9.3
distributed: 2023.9.3
matplotlib: 3.9.1
cartopy: None
seaborn: None
numbagg: None
fsspec: 2023.9.2
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 68.2.2
pip: 23.2.1
conda: None
pytest: None
mypy: None
IPython: 8.16.1
sphinx: None

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions