Possible memory leak by modkit sample-probs #592

@Simon-Brandt

Running modkit sample-probs (modkit v0.6.1) with the following (path-obfuscated) command line:

modkit sample-probs \
    --threads=2 \
    --seed=42 \
    --no-sampling \
    --force \
    --log-filepath=/path/to/data/dir/modkit_log.txt \
    --out-dir=/path/to/data/dir \
    /path/to/data/dir/mapped.bam

results in modkit allocating all available RAM.

On our HPC cluster, every run attempt, regardless of thread count and memory limit, was killed soon after starting with an error about exceeding the allocated memory. I then moved the dataset to a separate server of ours with better profiling options and monitored the run. That machine has 1.4 TB of RAM, and within roughly one and a half hours modkit occupied all of it, with allocations growing steadily and nothing being freed. After about another half hour at maximum RAM usage, the job was finally killed (by the OS, I suppose) with exit code 137. The log file was not written beyond the initial few lines included below, and no other output file was created.

The mapped modBAM file is just 72 GB in size, so it would fit into the RAM about 19 times. Even if modkit read it in its entirety separately for each of the two threads, there would still be plenty of free memory. In theory, I thought, the Rust compiler would prevent such apparent memory leaks, but I can't find another explanation for these observations.
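For anyone trying to reproduce the memory growth, a minimal sketch of how it could be recorded on a Linux machine follows. It is not part of modkit. The helper names `log_rss` and `describe_exit` are hypothetical, and the sketch assumes bash and a `/proc` filesystem. The first function samples the resident set size (RSS) of a running process at a fixed interval. The second translates an exit status above 128 into the signal that ended the process, so 137 maps to signal 9 (SIGKILL), which is consistent with the job being killed for running out of memory.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration only; assumes Linux (/proc) and bash.

# Print "<epoch seconds><TAB><RSS in MiB>" for process $1 every $2 seconds
# (default 60) until the process no longer reports a resident set size.
log_rss() {
    local pid=$1 interval=${2:-60} rss_kb
    while [ -r "/proc/$pid/status" ]; do
        # VmRSS is reported in kB; the line is absent once the process has exited.
        rss_kb=$(awk '/^VmRSS:/ {print $2}' "/proc/$pid/status" 2>/dev/null)
        [ -n "$rss_kb" ] || break
        printf '%s\t%d\n' "$(date +%s)" "$((rss_kb / 1024))"
        sleep "$interval"
    done
}

# Describe an exit status: values above 128 mean "terminated by signal (status - 128)".
describe_exit() {
    local status=$1
    if [ "$status" -gt 128 ]; then
        printf 'killed by SIG%s\n' "$(kill -l "$((status - 128))")"
    else
        printf 'exited with status %d\n' "$status"
    fi
}

# Example usage (paths as in the command line above):
#   modkit sample-probs --threads=2 ... /path/to/data/dir/mapped.bam &
#   pid=$!
#   log_rss "$pid" 60 > rss.tsv
#   wait "$pid"; describe_exit "$?"
```

A steadily rising second column in `rss.tsv` would match the behaviour described above. A curve that plateaus would instead point to a large but bounded working set.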

Full log
[modkit-logging/src/lib.rs::69][2026-03-11 09:13:50][DEBUG] command line: /opt/modkit/modkit sample-probs --threads=2 --seed=42 --no-sampling --force --log-filepath=/path/to/data/dir/modkit_log.txt --out-dir=/path/to/data/dir /path/to/data/dir/mapped.bam
[modkit-core/src/modbam_util/subcommands.rs::1964][2026-03-11 09:13:50][INFO] not subsampling, using all reads
[modkit-core/src/reads_sampler/mod.rs::49][2026-03-11 09:13:50][DEBUG] found BAM index, sampling reads in 1000000 base pair chunks
[modkit-core/src/reads_sampler/sampling_schedule.rs::141][2026-03-11 09:13:50][DEBUG] derived sampling schedule, sampling total 32081530 reads from 61 contigs, including unmapped reads
[modkit-core/src/reads_sampler/sampling_schedule.rs::164][2026-03-11 09:13:50][DEBUG] schedule
 chrom  count/frac 
 0      all 
 55     all 
 49     all 
 43     all 
 37     all 
 31     all 
 25     all 
 19     all 
 13     all 
 7      all 
 1      all 
 56     all 
 50     all 
 44     all 
 38     all 
 32     all 
 26     all 
 20     all 
 14     all 
 8      all 
 2      all 
 57     all 
 51     all 
 45     all 
 39     all 
 33     all 
 27     all 
 21     all 
 15     all 
 9      all 
 3      all 
 58     all 
 52     all 
 46     all 
 40     all 
 34     all 
 28     all 
 22     all 
 16     all 
 10     all 
 4      all 
 59     all 
 53     all 
 47     all 
 41     all 
 35     all 
 29     all 
 23     all 
 17     all 
 11     all 
 5      all 
 60     all 
 54     all 
 48     all 
 42     all 
 36     all 
 30     all 
 24     all 
 18     all 
 12     all 
 6      all 

[modkit-core/src/reads_sampler/sampling_schedule.rs::167][2026-03-11 09:13:50][DEBUG] and all unmapped reads
[modkit-core/src/interval_chunks.rs::225][2026-03-11 09:13:50][DEBUG] there are 61 contig(s) to work on (61 parts)

In case it helps with troubleshooting: I installed modkit as a Singularity image, built from a Docker image with the Dockerfile below. You can fetch the built container from Docker Hub.

Dockerfile
FROM ubuntu:24.04

# Install the dependencies.
RUN apt-get update \
    && apt-get upgrade --yes \
    && apt-get install --yes \
        wget \
    && rm --force --recursive /var/lib/apt/lists/*

# Set the environment variables.
ENV MODKIT_VERSION=0.6.1

# Download the pre-compiled modkit binary.
WORKDIR /opt
RUN wget https://github.com/nanoporetech/modkit/releases/download/v${MODKIT_VERSION}/modkit_v${MODKIT_VERSION}_u16_x86_64.tar.gz \
    && tar --extract --gzip --file=modkit_v${MODKIT_VERSION}_u16_x86_64.tar.gz \
    && mv dist_modkit_v${MODKIT_VERSION}* modkit \
    && rm --force --recursive modkit_v${MODKIT_VERSION}_u16_x86_64.tar.gz

WORKDIR /opt/modkit

# Set the PATH.
ENV PATH="/opt/modkit:${PATH}"

# Add a non-root user.
RUN useradd --no-create-home --user-group --uid=1001 user
USER user

Labels: troubleshooting (workflow and data preparation questions)
