Description
Running modkit sample-probs from modkit v0.6.1 with the following (path-obfuscated) command line:
modkit sample-probs \
--threads=2 \
--seed=42 \
--no-sampling \
--force \
--log-filepath=/path/to/data/dir/modkit_log.txt \
--out-dir=/path/to/data/dir \
/path/to/data/dir/mapped.bam
causes modkit to allocate all available RAM.
Every run attempt on our HPC cluster, with varying thread counts and memory limits, was killed shortly after starting with an error about exceeding the allocated memory. I then moved the dataset to a separate server of ours with better profiling options and monitored the run. That machine has 1.4 TB of RAM, and within roughly one and a half hours modkit had occupied all of it, steadily increasing its allocations without freeing any. After about another half hour at maximum RAM usage, the job was finally killed (presumably by the OS) with exit code 137 (128 + 9, i.e. SIGKILL). Nothing was written to the log file after the first few lines included below, and no other output file was created.
Given that the mapped modBAM file is only 72 GB, it would fit into RAM 19 times over, so even if modkit read it in its entirety separately for each of the two threads, there would still be plenty of free memory. I had assumed that Rust would rule out such an apparent memory leak, but I can't find another explanation for these observations.
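Strictly speaking, Rust's ownership and borrowing rules guarantee memory safety, not the absence of leaks: safe code can leak through Rc reference cycles, Box::leak or std::mem::forget, and memory can also stay reachable and keep growing if a buffer or collection is never drained. The minimal sketch below illustrates these cases in isolation. The function and type names are hypothetical, and it makes no claim about what modkit actually does internally.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

/// Hypothetical node type that can hold a strong reference to another node.
struct Node {
    next: RefCell<Option<Rc<Node>>>,
    _payload: Vec<u8>,
}

/// Builds a two-node Rc cycle, drops both local handles, and returns a Weak
/// pointer to the first node. Because each node keeps the other's strong
/// count at 1, neither is ever freed, so the Weak can still be upgraded.
fn leak_via_rc_cycle(payload_bytes: usize) -> Weak<Node> {
    let a = Rc::new(Node {
        next: RefCell::new(None),
        _payload: vec![0u8; payload_bytes],
    });
    let b = Rc::new(Node {
        next: RefCell::new(Some(Rc::clone(&a))), // b -> a
        _payload: vec![0u8; payload_bytes],
    });
    *a.next.borrow_mut() = Some(Rc::clone(&b)); // a -> b closes the cycle
    let weak_a = Rc::downgrade(&a);
    drop(a);
    drop(b);
    weak_a
}

/// Box::leak is a safe standard-library function that gives up ownership;
/// the returned allocation is never freed.
fn leak_via_box(payload_bytes: usize) -> &'static mut Vec<u8> {
    Box::leak(Box::new(vec![0u8; payload_bytes]))
}

/// Not a leak in the strict sense: everything stays reachable, but memory
/// only grows because batches are appended and never released.
fn grow_without_freeing(batches: usize, batch_bytes: usize) -> usize {
    let mut retained: Vec<Vec<u8>> = Vec::new();
    for _ in 0..batches {
        retained.push(vec![0u8; batch_bytes]);
    }
    retained.iter().map(|batch| batch.len()).sum()
}

fn main() {
    let weak = leak_via_rc_cycle(1024);
    println!("cycle still alive after drop: {}", weak.upgrade().is_some());
    let leaked = leak_via_box(1024);
    println!("leaked Box still holds {} bytes", leaked.len());
    println!("retained {} bytes", grow_without_freeing(10, 1024));
}
```

All three functions compile without unsafe code, which is why the compiler alone cannot be relied on to rule out unbounded memory growth.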
Full log
[modkit-logging/src/lib.rs::69][2026-03-11 09:13:50][DEBUG] command line: /opt/modkit/modkit sample-probs --threads=2 --seed=42 --no-sampling --force --log-filepath=/path/to/data/dir/modkit_log.txt --out-dir=/path/to/data/dir /path/to/data/dir/mapped.bam
[modkit-core/src/modbam_util/subcommands.rs::1964][2026-03-11 09:13:50][INFO] not subsampling, using all reads
[modkit-core/src/reads_sampler/mod.rs::49][2026-03-11 09:13:50][DEBUG] found BAM index, sampling reads in 1000000 base pair chunks
[modkit-core/src/reads_sampler/sampling_schedule.rs::141][2026-03-11 09:13:50][DEBUG] derived sampling schedule, sampling total 32081530 reads from 61 contigs, including unmapped reads
[modkit-core/src/reads_sampler/sampling_schedule.rs::164][2026-03-11 09:13:50][DEBUG] schedule
chrom count/frac
0 all
55 all
49 all
43 all
37 all
31 all
25 all
19 all
13 all
7 all
1 all
56 all
50 all
44 all
38 all
32 all
26 all
20 all
14 all
8 all
2 all
57 all
51 all
45 all
39 all
33 all
27 all
21 all
15 all
9 all
3 all
58 all
52 all
46 all
40 all
34 all
28 all
22 all
16 all
10 all
4 all
59 all
53 all
47 all
41 all
35 all
29 all
23 all
17 all
11 all
5 all
60 all
54 all
48 all
42 all
36 all
30 all
24 all
18 all
12 all
6 all
[modkit-core/src/reads_sampler/sampling_schedule.rs::167][2026-03-11 09:13:50][DEBUG] and all unmapped reads
[modkit-core/src/interval_chunks.rs::225][2026-03-11 09:13:50][DEBUG] there are 61 contig(s) to work on (61 parts)
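For scale, the sketch below combines the sizes from the description (1.4 TB RAM, 72 GB modBAM, two threads) with the read count from the sampling-schedule log line above. It assumes decimal units for both sizes, and the constant and function names are illustrative only. Note that BAM is BGZF-compressed, so a decoded record occupies more memory than its share of the file on disk; the per-read figures are rough orders of magnitude, not a model of what modkit holds in memory.

```rust
// Figures taken from this report; decimal units assumed
// (1 GB = 10^9 bytes, 1 TB = 10^12 bytes).
const RAM_BYTES: u64 = 1_400_000_000_000; // 1.4 TB on the profiling server
const BAM_BYTES: u64 = 72_000_000_000; // 72 GB mapped modBAM
const TOTAL_READS: u64 = 32_081_530; // "sampling total 32081530 reads" in the log
const THREADS: u64 = 2; // --threads=2

/// Whole number of times a file of `file_bytes` fits into `ram_bytes`.
fn copies_that_fit(ram_bytes: u64, file_bytes: u64) -> u64 {
    ram_bytes / file_bytes
}

/// RAM left over if the whole file were held once per thread.
fn headroom_with_copy_per_thread(ram_bytes: u64, file_bytes: u64, threads: u64) -> u64 {
    ram_bytes.saturating_sub(file_bytes * threads)
}

/// Average bytes per read, rounded down.
fn bytes_per_read(total_bytes: u64, reads: u64) -> u64 {
    total_bytes / reads
}

fn main() {
    println!("BAM copies that fit in RAM: {}", copies_that_fit(RAM_BYTES, BAM_BYTES));
    println!(
        "headroom with one copy per thread: {} bytes",
        headroom_with_copy_per_thread(RAM_BYTES, BAM_BYTES, THREADS)
    );
    println!("on-disk bytes per read: {}", bytes_per_read(BAM_BYTES, TOTAL_READS));
    println!("RAM bytes per read: {}", bytes_per_read(RAM_BYTES, TOTAL_READS));
}
```

Running it prints 19 copies, 1,256,000,000,000 bytes of headroom, about 2,244 on-disk bytes per read, and about 43,638 bytes of RAM per read when the 1.4 TB is divided across all 32,081,530 reads.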
In case it helps with troubleshooting: I run modkit from a Singularity image built from a Docker image using the Dockerfile below. The built image is also available on Docker Hub.
Dockerfile
FROM ubuntu:24.04
# Install the dependencies.
RUN apt-get update \
&& apt-get upgrade --yes \
&& apt-get install --yes \
wget \
&& rm --force --recursive /var/lib/apt/lists/*
# Set the environment variables.
ENV MODKIT_VERSION=0.6.1
# Download the pre-compiled modkit binary.
WORKDIR /opt
RUN wget https://github.com/nanoporetech/modkit/releases/download/v${MODKIT_VERSION}/modkit_v${MODKIT_VERSION}_u16_x86_64.tar.gz \
&& tar --extract --gzip --file=modkit_v${MODKIT_VERSION}_u16_x86_64.tar.gz \
&& mv dist_modkit_v${MODKIT_VERSION}* modkit \
&& rm --force --recursive modkit_v${MODKIT_VERSION}_u16_x86_64.tar.gz
WORKDIR /opt/modkit
# Set the PATH.
ENV PATH="/opt/modkit:${PATH}"
# Add a non-root user.
RUN useradd --no-create-home --user-group --uid=1001 user
USER user