
Conversation

@hahnjo (Member) commented Nov 14, 2025

When IMT is turned on and RPageSinkBuf has an RTaskScheduler, we would previously buffer all pages and create tasks to seal/compress them. While this exposes the maximum amount of parallel work, it wastes memory if other threads are not fast enough to process the tasks. Heuristically assume that there is enough work once we already buffer more uncompressed bytes than the approximate zipped cluster size.
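
For illustration, a minimal self-contained sketch of this heuristic; the class, member, and function names are stand-ins, not the actual RPageSinkBuf internals:

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>
#include <vector>

// Stand-in for the ROOT page type.
struct Page {
   std::vector<unsigned char> data;
   std::size_t Size() const { return data.size(); }
};

class BufferedSink {
   static constexpr std::size_t kApproxZippedClusterSize = 128 * 1024 * 1024; // 128 MiB default

   std::deque<Page> fBufferedPages;               // pages held until the cluster is committed
   std::vector<std::function<void()>> fSealTasks; // deferred seal/compress tasks for worker threads
   std::size_t fBufferedBytes = 0;                // uncompressed bytes currently deferred

   void Seal(Page &page) { /* compress page.data in place */ }

public:
   void CommitPage(Page page) {
      fBufferedPages.push_back(std::move(page));
      // std::deque keeps references stable on push_back, so the task may hold one.
      Page &buffered = fBufferedPages.back();
      if (fBufferedBytes < kApproxZippedClusterSize) {
         // Below the threshold: defer sealing so that worker threads can pick it up.
         fBufferedBytes += buffered.Size();
         fSealTasks.emplace_back([this, &buffered] { Seal(buffered); });
      } else {
         // Enough work queued already: seal on the calling thread, so only the
         // (much smaller) compressed page remains buffered.
         Seal(buffered);
      }
   }
};
```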

In a small test writing random data with ROOT::EnableImplicitMT(1), and therefore no extra worker thread, the application used 500 MB before this change with the default cluster size of 128 MiB. After this change, memory usage is reduced to around 430 MB (compared to 360 MB without IMT). The compression factor is around 2.1x in this case, which roughly checks out:
Instead of buffering the full uncompressed cluster (which is around compression factor * zipped cluster size = 270 MiB), we now buffer uncompressed pages only up to the approximate zipped cluster size (128 MiB) and then start compressing pages immediately. The compressed result also needs to be buffered, but it is much smaller: (1 - 1 / compression factor) * zipped cluster size = 67 MiB. Accordingly, the gain will be higher for larger compression factors.
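
To make the accounting explicit, a back-of-the-envelope check with the values from the description (not code from the PR):

```cpp
#include <cstdio>

int main() {
   const double cf = 2.1;       // measured compression factor
   const double zipped = 128.0; // approximate zipped cluster size, MiB

   // Before: the entire uncompressed cluster is buffered.
   const double before = cf * zipped; // ~270 MiB

   // After: uncompressed pages up to the zipped cluster size, plus the
   // compressed form of the remaining (cf - 1) * zipped MiB, which shrinks
   // by the compression factor to (1 - 1/cf) * zipped = ~67 MiB.
   const double after = zipped + (1.0 - 1.0 / cf) * zipped; // ~195 MiB

   std::printf("before: %.0f MiB, after: %.0f MiB, saved: %.0f MiB\n",
               before, after, before - after);
   // Saved ~74 MiB, in the same ballpark as the observed ~70 MB drop
   // from 500 MB to 430 MB.
}
```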

Closes #18314 ([ntuple] Heuristically reduce memory usage of buffered writing)

FYI @Dr15Jones @makortel as discussed previously

Created tasks reference *this, so moving is not safe. It's also not
needed because RPageSinkBuf is always inside a std::unique_ptr.
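
As a minimal illustration of that constraint (the class name and task container are stand-ins, not the actual ROOT code):

```cpp
#include <functional>
#include <memory>
#include <utility>
#include <vector>

class Sink { // stand-in for RPageSinkBuf
   std::vector<std::function<void()>> fTasks;

public:
   Sink() = default;
   // Scheduled tasks capture `this`; moving the object would leave them
   // with a dangling pointer, so the move operations are deleted.
   Sink(Sink &&) = delete;
   Sink &operator=(Sink &&) = delete;

   void Schedule() {
      fTasks.emplace_back([this] { /* seal a page owned by *this */ });
   }
};

int main() {
   // Moving the owning pointer is still fine: the Sink object itself never
   // changes address, so the captured `this` stays valid.
   auto sink = std::make_unique<Sink>();
   sink->Schedule();
   auto moved = std::move(sink);
   // Sink stolen = std::move(*moved); // would not compile: move is deleted
}
```
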
@github-actions

Test Results

    22 files      22 suites   3d 14h 15m 11s ⏱️
 3 776 tests  3 776 ✅ 0 💤 0 ❌
81 133 runs  81 133 ✅ 0 💤 0 ❌

Results for commit e896cd5.

@makortel

Thanks @hahnjo! Just thinking out loud about how we would eventually test the impact of this PR: is this something you'd be comfortable backporting to 6.36, or would you prefer to keep it in master only? (Just to be clear, at this point I'm not asking to backport.)

@hahnjo (Member, Author) commented Nov 15, 2025

> Thanks @hahnjo! Just thinking out loud about how we would eventually test the impact of this PR: is this something you'd be comfortable backporting to 6.36, or would you prefer to keep it in master only? (Just to be clear, at this point I'm not asking to backport.)

Hi @makortel, in the past I have rebuilt ROOT locally with the same configuration as in the IB (in particular CMAKE_CXX_STANDARD=20) and only some modifications applied on top, and then used LD_PRELOAD to load the patched RNTuple library libROOTNTuple.so, which works as long as there is no ABI change. We briefly discussed backporting to v6-36-00-patches, and I believe in general we would be open to it, provided it's tested in CMSSW and Athena (via the dev3 LCG nightlies).
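
For reference, that workflow might look roughly like the following sketch; the paths, build layout, and job command are placeholders, and only CMAKE_CXX_STANDARD=20 and libROOTNTuple.so come from the comment above:

```bash
# Rebuild ROOT locally with the same configuration as the IB, plus the
# patch under test (source and build paths are placeholders).
cmake -S root-src -B root-build -DCMAKE_CXX_STANDARD=20
cmake --build root-build -j"$(nproc)"

# Preload the patched RNTuple library over the stock installation; this
# only works as long as the patch introduces no ABI change.
LD_PRELOAD="$PWD/root-build/lib/libROOTNTuple.so" ./run_experiment_job  # placeholder job
```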
