[ntuple] Reduce memory usage of RPageSinkBuf
#20425
Open
+23
−3
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
When IMT is turned on and
RPageSinkBufhas anRTaskScheduler, we would previously buffer all pages and create tasks to seal / compress them. While this exposes the maximum work, it's a waste of memory if other threads are not fast enough to process the tasks. Heuristically assume that there is enough work if we already buffer more uncompressed bytes than the approximate zipped cluster size.In a small test, writing random data with
ROOT::EnableImplicitMT(1)and therefore no extra worker thread, the application used 500 MB before this change for the default cluster size of 128 MiB. After this change, memory usage is reduced to around 430 MB (compared to a memory usage of 360 MB without IMT). The compression factor is around ~2.1x in this case, which roughly checks out:Instead of buffering the full uncompressed cluster (which is around compression factor * zipped cluster size = 270 MiB), we now buffer uncompressed pages up to the approximate zipped cluster size (128 MiB) and then start compressing pages immediately. The result of course also needs to be buffered, but is much smaller after compression: ((1 - 1 / compression factor) * zipped cluster size = 67 MiB). Accordingly, the gain will be higher for larger compression factors.
Closes #18314
FYI @Dr15Jones @makortel as discussed previously