perf: Move heavy object deallocation off the critical Execution path #3241

Open · wants to merge 4 commits into master
Conversation

alin-at-dfinity (Contributor)

Reuse State Manager's deallocator thread in Execution, moving up to half the work under heavy load onto a background thread.
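
For context, a minimal sketch of the mechanism (not the actual implementation): complex objects are boxed, sent over a channel to a dedicated thread, and dropped there with a short pause between drops. The `DeallocatorThread` name and the `new(name, factor)` shape mirror the PR; everything else below is illustrative.

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread;
use std::time::Duration;

/// Sketch of a background deallocator: objects sent to it are dropped on a
/// dedicated thread, with a short sleep between drops, so deallocation cost
/// is spread out instead of landing on the scheduler threads.
pub struct DeallocatorThread {
    sender: Sender<Box<dyn Send>>,
}

impl DeallocatorThread {
    /// `drops_per_second` is the "factor" discussed below; e.g. 10_000
    /// translates to a ~100 µs pause between successive drops.
    pub fn new(name: &str, drops_per_second: u32) -> Self {
        let (sender, receiver) = channel::<Box<dyn Send>>();
        let pause = Duration::from_secs(1) / drops_per_second;
        thread::Builder::new()
            .name(name.to_string())
            .spawn(move || {
                // The loop ends once all senders have been dropped.
                for object in receiver {
                    drop(object);
                    thread::sleep(pause);
                }
            })
            .expect("failed to spawn background deallocation thread");
        Self { sender }
    }

    /// Hands an object off for background deallocation. If the thread is
    /// gone, the object is simply dropped here, synchronously.
    pub fn send(&self, object: Box<dyn Send>) {
        let _ = self.sender.send(object);
    }
}
```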

@github-actions bot added the perf label on Dec 18, 2024
alin-at-dfinity (Contributor, Author)

Before and after CPU profiles while running a best-effort message benchmark. Note how the big chunk of PageMap drops moves from the scheduler threads to a new ExecutionDeallocator thread.
[Screenshot from 2024-12-13 13-52-06]
[Screenshot from 2024-12-15 19-29-24]

adambratschikaye (Contributor) left a comment

Nice, thanks!

@@ -418,6 +421,7 @@ impl ExecutionEnvironment {
Arc::clone(&ingress_history_writer),
fd_factory,
);
let deallocator_thread = DeallocatorThread::new("ExecutionDeallocator", 10000);
Contributor

How did you decide on 10k?

alin-at-dfinity (Contributor, Author), Dec 19, 2024

You hit the nail on the head. (o:

The State Manager deallocator thread would originally sleep for 1 ms between successive drops. So I used 1000 (as in 1 s / 1000) as the factor there, to preserve the behavior.

I initially started with 1000 here too, but while running my (very heavy) benchmarks I noticed that half the deallocations still happened on the scheduler threads. So I just bumped it by 10x because I wanted to get some results quickly and one benchmark run would take on the order of 15 minutes.

On the one hand, I did not immediately notice any negative effects from sleeping 100 µs between drops. And under extremely heavy load we are likely to both need and benefit from deallocating lots of large objects in the background. Under normal load, the 10k factor neither helps nor hurts.

That being said, I can try to fine-tune it. It's quite likely that we can reduce the factor (and increase the interval) by at least 2x without negative effects on throughput. (The peak message throughput I saw in my testing was about 2.5k messages/second, so assuming 2 objects to deallocate per message, we could likely do with 5k objects sent to the deallocator thread per second. At 1 object per message, 2.5k would do.)
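
For reference, the arithmetic behind the factor, assuming it is interpreted as drops per second (a small illustration, not code from the PR):

```rust
use std::time::Duration;

/// Pause between successive drops for a given factor (drops per second).
fn pause_between_drops(factor: u32) -> Duration {
    Duration::from_secs(1) / factor
}

// factor  1_000 -> 1 ms    (the original State Manager behavior)
// factor  3_000 -> ~333 µs
// factor  5_000 -> 200 µs
// factor 10_000 -> 100 µs
//
// Capacity check: at ~2.5k messages/s and ~2 large objects per message,
// roughly 5k drops/s must fit through the deallocator thread.
```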

alin-at-dfinity (Contributor, Author)

I've done a couple of runs, with 3k and 5k. In both cases, some SystemStates and ExecutionStates are dropped synchronously, because the channel to the deallocator thread becomes backlogged. With 3k (333 µs sleep) it's about 20%; with 5k (200 µs sleep) it's about 13%.
[Screenshot from 2024-12-19 11-52-37]
[Screenshot from 2024-12-19 11-52-50]
FWIW, I've never noticed any synchronous drops with 10k (100 µs sleep). But then the overhead to execution with either 3k or 5k (12% and 9%, respectively; and only under very heavy load) is not really meaningful.
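
A hedged sketch of where those synchronous drops could come from: with a bounded channel, a backlogged deallocator makes `try_send` fail and the object is dropped on the calling thread instead (the PR's actual fallback logic may differ):

```rust
use std::sync::mpsc::{SyncSender, TrySendError};

/// Illustrative only: attempt background deallocation, falling back to a
/// synchronous drop on the calling (scheduler) thread when the channel to
/// the deallocator thread is full or disconnected.
fn send_or_drop(sender: &SyncSender<Box<dyn Send>>, object: Box<dyn Send>) {
    match sender.try_send(object) {
        Ok(()) => {} // will be dropped on the deallocator thread
        Err(TrySendError::Full(object)) | Err(TrySendError::Disconnected(object)) => {
            drop(object); // dropped right here, synchronously
        }
    }
}
```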

Contributor

The code needs a comment explaining why 10000, and when to readjust it.

alin-at-dfinity (Contributor, Author)

I will add a doc comment, thanks for the suggestion.

adambratschikaye (Contributor)

What's the benchmark that the flamegraphs are coming from?

alin-at-dfinity (Contributor, Author)

> What's the benchmark that the flamegraphs are coming from?

The benchmark is simply a run of ict test //rs/tests/message_routing/xnet:xnet_slo_3_subnets_test from a branch combining #2995 (tweaking the replica and test to run with best-effort messages) and #3153 (various performance improvements, including this one). The test will fail, but the point is to produce enough load to test the callback and best-effort message memory limits.

rs/utils/thread/src/deallocator_thread.rs (resolved discussion)
@@ -345,6 +347,7 @@ pub struct ExecutionEnvironment {
// parallel and potentially reserving resources. It should be initialized to
// the number of scheduler cores.
resource_saturation_scaling: usize,
deallocator_thread: DeallocatorThread,
Member

As discussed offline, it would be better to flush this when dropping an ExecutionEnvironment during tests.

alin-at-dfinity (Contributor, Author)

Should I just impl Drop for ExecutionEnvironment or do you have something more specific in mind?

Member

Yes, exactly. You can see the same pattern in the state manager.
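
For illustration, the suggested pattern might look roughly like the following; the `flush()` method name is an assumption here, not necessarily the state manager's actual API:

```rust
impl Drop for ExecutionEnvironment {
    fn drop(&mut self) {
        // Hypothetical flush: block until everything already queued on the
        // deallocator thread has been dropped, so tests tearing down an
        // ExecutionEnvironment don't race with background deallocation.
        self.deallocator_thread.flush();
    }
}
```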

derlerd-dfinity (Contributor), Dec 20, 2024

The test failures on the PR also suggest that flushing the thread when dropping ExecutionEnvironment in tests could be a good idea (I didn't look into the failures in detail, so it might well be something else; it just sounds like a plausible cause).


/// A thread that deallocates complex objects in the background. It spreads the
/// cost of deallocation over a longer period of time, to avoid long pauses.
pub struct DeallocatorThread {
Contributor

I really like that this is encapsulated now! One question that crossed my mind: do we also need some tests?
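
One possible shape for such a test (a sketch only; it assumes the `send` API from the PR plus a hypothetical `flush()` that drains the queue):

```rust
#[cfg(test)]
mod tests {
    use super::*;
    use std::sync::atomic::{AtomicBool, Ordering};
    use std::sync::Arc;

    /// Sets a flag when dropped, so the test can observe the deallocation.
    struct DropFlag(Arc<AtomicBool>);

    impl Drop for DropFlag {
        fn drop(&mut self) {
            self.0.store(true, Ordering::SeqCst);
        }
    }

    #[test]
    fn deallocates_in_background() {
        let dropped = Arc::new(AtomicBool::new(false));
        let deallocator = DeallocatorThread::new("TestDeallocator", 10_000);

        deallocator.send(Box::new(DropFlag(Arc::clone(&dropped))));
        // Hypothetical: wait until the queued objects have been dropped.
        deallocator.flush();

        assert!(dropped.load(Ordering::SeqCst));
    }
}
```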


})
.expect("failed to spawn background deallocation thread"),
);
let deallocator_thread = DeallocatorThread::new("StateDeallocation", 1000);
Contributor

Should we pass a duration so that the numbers are less cryptic?
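
For instance, a hypothetical signature change (illustrative, not what the PR currently does):

```rust
use std::time::Duration;

impl DeallocatorThread {
    /// Hypothetical alternative: take the pause between drops directly, so
    /// call sites read e.g.
    /// `DeallocatorThread::new("StateDeallocation", Duration::from_millis(1))`
    /// instead of a bare factor such as `1000`.
    pub fn new(name: &str, pause_between_drops: Duration) -> Self {
        // ...spawn the background thread, sleeping `pause_between_drops`
        // between successive drops...
        unimplemented!("sketch only")
    }
}
```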
