tokio: remove ref of scheduler in owned task #5992

wathenjiang · 2023-09-09T14:16:47Z

Motivation

I belive we can remove the Arc::clone() call by removing the scheduler handle reference in all tasks.

For those tasks are owned, we could always use the scheduler in thread local CONTEXT.

Solution

By removing ref(Arc) of scheduler in owned tasks, we could got a better performance, espically when we call tokio::spawn very frequently in multiple threads.

And we keep ref of scheduler in unowned tasks, which do not depend on the sheduler in thread local CONTEXT.

The cargo build can be more quickly by removing the generic parameter S.

Here are the benchmark results of spawn2.

The current version:

Gnuplot not found, using plotters backend
spawn_tasks_current_thread
                        time:   [673.18 ns 680.79 ns 687.52 ns]
                        change: [-6.2656% -2.7726% +1.1065%] (p = 0.15 > 0.05)
                        No change in performance detected.

spawn_tasks_current_thread_parallel
                        time:   [521.01 ns 521.75 ns 522.35 ns]
                        change: [-1.4901% -0.6725% +0.1920%] (p = 0.12 > 0.05)
                        No change in performance detected.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) low mild
  1 (1.00%) high mild
  1 (1.00%) high severe

spawn_tasks             time:   [966.28 ns 976.72 ns 985.98 ns]
                        change: [+0.1574% +1.6134% +2.9634%] (p = 0.03 < 0.05)
                        Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild

spawn_tasks_parallel    time:   [893.21 ns 898.49 ns 903.49 ns]
                        change: [+0.3002% +1.3720% +2.4327%] (p = 0.01 < 0.05)
                        Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) low mild
  1 (1.00%) high mild
  2 (2.00%) high severe

This PR version:

# cargo bench --bench spawn2
    Finished bench [optimized] target(s) in 0.07s
     Running spawn2.rs (/root/rustproject/fork_tokio/target/release/deps/spawn2-b48a6368b3e3dabc)
Gnuplot not found, using plotters backend
spawn_tasks_current_thread
                        time:   [650.39 ns 662.56 ns 672.59 ns]
                        change: [-6.8428% -3.3240% +0.2419%] (p = 0.08 > 0.05)
                        No change in performance detected.

spawn_tasks_current_thread_parallel
                        time:   [526.50 ns 527.29 ns 528.10 ns]
                        change: [-1.9903% -1.1432% -0.2422%] (p = 0.01 < 0.05)
                        Change within noise threshold.
Found 22 outliers among 100 measurements (22.00%)
  6 (6.00%) low severe
  11 (11.00%) low mild
  3 (3.00%) high mild
  2 (2.00%) high severe

spawn_tasks             time:   [902.32 ns 911.90 ns 921.01 ns]
                        change: [-1.1438% +0.5670% +2.4469%] (p = 0.54 > 0.05)
                        No change in performance detected.

spawn_tasks_parallel    time:   [819.78 ns 826.17 ns 832.07 ns]
                        change: [-2.1597% -0.5893% +1.1337%] (p = 0.50 > 0.05)
                        No change in performance detected.
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) low mild
  2 (2.00%) high severe

Darksonn

Please ask us before you spend a lot of time implementing something. I don't think this PR is correct, and I don't think it can be fixed. Unless there's something I'm missing, we will have to close this.

Darksonn · 2023-09-10T08:47:01Z

tokio/src/runtime/task/raw.rs

 unsafe fn schedule<S: Schedule>(ptr: NonNull<Header>) {
    use crate::runtime::task::{Notified, Task};

-    let scheduler = Header::get_scheduler::<S>(ptr);
-    scheduler
-        .as_ref()
-        .schedule(Notified(Task::from_raw(ptr.cast())));
+    let scheduler = Header::get_scheduler::<S>(ptr).as_ref();
+    let notify = Notified(Task::from_raw(ptr.cast()));
+    match scheduler {
+        Some(scheduler) => scheduler.schedule(notify),
+        None => crate::task::schedule::schedule(notify),
+    }
 }


This looks incorrect. Consider this situation: An mpsc channel has the sender in runtime A, and the receiver in runtime B. Then, we send a message on the channel from runtime A. This causes the sender to call wake on the waker stored by the receiver, which corresponds to a task from runtime B, but we are on the thread for runtime A. That call eventually hits this function, and then the None branch of the match. This causes the task from runtime B to get scheduled on runtime A.

There's a test failure due to something similar:

thread 'runtime::tests::task_combinations::test_combinations' panicked at 'there is no reactor running, must be called from the context of a Tokio 1.x runtime', tokio/src/task/schedule.rs:19:19 stack backtrace: 0: rust_begin_unwind at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/panicking.rs:593:5 1: core::panicking::panic_fmt at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/core/src/panicking.rs:67:14 2: core::panicking::panic_display at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/core/src/panicking.rs:150:5 3: tokio::task::schedule::schedule at ./src/task/schedule.rs:19:19 4: tokio::runtime::task::raw::schedule at ./src/runtime/task/raw.rs:286:17 5: tokio::runtime::task::raw::RawTask::schedule at ./src/runtime/task/raw.rs:205:18 6: tokio::runtime::task::harness::<impl tokio::runtime::task::raw::RawTask>::wake_by_ref at ./src/runtime/task/harness.rs:103:17 7: tokio::runtime::task::waker::wake_by_ref at ./src/runtime/task/waker.rs:93:5 8: core::task::wake::Waker::wake_by_ref at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/core/src/task/wake.rs:296:18 9: core::ops::function::FnOnce::call_once at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/core/src/ops/function.rs:250:5 10: tokio::sync::oneshot::Task::with_task::{{closure}} at ./src/sync/oneshot.rs:417:13 11: tokio::loom::std::unsafe_cell::UnsafeCell<T>::with at ./src/loom/std/unsafe_cell.rs:11:9 12: tokio::sync::oneshot::Task::with_task at ./src/sync/oneshot.rs:415:9 13: tokio::sync::oneshot::Inner<T>::complete at ./src/sync/oneshot.rs:1127:17 14: tokio::sync::oneshot::Sender<T>::send at ./src/sync/oneshot.rs:608:13 15: tokio::runtime::tests::task_combinations::test_combination at ./src/runtime/tests/task_combinations.rs:396:13 16: tokio::runtime::tests::task_combinations::test_combinations at ./src/runtime/tests/task_combinations.rs:119:37 17: tokio::runtime::tests::task_combinations::test_combinations::{{closure}} at ./src/runtime/tests/task_combinations.rs:68:24 18: core::ops::function::FnOnce::call_once at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/core/src/ops/function.rs:250:5 19: core::ops::function::FnOnce::call_once at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/core/src/ops/function.rs:250:5 note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

Here, the test calls oneshot::Sender::send while it is not inside the runtime context. This panics because the thread-local is not available.

I got stuck on this this afternoon: The threads without context are hard to talk to the thread with context. This means the reference of scheduler have to put in something like channel, or in the tasks.

I belive the message pass between sync threads and tokio runtime threads is a key capability, so the reference of scheduler in every task may have to be kept. To remove this reference, we may have to write a new tokio::spawn method like tokio::spawn_without_...., that is not a good way.

wathenjiang · 2023-09-10T11:39:42Z

I believe this PR cannot resolve the above problem.

wathenjiang added 3 commits September 9, 2023 20:24

first finish

f8954a2

fix CI

aa226c6

move schedule to schedule.rs

d9018a1

github-actions bot added R-loom-current-thread Run loom current-thread tests on this PR R-loom-multi-thread Run loom multi-thread tests on this PR R-loom-multi-thread-alt Run loom multi-thread alt tests on this PR labels Sep 9, 2023

wathenjiang added 7 commits September 9, 2023 22:19

fix ci

2cd49c7

fix ci

464e162

make multi_thread_alt using new task

54bd206

make ci happy

d10cbe7

rm more generics parameter

2fe4313

make loom happy

ecfc6e4

make test happy

0126a7c

wathenjiang force-pushed the remove_ref_of_scheduler branch from fac89f8 to 0126a7c Compare September 10, 2023 01:34

fix unsafe get_scheduler

4326c7a

wathenjiang force-pushed the remove_ref_of_scheduler branch from a274d68 to 4326c7a Compare September 10, 2023 01:52

wathenjiang added 3 commits September 10, 2023 13:10

fix task local set

1e871c3

make ci happy

d2b558e

fix unsafe address

83c6a46

Darksonn added A-tokio Area: The main tokio crate M-runtime Module: tokio/runtime labels Sep 10, 2023

Darksonn requested changes Sep 10, 2023

View reviewed changes

wathenjiang closed this Sep 10, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

tokio: remove ref of scheduler in owned task #5992

tokio: remove ref of scheduler in owned task #5992

wathenjiang commented Sep 9, 2023 •

edited

Loading

Darksonn left a comment •

edited

Loading

Darksonn Sep 10, 2023

Darksonn Sep 10, 2023

wathenjiang Sep 10, 2023

wathenjiang Sep 10, 2023

wathenjiang commented Sep 10, 2023

tokio: remove ref of scheduler in owned task #5992

tokio: remove ref of scheduler in owned task #5992

Conversation

wathenjiang commented Sep 9, 2023 • edited Loading

Motivation

Solution

Darksonn left a comment • edited Loading

Choose a reason for hiding this comment

Darksonn Sep 10, 2023

Choose a reason for hiding this comment

Darksonn Sep 10, 2023

Choose a reason for hiding this comment

wathenjiang Sep 10, 2023

Choose a reason for hiding this comment

wathenjiang Sep 10, 2023

Choose a reason for hiding this comment

wathenjiang commented Sep 10, 2023

wathenjiang commented Sep 9, 2023 •

edited

Loading

Darksonn left a comment •

edited

Loading