
[RFC] scx_rusty: Perform task migration immediately #994

Open · wants to merge 2 commits into main
Conversation

@vax-r (Contributor) commented Nov 27, 2024

Summary

According to #611, the load balancer produces a new domain id into "lb_data" every load-balancing period; however, sometimes those tasks aren't scheduled during the coming scheduling period, so no task migration is actually performed.

Utilize BPF_PROG_TYPE_SYSCALL so that the load balancer can update the task context's domain right away and transfer the related load information between the push domain and the pull domain immediately.
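The flow above can be sketched roughly as follows. This is a minimal userspace model of the idea, not the PR's actual code: `TaskCtx`, `apply_migration`, and the fixed-point load values are illustrative stand-ins, and the real update happens through a BPF_PROG_TYPE_SYSCALL program rather than a plain function call.

```rust
use std::collections::HashMap;

// Hypothetical model of the task context kept per tptr.
#[derive(Debug)]
struct TaskCtx {
    dom_id: u32,
    load: i64, // task load in fixed-point units (illustrative)
}

// Stand-in for invoking the BPF_PROG_TYPE_SYSCALL program: instead of
// recording a target domain in `lb_data` and waiting until the task is
// next scheduled, update the task's domain and move its load from the
// push domain to the pull domain immediately.
fn apply_migration(
    tasks: &mut HashMap<u64, TaskCtx>,
    dom_loads: &mut HashMap<u32, i64>,
    tptr: u64,
    pull_dom: u32,
) -> bool {
    let Some(task) = tasks.get_mut(&tptr) else { return false };
    let push_dom = task.dom_id;
    if push_dom == pull_dom {
        return false; // nothing to migrate
    }
    // Transfer the task's load between the two domains right away.
    *dom_loads.entry(push_dom).or_insert(0) -= task.load;
    *dom_loads.entry(pull_dom).or_insert(0) += task.load;
    task.dom_id = pull_dom;
    true
}

fn main() {
    let mut tasks = HashMap::from([(0x1000u64, TaskCtx { dom_id: 0, load: 500 })]);
    let mut dom_loads = HashMap::from([(0u32, 1200i64), (1u32, 200i64)]);
    assert!(apply_migration(&mut tasks, &mut dom_loads, 0x1000, 1));
    println!(
        "dom={} push_load={} pull_load={}",
        tasks[&0x1000].dom_id, dom_loads[&0], dom_loads[&1]
    );
    // prints "dom=1 push_load=700 pull_load=700"
}
```

The point of the sketch is that the push/pull bookkeeping happens at load-balancing time, so a task that never gets scheduled in the next period no longer leaves the migration unapplied.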

The implementation still has some points to be considered:

  • When the task picked for migration is currently running, should we ignore it, or how should we stop it?
  • Currently dom_active_tptrs has some problems. According to scx_rusty: Temporary fix of duplicate active tptr #941, we should redesign the structure and keep it synchronized with each load-balancing decision. Once dom_active_tptrs is always up to date, the task information in userspace can easily keep up, and it should be safe to regenerate dom.tasks each time.

(Personally, I think we should fix the dom_active_tptrs problem first, as it will eventually need to be synchronized with task migrations.)

Test

(The test cases and metrics considered may not provide enough coverage yet; please let me know if you have any suggestions or ideas, and I'll be happy to add more tests to verify everything.)
The tests were run on an AMD Ryzen 7 5700X3D 8-core processor (x86_64).

Kernel Compilation workload

We ran the rusty scheduler with the following command:

$ sudo ./build/debug/scx_rusty --stats 300 --cpumasks 0xf000 0x0f00 0x00f0 0x000f

Taking kernel compilation as the test workload, the number of migrations and the total compilation time are shown below:

###### Wed, 27 Nov 2024 17:56:41 +0800, load balance @  -677.0ms ######
cpu=  24.64 load=    4.15 mig=1907 task_err=0 lb_data_err=0 time_used=928.6ms
tot= 213982 sync_prev_idle= 1.05 wsync= 1.06
prev_idle=57.46 greedy_idle= 4.90 pin= 0.00
dir=23.10 dir_greedy= 0.10 dir_greedy_far= 0.44
dsq=11.07 greedy_local= 0.82 greedy_xnuma= 0.00
kick_greedy= 0.34 rep= 0.15
dl_clamp= 7.87 dl_preset= 4.02
slice=20000us
direct_greedy_cpus=ffff
  kick_greedy_cpus=ffff
  NODE[00] load=  4.15 imbal=  +0.00 delta=  +0.00
   DOM[00] load=  1.13 imbal=  +0.09 delta=  -0.06
   DOM[01] load=  0.93 imbal=  -0.11 delta=  +0.00
   DOM[02] load=  1.28 imbal=  +0.25 delta=  -0.02
   DOM[03] load=  0.81 imbal=  -0.22 delta=  +0.08

And the total kernel compilation time:

real	6m3.684s
user	0m0.027s
sys	0m0.111s

After the change, the stats are shown below:

###### Wed, 27 Nov 2024 17:05:13 +0800, load balance @  -689.0ms ######
cpu=  24.69 load=    4.19 mig=1583 task_err=0 lb_data_err=0 time_used=1008.6ms
tot= 225426 sync_prev_idle= 0.97 wsync= 0.92
prev_idle=56.13 greedy_idle= 6.49 pin= 0.00
dir=24.05 dir_greedy= 0.11 dir_greedy_far= 0.46
dsq=10.52 greedy_local= 0.35 greedy_xnuma= 0.00
kick_greedy= 0.04 rep= 0.19
dl_clamp= 0.00 dl_preset=10.87
slice=20000us
direct_greedy_cpus=ffff
  kick_greedy_cpus=ffff
  NODE[00] load=  4.19 imbal=  +0.00 delta=  +0.00
   DOM[00] load=  1.40 imbal=  +0.36 delta=  -0.04
   DOM[01] load=  0.93 imbal=  -0.12 delta=  +0.00
   DOM[02] load=  0.75 imbal=  -0.30 delta=  +0.44
   DOM[03] load=  1.10 imbal=  +0.06 delta=  -0.40
real	5m59.663s
user	0m0.030s
sys	0m0.113s

While the kernel compilation time shrinks only a little, we can observe that the number of migrations decreased a lot, which implies that each migration is more effective at reducing the load imbalance between domains, so the load balancer doesn't have to perform as many useless operations.

Stress test

Run the rusty scheduler (collecting metrics every 50 seconds):

$ sudo ./build/debug/scx_rusty --stats 50 --cpumasks 0xf000 0x0f00 0x00f0 0x000f

Test the change with stress-ng:

$ stress-ng --cpu 12 --iomix 12 --timeout 60s --metrics

The metrics are shown below:

###### Wed, 27 Nov 2024 18:08:13 +0800, load balance @  -676.4ms ######
cpu=  78.26 load=   14.72 mig=14 task_err=0 lb_data_err=0 time_used=142.1ms
tot= 183819 sync_prev_idle= 0.01 wsync= 0.02
prev_idle= 6.47 greedy_idle= 1.70 pin= 0.00
dir=34.49 dir_greedy= 2.15 dir_greedy_far= 1.84
dsq=31.31 greedy_local=22.00 greedy_xnuma= 0.00
kick_greedy= 0.22 rep= 0.66
dl_clamp= 0.01 dl_preset=53.31
slice=20000us
direct_greedy_cpus=f0ff
  kick_greedy_cpus=ffff
  NODE[00] load= 14.72 imbal=  -0.00 delta=  +0.00
   DOM[00] load=  3.61 imbal=  -0.07 delta=  +0.00
   DOM[01] load=  3.55 imbal=  -0.13 delta=  +0.00
   DOM[02] load=  3.89 imbal=  +0.21 delta=  +0.00
   DOM[03] load=  3.67 imbal=  -0.01 delta=  +0.00

Performance is almost the same with and without the change under a heavy workload on all cores.

Related Issue

#611

Under severe load-imbalance scenarios, such as mixtures of CPU-intensive and I/O-intensive workloads, the same tptr may be written into the same dom_active_tptrs array.

This leads to load balancer failures: when the tptr's task carries a large enough load, it tends to be selected repeatedly, so warnings about the same tptr being set in "lb_data" keep popping up.

Use a workaround for now, which is to keep a HashSet in userspace recording the currently active tptrs under a domain, and not to generate the same task repeatedly.
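The dedup workaround can be sketched as follows. This is an illustrative sketch only: `Domain` and `push_task` are hypothetical stand-ins for scx_rusty's internal types, not the actual patch.

```rust
use std::collections::HashSet;

// Userspace dedup: track which tptrs are already active in a domain so
// the same task is not generated twice in one load-balancing pass.
struct Domain {
    active_tptrs: HashSet<u64>, // tptrs already recorded for this domain
    tasks: Vec<u64>,            // candidate tasks for load balancing
}

impl Domain {
    fn new() -> Self {
        Self { active_tptrs: HashSet::new(), tasks: Vec::new() }
    }

    // Only add the task if its tptr has not been seen for this domain.
    // `HashSet::insert` returns false when the value was already present.
    fn push_task(&mut self, tptr: u64) -> bool {
        if self.active_tptrs.insert(tptr) {
            self.tasks.push(tptr);
            true
        } else {
            false // duplicate tptr: skip to avoid selecting it twice
        }
    }
}

fn main() {
    let mut dom = Domain::new();
    assert!(dom.push_task(0xdead));
    assert!(!dom.push_task(0xdead)); // duplicate is rejected
    assert!(dom.push_task(0xbeef));
    println!("active tasks: {}", dom.tasks.len());
    // prints "active tasks: 2"
}
```

Since the set mirrors what is already in dom_active_tptrs, a duplicate tptr coming out of the BPF side is simply dropped instead of triggering the repeated "lb_data" warnings.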

Signed-off-by: I Hsin Cheng <[email protected]>
According to sched-ext#611, the load balancer produces a new domain id into "lb_data" every load-balancing period; however, sometimes those tasks aren't scheduled during the coming scheduling period, so no task migration is actually performed.

Utilize BPF_PROG_TYPE_SYSCALL so the load balancer can update the task
context's domain right away and transfer related information between
push domain and pull domain immediately.

Signed-off-by: I Hsin Cheng <[email protected]>
@vax-r force-pushed the rusty_migrate_immediately branch from 3fab6d0 to d5c28e2 on November 27, 2024 at 10:13