[WIP] Explore BESTWAIT hardware steering for scheduler optimization#57
zbelinsk wants to merge 1 commit into
This commit explores using the BESTWAIT register for hardware-assisted
scheduling decisions in the reference implementation. The goal is to
investigate whether BESTWAIT can improve scheduler performance or enable
different implementation strategies.
## What was attempted
- Added BESTWAIT register support to hw.h/hw.spec
- Implemented test infrastructure (H2K_bestwait test suite)
- Modified check_sanity.ref.c to explore BESTWAIT-based scheduling
- Updated futex_pi and setup reference implementations
- Attempted to implement opt version of check_sanity with BESTWAIT
BESTWAIT is a global priority register that:
- Holds the priority of the highest-priority task waiting to run
- Triggers a reschedule interrupt when ANY thread's effective priority
becomes worse than the BESTWAIT value
- Does NOT deliver interrupts to specific threads; it's a global signal
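The comparator behaviour described above can be modelled in software. This is only a sketch for reasoning about the semantics, not the hardware definition: `hw_model_t`, `NUM_HTHREADS` and `bestwait_fires` are hypothetical names, and the sketch assumes that a numerically larger value means a worse priority.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_HTHREADS 4 /* hypothetical hardware thread count */

/* Assumption for this sketch: a larger number means a worse priority. */
typedef struct {
    unsigned stid_prio[NUM_HTHREADS]; /* models each thread's STID.PRIO */
    unsigned bestwait;                /* priority of the best waiting task */
} hw_model_t;

/* Returns true when the global reschedule interrupt would be raised,
 * i.e. when ANY hardware thread runs at a priority worse than BESTWAIT.
 * Note that the result says nothing about WHICH thread is worse. */
static bool bestwait_fires(const hw_model_t *hw)
{
    assert(hw != NULL);
    for (int i = 0; i < NUM_HTHREADS; i++) {
        if (hw->stid_prio[i] > hw->bestwait)
            return true;
    }
    return false;
}
```

The boolean result mirrors the "global signal" property: the model reports that some thread should yield, but not which one.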
## Critical blockers preventing full implementation
1. **Cannot access per-thread STID.PRIO from software**
- The STID register (per-thread state) is not accessible from other threads
- This prevents reading the effective priority of other threads
- Blocks the core comparison logic needed for proper BESTWAIT usage
2. **Resched function cannot implement priority handoff**
- The H2K_dosched() function needs to compare thread priorities to determine
which thread should handle the reschedule interrupt
- Without access to STID.PRIO, we cannot implement the required logic
- This is the blocker for steering interrupts correctly
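For illustration, the comparison the reschedule path would need looks roughly like the sketch below. It is not the H2K_dosched() code: `pick_resched_target` and `NO_VICTIM` are hypothetical names, larger numbers are assumed to mean worse priorities, and the `prio` array stands for exactly the per-thread values that cannot be read from STID.PRIO today.

```c
#include <assert.h>
#include <stddef.h>

#define NO_VICTIM (-1) /* hypothetical sentinel: nobody should be preempted */

/* Assumption for this sketch: a larger number means a worse priority.
 * prio[i] is the effective priority of hardware thread i. Returns the
 * index of the worst-priority thread that is strictly worse than the
 * waiting task, or NO_VICTIM if no running thread should yield. */
static int pick_resched_target(const unsigned *prio, size_t n,
                               unsigned waiting_prio)
{
    assert(prio != NULL);
    int victim = NO_VICTIM;
    unsigned worst = waiting_prio;
    for (size_t i = 0; i < n; i++) {
        if (prio[i] > worst) {
            worst = prio[i];
            victim = (int)i;
        }
    }
    return victim;
}
```

Without a readable source for `prio`, this function has no input, which is the blocker described in both items above.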
## Performance implications
- Current check_sanity.ref implementation shows no performance improvement
- Attempted opt version also shows NO improvement over current implementation
- The reference implementation remains exploratory
## Why only in ref?
The opt version was attempted but provides no performance benefit. The ref
version is kept to document the exploration and architectural constraints
for future scheduler work.
## Next steps
- Investigate whether STID.PRIO can be exposed via a privileged interface
- Consider an alternative scheduling implementation that accommodates BESTWAIT register usage
Signed-off-by: Zeev Belinsky <zbelinsk@qti.qualcomm.com>
|
Update: I reconsidered the current architecture together with andreykarpenko-qc.
Since BESTWAIT is already a comparator that scans all hw threads, extending it could be a cheap arch change: BESTWAIT could report not only "I have fired" but also "hw thread x is the least prioritized". That would help check_sanity.ref.c but not dosched.ref.c, which needs a different mechanism from BESTWAIT, since BESTWAIT sends an interrupt (exactly what check_sanity does). If we had a "hardware way" for both sites, we could drop the runlist array from the code.
Might be a good idea, but I don't understand this stuff from the summary:
|
In the kernel scheduler code, we keep scheduling going by doing two things:
Today we query the software runlist array to achieve the described functionality. To offload this to hardware, we would need some way to ask "who is the least prioritized?", which is not possible: hthread X cannot read STID.PRIO of hthread Y, though it can set it using the set_thread_stid_prio setter I added in this PR. Using BESTWAIT in its current state, while we still have to update the software runlist as well, produces no benefit (in terms of packet count, check_sanity.opt is already ~10 packets).
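To make the duplicated work concrete, here is a rough sketch of what "set in hardware, but still track in software" amounts to. Everything in it is hypothetical: `model_set_thread_prio` only stands in for the real set_thread_stid_prio (whose signature is not shown here), `fake_stid_prio` stands in for the remote STID.PRIO fields, and larger numbers are assumed to mean worse priorities.

```c
#include <assert.h>

#define MAX_HTHREADS 4 /* hypothetical hardware thread count */

/* Stand-in for the remote STID.PRIO fields: settable, never read back. */
static unsigned fake_stid_prio[MAX_HTHREADS];
/* Software copy that any hardware thread can actually query. */
static unsigned shadow_prio[MAX_HTHREADS];
/* Counts writes so the duplicated bookkeeping is visible. */
static unsigned update_count;

/* Hypothetical model of a priority update: one write for the hardware,
 * plus one for the software runlist that is still needed for queries. */
static void model_set_thread_prio(int hthread, unsigned prio)
{
    assert(hthread >= 0 && hthread < MAX_HTHREADS);
    fake_stid_prio[hthread] = prio;
    update_count++;
    shadow_prio[hthread] = prio;
    update_count++;
}

/* Answers "who is the least prioritized?" from the software copy only.
 * Assumption: a larger number means a worse priority. */
static int least_prioritized_hthread(void)
{
    int worst = 0;
    for (int i = 1; i < MAX_HTHREADS; i++) {
        if (shadow_prio[i] > shadow_prio[worst])
            worst = i;
    }
    return worst;
}
```

In this model every priority change costs two writes, and the query never touches the hardware side. That matches the observation that BESTWAIT adds work without removing the runlist.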