
Canary thread "starves" after s2idle #13

Open
heftig opened this issue Dec 9, 2019 · 6 comments

@heftig (Owner)

heftig commented Dec 9, 2019

A Dell XPS 13 2-in-1 (7390) defaults to suspend-to-idle for mem sleep. After resume, rtkit thinks the canary thread has starved and demotes all the realtime threads.

@nirbheek

nirbheek commented Jul 15, 2021

I can reproduce this on a Dell XPS 9380 on Fedora 34. I get audio glitches under load because all pipewire processes lose SCHED_RR on resume.

What's the best way to fix this? Can we signal the watchdog thread when going to suspend and then again on resume so it knows to ignore the lack of canary cheeps while suspended?

It's a bit confusing to me that despite using CLOCK_MONOTONIC, the clock is proceeding while suspended. That shouldn't be happening, right?

Edit: to clarify, I use mem_sleep_default=deep, and /sys/power/mem_sleep is deep, so this is happening with S2RAM in my case, not S2idle.

@zman0900

zman0900 commented Jan 7, 2022

This also happens with S3 sleep/suspend, although it does eventually allow RT threads again after about 20 minutes.

@daliborfilus

daliborfilus commented May 21, 2022

I don't want to spam, but this ticket provides some context for the same issue. It was closed because Fedora 21 reached EOL, not because the issue was solved: https://bugzilla.redhat.com/show_bug.cgi?id=688282
The ticket was opened 12 years ago.

Comment #5 by Lennart Poettering suggests that resolving this requires some notification from the kernel about the suspend.

But I think this issue could have another workaround. The root issue is that the canary thread sleeps through the suspend, so after resume it reports a big jump in time and rtkit assumes the thread was blocked for a long time.

Can't we just... you know... use systemd? That systemd, by Lennart? Split rtkit into two parts: one always running (the main daemon) and a second one watching for thread starvation, and use systemd to stop the watcher before the suspend target and start it again after the resume target?
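The split suggested above could be sketched with a plain systemd unit ordered against sleep.target. Everything here is an assumption, not existing rtkit machinery: both the unit below and the `rtkit-watchdog.service` it stops are hypothetical names.

```ini
# rtkit-watchdog-sleep.service (hypothetical): pauses an equally
# hypothetical rtkit-watchdog.service across suspend/resume.
[Unit]
Description=Pause rtkit canary watchdog across suspend
Before=sleep.target
StopWhenUnneeded=yes

[Service]
Type=oneshot
RemainAfterExit=yes
# Runs on the way down, before sleep.target is reached...
ExecStart=/usr/bin/systemctl stop rtkit-watchdog.service
# ...and this runs when sleep.target is deactivated on resume.
ExecStop=/usr/bin/systemctl start rtkit-watchdog.service

[Install]
WantedBy=sleep.target
```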

Or, before demoting everything, can't we just "ping" the thread to see whether it's really starved, or it was just a one-time glitch?
Or use the current CPU load to see whether the starvation is a real problem?

Or ignore the starvation if the gap between canary responses is greater than, say, 30 minutes?

As I see it, the purpose of the canary thread is to check that the system is stable enough for rtkit to be at ease that it can actually perform well with all those threads prioritized.
But couldn't it launch two prioritized canary threads and check that they run in tandem correctly? If both misbehave, assume something else happened and do nothing.
Because if those two threads couldn't do their job (weren't realtime'd enough), how could rtkit-daemon itself be okay under such conditions? It couldn't, or it would have to be very lucky to function at all.

@jmerdich

jmerdich commented Sep 2, 2022

Is logind's D-Bus PrepareForSleep signal (https://www.freedesktop.org/software/systemd/man/org.freedesktop.login1.html) not good enough for this? We might not be able to assume it's always there, because it's part of systemd, but when it is present it seems like the canonical way to detect such a condition.

@jmerdich

jmerdich commented Sep 2, 2022

PR up to fix this using the dbus notification mechanism built into systemd logind (when present).

@HurricanePootis

Any progress on this?
