i#7955: check pending sig before sigsuspend#7958
Conversation
Blocks the SIGALRM in the main thread to prevent a case where the helper thread runs quickly and sends the signal before the main thread has had a change to block in sigsuspend. SIGALRM is unblocked by the helper thread so it can receive the itimer signals. This fixes linux.sigmask and linux.sigmask-noalarm in local runs with 100x iterations. Adds a TODO for an edge case where we don't want DR to execute the app sigsuspend if a newly or previously unblocked signal is already pending Issue: #7955
The sigsuspend syscall allows the app to potentially unblock signals atomically. In the pre-syscall handler for sigsuspend, DR must check if any signal that's eligible for delivery is actually queued up already, and if it is, DR should skip the sigsuspend. Also modifies the sigmask.c test app to add a scenario where such a signal was already sent by another thread but previously blocked by the receiving thread and unblocked atomically at the sigsuspend. This fails without the new pending signals check added to sigsuspend handling. Also refactors the sigmask.c test app, adding some new checks. This revealed an existing issue where SIGALRMs are not correctly routed to another thread if the main thread is already handling a SIGALRM (i#2311, PR#5482). Issue: #7955, #5482
Blocks the SIGALRM in the main thread to prevent a case where the helper thread runs quickly and sends the signal before the main thread has had a chance to block in sigsuspend; the sigsuspend unblocks SIGALRM. SIGALRM is also unblocked by the helper thread so it can receive the itimer signals. This fixes linux.sigmask and linux.sigmask-noalarm in local runs with 100x iterations, which would previously hang sometimes. Adds a TODO for an edge case where we don't want DR to execute the app sigsuspend if a newly or previously unblocked signal is already pending. This is fixed by the subsequent PR #7958. Issue: #7955
| * app before returning the EINTR. | ||
| * | ||
| * XXX: For pre-existing unblocked signals that were queued up by | ||
| * DR before the sigsuspend: it's not fully accurate to skip the |
There was a problem hiding this comment.
Could you help me remember how this can happen: wouldn't an unblocked signal set signals_pending to 1? Do you mean they were blocked before -- but if they're unblocked now that will also set signals_pending to 1 right?
There was a problem hiding this comment.
Do you mean they were blocked before -- but if they're unblocked now that will also set signals_pending to 1 right?
This is the case that's fixed by this PR. This XXX comment is about that first scenario in your comment.
Could you help me remember how this can happen: wouldn't an unblocked signal set signals_pending to 1?
Yes it would. Consider the case where the unblocked signal was delivered to DR before the app did the sigsuspend but DR has not yet delivered it to the app yet and now the app is about to do the sigsuspend. That pending signal would not have EINTR'd the sigsuspend in native execution, so ideally DR should also not do that; the right DR handling would be to deliver the signal to the app, and then do the sigsuspend.
Seems to be an edge case that may not be worth the complicated implementation. DR doesn't guarantee transparency in async signal timing anyway.
There was a problem hiding this comment.
OK, I miseread this XXX as saying there is a case of a pending signal that won't come through this code: but instead it's saying it will come here but maybe it should do something different.
There was a problem hiding this comment.
instead it's saying it will come here but maybe it should do something different.
Right.
I'm looking more closely at check_signals_pending now and I think there's another issue: check_signals_pending only ever sets signals_pending, never unsets it. So, if the signal in my hypothetical scenario above was one that was unblocked before the sigsuspend (which caused signals_pending to be set before we got to the sigsuspend) and blocked by the sigsuspend (at which point signals_pending would still remain set), it would cause us to incorrectly return an EINTR here (even though the signal would not and should not be delivered by receive_pending_signal which checks app_sigblocked again).
There was a problem hiding this comment.
Is there a reason check_signals_pending never unsets signals_pending?
Maybe in most current cases, the signals_pending flag is only used to indicate "we may have signals pending" (in which case we should really rename it to maybe_signals_pending)? And so we don't really worry about unsetting it? If that's the case I'm a bit hesitant to change its meaning, and maybe this control path shouldn't be using check_signals_pending (better to use a separate local bool?).
There was a problem hiding this comment.
Should be easier to decide if we have a sigmask.c subtest that actually hits this scenario; I'll do that.
| } | ||
| #endif | ||
| /* Update the pending signals flag to allow pre_system_call to potentially | ||
| * skip the sigsuspend if a eligible signal is already pending. |
There was a problem hiding this comment.
grammar nit: s/a/an/
| * correctly. | ||
| * Is there any use case for disabling rerouting even if the thread is | ||
| * not in an app signal handler when the alarm is delivered to it by | ||
| * the kernel? |
There was a problem hiding this comment.
An app has thread X with alarms unblocked and all other threads have alarms blocked, but a tool is using alarms so DR keeps the kernel mask unblocked. Alarms are delivered to non-X threads and DR re-routes them all to X: would that cause overhead issues? Maybe not.
| * Would it be nicer if we do s/&&/||/ below and make -reroute_alarm_signals | ||
| * default false so it can be set only for the cases that do need it? | ||
| * This would allow apps like sigmask.c from PR #7958 to function | ||
| * correctly. |
There was a problem hiding this comment.
I think this makes sense.
There's also proposals in #2311 to never have kernel mask not match the app mask which would avoid needing to do any rerouting. Add a mention of that in a comment somewhere? I think the reason that's hard is to allow tools to use alarm signals but maybe if this rerouting is a real problem it could be worth revisiting.
| * app before returning the EINTR. | ||
| * | ||
| * XXX: For pre-existing unblocked signals that were queued up by | ||
| * DR before the sigsuspend: it's not fully accurate to skip the |
There was a problem hiding this comment.
OK, I miseread this XXX as saying there is a case of a pending signal that won't come through this code: but instead it's saying it will come here but maybe it should do something different.
The sigsuspend syscall allows the app to potentially unblock signals atomically. In the pre-syscall handler for sigsuspend, DR must check if any signal that's eligible for delivery is actually queued up already, and if it is, DR should skip the sigsuspend.
Also modifies the sigmask.c test app to add a scenario where such a signal was already sent by another thread but previously blocked by the receiving thread and unblocked atomically at the sigsuspend. This fails without the new pending signals check added to sigsuspend handling.
For pending signals that might have been delivered to DR before the app could've done the sigsuspend but are currently still queued up by DR: such signals should be delivered and we should execute the sigsuspend anyway. But we skip on adding that complexity for now (RFC); this may create some spurious sigsuspend wake-ups but apps generally execute sigsuspend in a loop anyway.
Also refactors the sigmask.c test app, adding some new checks. This revealed an existing issue (i#2311) where SIGALRMs are not correctly routed to another thread if they are blocked on the main thread already handling one (i#2311, PR #5482). See another RFC on alternate treatment of -reroute_alarm_signals that might be better.
linux.sigmask and linux.sigmask-noalarm pass 100x still.
Issue: #7955, #2311