Skip to content

sched/semaphore: check unmasked pending signals before blocking.#18672

Open
ankohuu wants to merge 1 commit intoapache:masterfrom
ankohuu:fix/sem-wait-pending-signal-race
Open

sched/semaphore: check unmasked pending signals before blocking.#18672
ankohuu wants to merge 1 commit intoapache:masterfrom
ankohuu:fix/sem-wait-pending-signal-race

Conversation

@ankohuu
Copy link
Copy Markdown
Contributor

@ankohuu ankohuu commented Apr 3, 2026

Summary

A signal can arrive before sem_wait transitions the task to
TSTATE_WAIT_SEM. In that window, the wait cannot yet be
aborted by sem_wait_irq(). If sem_wait then blocks without
re-checking unmasked pending signals, it can sleep
indefinitely and miss the interrupt.

Check for unmasked pending signals in wait critical section
and return -EINTR if one is already pending.

Overall, the expected semantics are that sem_wait() should
not block if there is already an unmasked pending signal that
should interrupt the wait. This patch enforces that behavior
by checking for unmasked pending signals before the
task can sleep.

Impact

May bugfix not behavior change

Testing

I confirm that the change was built and runtime-tested on a local setup.

* Build host: Linux x86_64
* Toolchain: local NuttX RISC-V GCC toolchain
* Target: qemu-rv:rv-virt
* Build mode: CONFIG_BUILD_PROTECTED=y
* SMP: CONFIG_SMP=y, CONFIG_SMP_NCPUS=2

Code

   Add delay in sem_wait_slow before entring critical secion
   +  if (strncmp(get_task_name(rtcb), "semhook_waiter",
   +              CONFIG_TASK_NAME_SIZE) == 0 &&
   +      !g_semhook_waited)
   +    {
   +      g_semhook_waited = true;
   +      up_mdelay(500);
   +    }

   The apps-side test is a targeted reproducer rather than a general regression
   test. It creates a dedicated waiter thread, widens the pre-WAIT_SEM window
   through a test hook, sends one signal, and then waits for the waiter thread
   to exit.

   +void signest_semhook_test(void)
   +{ 
      ..
   +  ret = pthread_create(&waiter, NULL, waiter_main, NULL);
   +  ret = sem_wait(&g_semhook_ready);
   +  usleep
   + pthread_kill(waiter
   +  pthread_join(waiter, NULL);

   +static FAR void *waiter_main(FAR void *arg)
   +{
     ..
   +  sem_post(&g_semhook_ready);
   +  ret = sem_wait(&g_semhook_sem);

Logs before this change

NuttShell (NSH) NuttX-12.13.0
nsh> ostest
stdio_test: write fd=1
stdio_test: Standard I/O Check: printf
stdio_test: write fd=2
stdio_test: Standard I/O Check: fprintf to stderr
ostest_main: putenv(Variable1=BadValue3)
ostest_main: setenv(Variable1, GoodValue1, TRUE)
ostest_main: setenv(Variable2, BadValue1, FALSE)
ostest_main: setenv(Variable2, GoodValue2, TRUE)
ostest_main: setenv(Variable3, GoodValue3, FALSE)
ostest_main: setenv(Variable3, BadValue2, FALSE)
show_variable: Variable=Variable1 has value=GoodValue1
show_variable: Variable=Variable2 has value=GoodValue2
show_variable: Variable=Variable3 has value=GoodValue3
ostest_main: Started user_main at PID=6

user_main: semhook test
hang..

Logs after this change

user_main: semhook test
signest_semhook_test: done
user_main: Exiting
ostest_main: Exiting with status 0
nsh>

@ankohuu ankohuu requested review from GUIDINGLI and yamt as code owners April 3, 2026 15:24
@github-actions github-actions bot added Area: OS Components OS Components issues Size: S The size of the change in this PR is small labels Apr 3, 2026
A signal can arrive before sem_wait transitions the task
to TSTATE_WAIT_SEM. In that window, the wait cannot yet
be aborted by sem_wait_irq(). If sem_wait then blocks
without re-checking unmasked pending signals, it can
sleep indefinitely and miss the interrupt.

Check for unmasked pending signals before touching the
semaphore count and return -EINTR if one is already pending.

Signed-off-by: Shunchao Hu <ankohuu@gmail.com>
@ankohuu ankohuu force-pushed the fix/sem-wait-pending-signal-race branch from 3c04aaf to 8011e36 Compare April 3, 2026 16:27

/* Make sure we were supplied with a valid semaphore. */

#ifdef CONFIG_ENABLE_ALL_SIGNALS
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#ifdef CONFIG_ENABLE_ALL_SIGNALS
#ifndef CONFIG_DISABLE_ALL_SIGNALS

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Xiang, only change compilation macros may be not enough, right?

https://github.com/apache/nuttx/blob/master/sched/signal/Make.defs#L32-L36

The current implementation bases on your pending signals and aims to handle the case where a signal interrupts during syscall execution.

@ankohuu
Copy link
Copy Markdown
Contributor Author

ankohuu commented Apr 6, 2026

@xiaoxiang781216

Personally, I think similar partial-signal cases should also be handled, but if we try to cover that case, I can’t find a clean way to do it.

  • If add an extra tcb flag, the changes would spread across several arch places
  • The other option would be to relax the meaning of pending so it also covers partial cases. That would still require changing some macros. Also, the current pending signal appears to conflate the masked-signal case with the syscall-deferred case. I do not think that matches the intended semantics in the sigpending syscall case, which is also why I’m reluctant to make that kind of change.

I’m not familiar with NuttX. I originally only want to try #16808. Appreciate any guidance on how to approach it properly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Area: OS Components OS Components issues Size: S The size of the change in this PR is small

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants