-
Notifications
You must be signed in to change notification settings - Fork 207
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix a rare segfault in apr_global_mutex_child_init() #34
base: trunk
Are you sure you want to change the base?
Conversation
…ex_child_init() is being passed a NULL mutex. We've seen this behavior under certain modules, specifically mod_lsapi and a few others under high load/IO wait during graceful restarts: #0 apr_proc_mutex_child_init (mutex=0x8, fname=0x0, pool=0x5566e972f128) at locks/unix/proc_mutex.c:1570 apache#1 0x00002b9a5730cd2c in apr_global_mutex_child_init (mutex=<optimized out>, fname=<optimized out>, pool=<optimized out>) at locks/unix/global_mutex.c:89 With this patch, it shows the following - as well as no more segfaults: [Fri Feb 25 01:13:58.924341 2022] [:emerg] [pid 92020] (20009)No lock was provided and one was required.: [host test.test] mod_lsapi: apr_global_mutex_child_init error for pipe mutex
Is ignoring these the right answer? Or should we set ourselves in a 5-retry loop? With sleeps? |
if (*mutex == NULL) { | ||
return APR_ENOLOCK; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks to me like an APR contract change, that we should docx the new return code and suggest to the author they can set up the appropriate retry schema, and put it back on mod_isapi.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Specifically, for APR 2.0
I was thinking about that, maybe a retry loop (hybrid-ish spinlock?) |
Why is the *mutex NULL in the first place? And why would retrying work? |
Short of some ACL problem that the process doesn't have permission to create a mutex, this almost always is going to be a resource contention problem, too many mutexes held. Ultimately, if they are 'forgotten' kernel resources, the process (and perhaps the system) is going down in flames, but sometimes, we are just in an overbooked state, and a retry after a significant (1s) pause will be enough for other threads to release some mutexes back to the pool of resources. |
0006-thread_mutex-APR_THREAD_MUTEX_DEFAULT-should-map-to-.patch I do not have code for other platforms, this patch is a tested (had it on file for quite a while) linux-glibc only approach. |
the adaptative_np mutex spins glibc.pthread.mutex_spin_count (tunable at runtime, default 100) times before calling into the kernel.. |
apr_global_mutex_child_init() is not supposed to be called with |
Under specific circumstances, apr_global_mutex_child_init can be called with NULL mutex. We've seen this behavior under certain modules, specifically mod_lsapi and a few others under high load/IO wait during graceful restarts:
With this patch, it shows the following - as well as no more segfaults:
[Fri Feb 25 01:13:58.924341 2022] [:emerg] [pid 92020] (20009)No lock was provided and one was required.: [host test.test] mod_lsapi: apr_global_mutex_child_init error for pipe mutex