-
Notifications
You must be signed in to change notification settings - Fork 48
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix deadlock on zombie_threads_mtx #234
Fix deadlock on zombie_threads_mtx #234
Conversation
Oh, this just popped in in a EDIT: And the same happened in this build for #232! So there are two PRs each fixing one issue, and each fails tests because of the other bug. I have a feeling we are going in the right direction. |
Locking strategy of code that starts in sys/thread.c:133 looks like utter mess. A brief look at NetBSD's Please add |
sys/thread.c
Outdated
@@ -70,6 +70,9 @@ thread_t *thread_create(const char *name, void (*fn)(void *), void *arg) { | |||
td->td_kstack.stk_base = (void *)PG_VADDR_START(td->td_kstack_obj); | |||
td->td_kstack.stk_size = PAGESIZE; | |||
|
|||
mtx_init(&td->td_lock, MTX_DEF); | |||
cv_init(&td->td_waitcv, "wait"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"td_waitcv"
for better diagnostics
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Close, but not there yet.
sys/thread.c
Outdated
{ | ||
td->td_exitcode = exitcode; | ||
td->td_state = TDS_INACTIVE; | ||
cv_signal(&td->td_waitcv); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cv_broadcast
to wake up all thread that awaits td
exit?
@@ -131,16 +136,18 @@ noreturn void thread_exit(int exitcode) { | |||
fdtab_release(td->td_fdtable); | |||
|
|||
mtx_lock(&zombie_threads_mtx); | |||
TAILQ_INSERT_TAIL(&zombie_threads, td, td_zombieq); | |||
mtx_unlock(&zombie_threads_mtx); | |||
|
|||
critical_enter(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like we don't need critical_enter
here as long as scheduler respects td_lock
mutex. Please have a look at preferred solution kern_thread.c:488. You'll need to add sched_abandon
function that removes a thread from scheduler and makes sure it will not ever be returned to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think I understand the solution you've linked to. The entire function is marked with a comment:
Always called with scheduler locked.
which suggests that there is an external mechanism which prevents the scheduler from running while a thread is exiting. The thread mutex gets locked only at the very last moment, which also hints at a different synchronization mechanism besides this mutex.
Also, I do not understand what do you mean by:
as long as scheduler respects
td_lock
mutex.
Do we want to prevent the scheduler from switching context to threads whose mutex is locked? Should the scheduler lock (or, actually, try-lock) the thread mutex when picking next thread? For this to work well w/o obvious deadlock, we would also have to prevent preemption of threads who own their own thread mutex. I'm under the impression that this could lead to complex problems.
I'd appreciate some detailed description of the idea you want me to implement.
include/thread.h
Outdated
@@ -25,6 +27,9 @@ typedef struct thread { | |||
TAILQ_ENTRY(thread) td_zombieq; /* a link on zombie queue */ | |||
char *td_name; | |||
tid_t td_tid; | |||
/* Locks*/ | |||
mtx_t td_lock; | |||
condvar_t td_waitcv; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please move these to the top of the structure and describe the purpose of td_waitcv
.
Ok. I believe there's no simple and clean way to fix the problem tackled by this PR. While I consider the change to be hack, we cannot wait any longer for it being integrated. @rafalcieslak please look at issue #242, I'll put more information there soon. |
During
thread_exit
, the mutex onzombie_threads
must be unlocked before critical section is left. Otherwise we risk preemption before the mutex is unlocked, and as the thread is TDS_INACTIVE, it will never resume execution - permanently blocking any other threads from locking on the mutex.I've observed this problem several times on my local machine, but I've only managed to trigger it with QEMU, and thus I cannot provide a seed for reproducing the issue.