Kernel thread stop reap blocked#39
Conversation
| puts("Joining worker 0 thread"); | ||
| pthread_join(intpool_child_1, &ret); | ||
| puts("Joining worker 1 thread"); | ||
| // stop_threads = 1; |
There was a problem hiding this comment.
Please remove code lines which are in comments
Returning from main (or calling exit()) needs to tear down the entire VM, not just the calling thread. Previously sys_exit() funneled into pthread_exit, which left peer threads — especially ones blocked in a futex (H2K_STATUS_BLOCKED) or intpool wait (H2K_STATUS_INTBLOCKED) — stuck forever, leaking the vmblock. Introduce a new H2_TRAP_VM_STOP (#7) syscall vector dispatched to H2K_vm_stop, and route sys_exit through h2_vm_stop_trap unconditionally (any status, not just zero). H2K_vm_stop scans the vmblock's context array and reaps every non-DEAD peer regardless of state: * BLOCKED — remove from futex hash, cancel timer * INTBLOCKED — remove from intpool ring * READY / VMWAIT — remove from runlist * RUNNING peers on other HW threads — flagged via vmblock->exiting and IPI'd; resched.ref.c picks up the flag and self-reaps the context before scheduling Each reap path performs the common cleanup (ASID refcount, context clear, return to free list, num_cpus--), then the new helper H2K_vmblock_finalize_if_done_locked() signals the parent VM and frees the vmblock once num_cpus reaches zero. Track which context is "main" so future policy can hinge on it: vmblock->main_context is set by setup.ref.c for the boot VM and by create.ref.c for the first thread of every other VM. Trap-table test (kernel/event/trap/test/test.c) and the intpool h2 test (kernel/event/intpool/test_h2/test.c) are updated for the new vector and the new teardown semantics — the latter no longer needs to manually join workers since main's return now reaps them. Signed-off-by: Andrey Karpenko <andreyk@qti.qualcomm.com>
Add 36 self-contained tests under libs/posix/pthread/test_h2/ exercising
the pthread/sem/rwlock/barrier/TLS surface and the H2K_vm_stop teardown
path introduced in the previous commit. Each test is a directory with
test.c + Makefile + Makefile.inc following the existing test_h2
convention.
Register the new tests in scripts/testlist.v61 and scripts/testlist.v81
so they run as part of the unified test suite. pthread_exit_main is
checked in but commented out in both testlists pending follow-up.
Coverage groups:
* Basic API: attr_roundtrip, barrier_basic, cond_signal_broadcast,
detach_states, join_basic, join_invalid, mutex_recursive,
mutex_trylock, rwlock_readers_writers, sem_corner,
tls_keys
* exit() teardown (exercises H2K_vm_stop):
exit1_main_mutex, exit1_main_cond_wait,
exit1_worker_cond_wait, exit1_detached_worker_with_stuck
* pthread_exit-from-main (POSIX: terminate caller, keep process alive):
pthread_exit_main
* Blocked-thread reap on VM exit (workers parked in sync primitives
when main returns):
stuck_in_join, stuck_in_mutex, stuck_in_cond_wait,
stuck_in_cond_timedwait, stuck_in_sem_wait,
stuck_in_sem_timedwait, stuck_in_rwlock_rd,
stuck_in_rwlock_wr, stuck_in_barrier,
stuck_in_pthread_exit_joined
* Negative / misuse:
neg_attr_setstacksize_zero, neg_barrier_init_zero,
neg_cond_wait_no_mutex, neg_create_invalid_routine,
neg_join_self, neg_mutex_destroy_held,
neg_mutex_unlock_unowned, neg_rwlock_unlock_unheld,
neg_sem_overflow_post, neg_tls_use_after_delete
Signed-off-by: Andrey Karpenko <andreyk@qti.qualcomm.com>
Three independent fixes in the test build/run plumbing surfaced once
multiple ARCHV×TARGET variants started running in parallel and writing
into a shared in-source test directory.
makefile:
* Per-variant test_results.json rule prefixes the inner $(MAKE) with
'-' so unified-report aggregation runs even when one variant's
tests fail. The JSON file is still written by h2_test before
check-fail runs.
* h2_test reorders steps so test_report.html and test_results.json
are generated *before* the warning-grep gate; previously a stray
warning blocked report generation entirely.
scripts/Makefile.coverage:
* Per-test results.txt rule's inner $(MAKE) gets the '-' prefix so
.DELETE_ON_ERROR doesn't wipe FAIL details and abort tst — PASS/FAIL
is captured in results.txt content for check-fail and
gen_test_results.py.
* Symlink whitelist replaces the old "everything except Makefile.inc"
blacklist when populating the per-variant build dir. Without the
whitelist, every variant followed symlinks back to the source tree
and clobbered each other's *.elf / *.o / results.txt / gmon-*.out,
leaving only the last writer's outputs in the report. Whitelist
covers source extensions (Makefile, *.c/.h/.S/.s/.cpp/.cc/.py/.dat/
.cfg, tested_functions); explicit excludes drop generator outputs
whose extension matches a source extension (scenarios.h,
generated_tests.dat, threadmap.py).
scripts/Makefile.inc.test:
* Add 'test' as an alias for 'tst' and mark test/tst/all .PHONY so
they don't collide with stray files of the same name.
Signed-off-by: Andrey Karpenko <andreyk@qti.qualcomm.com>
Per POSIX, pthread_exit from the main thread terminates only the main
thread; the process must remain alive so other threads run to
completion. Two distinct bugs together broke that; both are fixed here,
and pthread_exit_main is enabled in v61 and v81 testlists to lock the
behavior in.
kernel/thread/stop: drop is_main shortcut
H2K_thread_stop (H2_TRAP_THREAD_STOP, used by pthread_safe_death)
treated main_context exiting as a VM teardown -- calling vm_stop_locked
to reap every sibling. That collapsed pthread_exit and exit() into the
same fatal path. Only exit() routes through H2_TRAP_VM_STOP, so the
kernel can distinguish: thread_stop now does the same per-me cleanup
for main as for any worker, then nulls main_context so the existing
all-blocked reaper at the bottom of the function can finalize the VM
cleanly when remaining threads settle. exit()/sys_exit() still gets
full VM teardown via H2K_vm_stop, untouched.
libs/posix/pthread: don't queue main's TLS for deferred free
pthread_exit defers freeing the exiting thread's TCB+TLS via static
old_freeptr / old_stack_freeptr -- set so the next exiting thread frees
the previous one. The math (char *)self - elftls_size only works for
worker threads (calloc'd by pthread_create with TCB at
tmpptr + elftls_size). For main:
* Small ELF TLS path: TCB lives in mainthread_static_storage (BSS).
Freeing into BSS is undefined.
* Large ELF TLS path: TCB sits at malloc'd + alignment correction --
the math gives main_thread_tls, not the malloc return.
Use aligned_alloc(TLS_ALIGN, round_up(size, TLS_ALIGN)) in the
large-TLS path so the math is correct (no manual alignment fixup), and
track main's TCB via a file-scope main_tcb pointer. pthread_exit on
main consumes the previously queued frees but skips queueing itself,
so neither static storage nor a non-malloc'd interior pointer ever
reaches free().
scripts/testlist.v{61,81}: enable pthread_exit_main
The test (added in 649bddf) exercises main calling pthread_exit with
a mix of joinable and detached worker threads. With both fixes above,
it passes under archsim and hexagon-sim.
Signed-off-by: Andrey Karpenko <andreyk@qti.qualcomm.com>
Mirrors pthread_exit_main except no worker calls exit(): every worker just `return NULL`s from its start routine, and the last one to bump the shared counter prints TEST PASSED before returning. Main still leaves via pthread_exit(NULL). This forces VM teardown through the all-blocked / all-dead reaper path -- nobody triggers the H2_TRAP_VM_STOP shortcut -- and locks in a clean exit status from that path. Registered in scripts/testlist.v61 and scripts/testlist.v81. Signed-off-by: Andrey Karpenko <andreyk@qti.qualcomm.com>
Signed-off-by: Andrey Karpenko <andreyk@qti.qualcomm.com>
3c19e13 to
6ac16fe
Compare
Erich Plondke (eplondke)
left a comment
There was a problem hiding this comment.
This is a great feature, it's something we need pretty badly: the ability to kill another thread, and building on top of that being able to kill a vm/process/etc.
There's some partial support for killing another task and we test it at least somewhat. If we set the H2K_VMSTATUS_KILL bit in a thread it's supposed to wake up and call H2K_thread_stop(). But we currently don't ever set that bit outside of testing! :-) But I think that infrastructure might be the best way to reuse some of the other infrastructure to get a thread killed off regardless of its state.
Conceptually, it seems like then the "vm death" sequence is:
- Mark the VM as "being killed"
- Mark all threads in the VCPU for death.
- Possibly, make sure we don't create new VCPUs in a VM "being killed"
- If the number of VCPUs in the VM goes to zero, we either "signal the parent" (like we might already do?) or "just tear down the VM" as a configured option.
- Hopefully this works if a guest is asking for itself to be torn down, or the parent is tearing down the child.
This seems pretty great, but I'm hopeful that we can leverage the existing infrastructure a little more if it's working. If it's not working... maybe we can fix it... or if it's too broken then we can create something new like this.
Let me know if you want to chat live about this!
| /* Main (first-created) thread of this vmblock. When this context calls | ||
| * H2K_thread_stop the entire vmblock is torn down, matching POSIX | ||
| * exit()/return-from-main semantics. NULL once main has exited. */ | ||
| H2K_thread_context *main_context; |
There was a problem hiding this comment.
I'm not sure this is the best mechanism to be able to kill an entire VM. There is some partial support for H2K_VMSTATUS_KILL but nothing ever sets the bit and gets started. I think having a VMOP that kills a specific CPU and/or whole guest is a great one, but let's try to use the VMSTATUS_KILL infra...
| } | ||
|
|
||
| static inline void resched(u32_t unused, H2K_thread_context *me, u32_t hwtnum) { | ||
| if (me != NULL && me->vmblock != NULL && me->vmblock->exiting) { |
There was a problem hiding this comment.
Let's try to avoid this special case
| * Used when a CLUSTER_RESCHED_INT (or RESCHED_INT) lands on a HW thread that | ||
| * was running a context belonging to a vmblock whose main thread has exited. | ||
| * Mirrors the per-context cleanup in H2K_thread_stop. */ | ||
| static inline void self_reap_locked(H2K_thread_context *me) |
There was a problem hiding this comment.
Is this mostly the same thing that would happen with H2K_thread_stop? we should reuse that as much as possible... and if there are things missing in H2K_thread_stop we should make sure those go there too...
| vmblock->num_cpus++; | ||
| tmp->vmblock = vmblock; | ||
|
|
||
| /* Record the first thread as main: its exit triggers VM teardown. */ |
There was a problem hiding this comment.
Do we want to tear down the whole VM? Or just put it in all-threads-stopped where the spawning task frees it (current approach)? This is kind of like "zombie" and needing to join in posix? Although also that's a hassle sometimes :-)
There was a problem hiding this comment.
Yeah, I don't think that we should have a concept of the "main" thread (vcpu) of a vm. For a paravirtualized guest like linux, the vcpus are just a symmetric set of cpus -- there is no "main" cpu that stops all the others when it stops -- in most architectures anyway -- and I don't think that's something we should introduce in the HVM architecture.
So it's ok (and necessary) to have the concept of a main posix thread, but that shouldn't map to a particular vcpu even though in h2os that's how we implement it.
pthread_exit() of the main thread should kill all the other pthreads, but not (directly) all the vcpus. It's quite possible that there is a vcpu that is not running the code of a pthread.
Therefore I think the mechanisms for "stop all pthreads" and "stop the vm" should be separate, logically, though they can share some infrastructure.
| #define H2K_STOP_H 1 | ||
|
|
||
| #include <context.h> | ||
| #include <vmblock.h> |
There was a problem hiding this comment.
Thread stop and vm stop should probably be separated (although perhaps one should call the other?)
There was a problem hiding this comment.
Yeah that. Should have kept reading...
Currently, when the last vcpu stops we send H2K_VM_CHILDINT to the parent vcpu if it's still alive (!H2K_STATUS_DEAD), then tear down the vm unconditionally. We also signal the parent if any child vcpu exits non-0. We can certainly add a vmop to set H2K_VMSTATUS_KILL on a vm (child only) and then interrupt any running vcpus from that vm so that they go do vmwork and discover that they need to terminate. |
Three independent kernel cleanups: - Remove the H2K_vmblock_t::main_context field and all its bookkeeping. Main and worker threads now share the H2K_thread_stop path uniformly; VM-wide teardown is driven solely by H2K_vm_stop (the exiting flag), so tracking which context was "main" is no longer needed. Drops the field from vmblock.h and its assignments in setup/create/stop. - Factor the duplicated per-context teardown tail (asid_dec -> context_clear -> push free list -> num_cpus--) into a shared H2K_free_context_locked() exported from stop.h. Collapses the four open-coded copies in reap_one_locked, vm_stop_locked, H2K_thread_stop, and resched's self-reap onto it. - Fold the exiting-VM self-reap in resched() into the normal if (me != NULL) path instead of a separate early-return branch. Removes the self_reap_locked helper along with its duplicated runlist_remove/dosched and the unreachable tail, leaving a single exit point. On a reap, me is set to NULL so the shared dosched(me) tail won't save the dying context. No behavior change. Verified: clean build + full v81/opt test suite (136 passed, 0 failed). Signed-off-by: Andrey Karpenko <andreyk@qti.qualcomm.com>
9f4f231 to
6653e29
Compare
|
The tests here are really awesome. They found some great problems. |
No description provided.