Kernel thread stop reap blocked by andreykarpenko-qc · Pull Request #39 · qualcomm/hexagon-hypervisor

andreykarpenko-qc · 2026-05-31T14:25:31Z

No description provided.

stzahi1 · 2026-06-01T09:56:23Z

-	puts("Joining worker 0 thread");
-	pthread_join(intpool_child_1, &ret);
-	puts("Joining worker 1 thread");
+//	stop_threads = 1;


Please remove code lines which are in comments

Returning from main (or calling exit()) needs to tear down the entire VM, not just the calling thread. Previously sys_exit() funneled into pthread_exit, which left peer threads — especially ones blocked in a futex (H2K_STATUS_BLOCKED) or intpool wait (H2K_STATUS_INTBLOCKED) — stuck forever, leaking the vmblock. Introduce a new H2_TRAP_VM_STOP (#7) syscall vector dispatched to H2K_vm_stop, and route sys_exit through h2_vm_stop_trap unconditionally (any status, not just zero). H2K_vm_stop scans the vmblock's context array and reaps every non-DEAD peer regardless of state: * BLOCKED — remove from futex hash, cancel timer * INTBLOCKED — remove from intpool ring * READY / VMWAIT — remove from runlist * RUNNING peers on other HW threads — flagged via vmblock->exiting and IPI'd; resched.ref.c picks up the flag and self-reaps the context before scheduling Each reap path performs the common cleanup (ASID refcount, context clear, return to free list, num_cpus--), then the new helper H2K_vmblock_finalize_if_done_locked() signals the parent VM and frees the vmblock once num_cpus reaches zero. Track which context is "main" so future policy can hinge on it: vmblock->main_context is set by setup.ref.c for the boot VM and by create.ref.c for the first thread of every other VM. Trap-table test (kernel/event/trap/test/test.c) and the intpool h2 test (kernel/event/intpool/test_h2/test.c) are updated for the new vector and the new teardown semantics — the latter no longer needs to manually join workers since main's return now reaps them. Signed-off-by: Andrey Karpenko <andreyk@qti.qualcomm.com>

Add 36 self-contained tests under libs/posix/pthread/test_h2/ exercising the pthread/sem/rwlock/barrier/TLS surface and the H2K_vm_stop teardown path introduced in the previous commit. Each test is a directory with test.c + Makefile + Makefile.inc following the existing test_h2 convention. Register the new tests in scripts/testlist.v61 and scripts/testlist.v81 so they run as part of the unified test suite. pthread_exit_main is checked in but commented out in both testlists pending follow-up. Coverage groups: * Basic API: attr_roundtrip, barrier_basic, cond_signal_broadcast, detach_states, join_basic, join_invalid, mutex_recursive, mutex_trylock, rwlock_readers_writers, sem_corner, tls_keys * exit() teardown (exercises H2K_vm_stop): exit1_main_mutex, exit1_main_cond_wait, exit1_worker_cond_wait, exit1_detached_worker_with_stuck * pthread_exit-from-main (POSIX: terminate caller, keep process alive): pthread_exit_main * Blocked-thread reap on VM exit (workers parked in sync primitives when main returns): stuck_in_join, stuck_in_mutex, stuck_in_cond_wait, stuck_in_cond_timedwait, stuck_in_sem_wait, stuck_in_sem_timedwait, stuck_in_rwlock_rd, stuck_in_rwlock_wr, stuck_in_barrier, stuck_in_pthread_exit_joined * Negative / misuse: neg_attr_setstacksize_zero, neg_barrier_init_zero, neg_cond_wait_no_mutex, neg_create_invalid_routine, neg_join_self, neg_mutex_destroy_held, neg_mutex_unlock_unowned, neg_rwlock_unlock_unheld, neg_sem_overflow_post, neg_tls_use_after_delete Signed-off-by: Andrey Karpenko <andreyk@qti.qualcomm.com>

Three independent fixes in the test build/run plumbing surfaced once multiple ARCHV×TARGET variants started running in parallel and writing into a shared in-source test directory. makefile: * Per-variant test_results.json rule prefixes the inner $(MAKE) with '-' so unified-report aggregation runs even when one variant's tests fail. The JSON file is still written by h2_test before check-fail runs. * h2_test reorders steps so test_report.html and test_results.json are generated *before* the warning-grep gate; previously a stray warning blocked report generation entirely. scripts/Makefile.coverage: * Per-test results.txt rule's inner $(MAKE) gets the '-' prefix so .DELETE_ON_ERROR doesn't wipe FAIL details and abort tst — PASS/FAIL is captured in results.txt content for check-fail and gen_test_results.py. * Symlink whitelist replaces the old "everything except Makefile.inc" blacklist when populating the per-variant build dir. Without the whitelist, every variant followed symlinks back to the source tree and clobbered each other's *.elf / *.o / results.txt / gmon-*.out, leaving only the last writer's outputs in the report. Whitelist covers source extensions (Makefile, *.c/.h/.S/.s/.cpp/.cc/.py/.dat/ .cfg, tested_functions); explicit excludes drop generator outputs whose extension matches a source extension (scenarios.h, generated_tests.dat, threadmap.py). scripts/Makefile.inc.test: * Add 'test' as an alias for 'tst' and mark test/tst/all .PHONY so they don't collide with stray files of the same name. Signed-off-by: Andrey Karpenko <andreyk@qti.qualcomm.com>

Per POSIX, pthread_exit from the main thread terminates only the main thread; the process must remain alive so other threads run to completion. Two distinct bugs together broke that; both are fixed here, and pthread_exit_main is enabled in v61 and v81 testlists to lock the behavior in. kernel/thread/stop: drop is_main shortcut H2K_thread_stop (H2_TRAP_THREAD_STOP, used by pthread_safe_death) treated main_context exiting as a VM teardown -- calling vm_stop_locked to reap every sibling. That collapsed pthread_exit and exit() into the same fatal path. Only exit() routes through H2_TRAP_VM_STOP, so the kernel can distinguish: thread_stop now does the same per-me cleanup for main as for any worker, then nulls main_context so the existing all-blocked reaper at the bottom of the function can finalize the VM cleanly when remaining threads settle. exit()/sys_exit() still gets full VM teardown via H2K_vm_stop, untouched. libs/posix/pthread: don't queue main's TLS for deferred free pthread_exit defers freeing the exiting thread's TCB+TLS via static old_freeptr / old_stack_freeptr -- set so the next exiting thread frees the previous one. The math (char *)self - elftls_size only works for worker threads (calloc'd by pthread_create with TCB at tmpptr + elftls_size). For main: * Small ELF TLS path: TCB lives in mainthread_static_storage (BSS). Freeing into BSS is undefined. * Large ELF TLS path: TCB sits at malloc'd + alignment correction -- the math gives main_thread_tls, not the malloc return. Use aligned_alloc(TLS_ALIGN, round_up(size, TLS_ALIGN)) in the large-TLS path so the math is correct (no manual alignment fixup), and track main's TCB via a file-scope main_tcb pointer. pthread_exit on main consumes the previously queued frees but skips queueing itself, so neither static storage nor a non-malloc'd interior pointer ever reaches free(). scripts/testlist.v{61,81}: enable pthread_exit_main The test (added in 649bddf) exercises main calling pthread_exit with a mix of joinable and detached worker threads. With both fixes above, it passes under archsim and hexagon-sim. Signed-off-by: Andrey Karpenko <andreyk@qti.qualcomm.com>

Mirrors pthread_exit_main except no worker calls exit(): every worker just `return NULL`s from its start routine, and the last one to bump the shared counter prints TEST PASSED before returning. Main still leaves via pthread_exit(NULL). This forces VM teardown through the all-blocked / all-dead reaper path -- nobody triggers the H2_TRAP_VM_STOP shortcut -- and locks in a clean exit status from that path. Registered in scripts/testlist.v61 and scripts/testlist.v81. Signed-off-by: Andrey Karpenko <andreyk@qti.qualcomm.com>

Signed-off-by: Andrey Karpenko <andreyk@qti.qualcomm.com>

Erich Plondke (eplondke)

This is a great feature, it's something we need pretty badly: the ability to kill another thread, and building on top of that being able to kill a vm/process/etc.

There's some partial support for killing another task and we test it at least somewhat. If we set the H2K_VMSTATUS_KILL bit in a thread it's supposed to wake up and call H2K_thread_stop(). But we currently don't ever set that bit outside of testing! :-) But I think that infrastructure might be the best way to reuse some of the other infrastructure to get a thread killed off regardless of its state.

Conceptually, it seems like then the "vm death" sequence is:

Mark the VM as "being killed"
Mark all threads in the VCPU for death.
Possibly, make sure we don't create new VCPUs in a VM "being killed"
If the number of VCPUs in the VM goes to zero, we either "signal the parent" (like we might already do?) or "just tear down the VM" as a configured option.
Hopefully this works if a guest is asking for itself to be torn down, or the parent is tearing down the child.

This seems pretty great, but I'm hopeful that we can leverage the existing infrastructure a little more if it's working. If it's not working... maybe we can fix it... or if it's too broken then we can create something new like this.

Let me know if you want to chat live about this!

Erich Plondke (eplondke) · 2026-06-24T14:22:48Z

+	/* Main (first-created) thread of this vmblock.  When this context calls
+	 * H2K_thread_stop the entire vmblock is torn down, matching POSIX
+	 * exit()/return-from-main semantics.  NULL once main has exited. */
+	H2K_thread_context *main_context;


I'm not sure this is the best mechanism to be able to kill an entire VM. There is some partial support for H2K_VMSTATUS_KILL but nothing ever sets the bit and gets started. I think having a VMOP that kills a specific CPU and/or whole guest is a great one, but let's try to use the VMSTATUS_KILL infra...

Erich Plondke (eplondke) · 2026-06-24T14:23:33Z

+}

 static inline void resched(u32_t unused, H2K_thread_context *me, u32_t hwtnum) {
+	if (me != NULL && me->vmblock != NULL && me->vmblock->exiting) {


Let's try to avoid this special case

Erich Plondke (eplondke) · 2026-06-24T14:25:06Z

+ * Used when a CLUSTER_RESCHED_INT (or RESCHED_INT) lands on a HW thread that
+ * was running a context belonging to a vmblock whose main thread has exited.
+ * Mirrors the per-context cleanup in H2K_thread_stop. */
+static inline void self_reap_locked(H2K_thread_context *me)


Is this mostly the same thing that would happen with H2K_thread_stop? we should reuse that as much as possible... and if there are things missing in H2K_thread_stop we should make sure those go there too...

Erich Plondke (eplondke) · 2026-06-24T14:26:26Z

 	vmblock->num_cpus++;
 	tmp->vmblock = vmblock;

+	/* Record the first thread as main: its exit triggers VM teardown. */


Do we want to tear down the whole VM? Or just put it in all-threads-stopped where the spawning task frees it (current approach)? This is kind of like "zombie" and needing to join in posix? Although also that's a hassle sometimes :-)

Yeah, I don't think that we should have a concept of the "main" thread (vcpu) of a vm. For a paravirtualized guest like linux, the vcpus are just a symmetric set of cpus -- there is no "main" cpu that stops all the others when it stops -- in most architectures anyway -- and I don't think that's something we should introduce in the HVM architecture.
So it's ok (and necessary) to have the concept of a main posix thread, but that shouldn't map to a particular vcpu even though in h2os that's how we implement it.
pthread_exit() of the main thread should kill all the other pthreads, but not (directly) all the vcpus. It's quite possible that there is a vcpu that is not running the code of a pthread.
Therefore I think the mechanisms for "stop all pthreads" and "stop the vm" should be separate, logically, though they can share some infrastructure.

Erich Plondke (eplondke) · 2026-06-24T14:27:07Z

 #define H2K_STOP_H 1

 #include <context.h>
+#include <vmblock.h>


Thread stop and vm stop should probably be separated (although perhaps one should call the other?)

Yeah that. Should have kept reading...

Bryan Bayerdorffer (bryanb-h2) · 2026-06-25T00:19:31Z

This is a great feature, it's something we need pretty badly: the ability to kill another thread, and building on top of that being able to kill a vm/process/etc.

There's some partial support for killing another task and we test it at least somewhat. If we set the H2K_VMSTATUS_KILL bit in a thread it's supposed to wake up and call H2K_thread_stop(). But we currently don't ever set that bit outside of testing! :-) But I think that infrastructure might be the best way to reuse some of the other infrastructure to get a thread killed off regardless of its state.

Conceptually, it seems like then the "vm death" sequence is:
* Mark the VM as "being killed"

* Mark all threads in the VCPU for death.

* Possibly, make sure we don't create new VCPUs in a VM "being killed"

* If the number of VCPUs in the VM goes to zero, we either "signal the parent" (like we might already do?) or "just tear down the VM" as a configured option.

* Hopefully this works if a guest is asking for itself to be torn down, or the parent is tearing down the child.
This seems pretty great, but I'm hopeful that we can leverage the existing infrastructure a little more if it's working. If it's not working... maybe we can fix it... or if it's too broken then we can create something new like this.

Let me know if you want to chat live about this!

Currently, when the last vcpu stops we send H2K_VM_CHILDINT to the parent vcpu if it's still alive (!H2K_STATUS_DEAD), then tear down the vm unconditionally. We also signal the parent if any child vcpu exits non-0.
Maybe this should be a shint instead so the parent doesn't have to hang around, but it seemed kind of good practice at the time I added this to expect the parent to wait (blocked) until the child vm finishes. This was intended mainly to wake up booter when stuff finishes.

We can certainly add a vmop to set H2K_VMSTATUS_KILL on a vm (child only) and then interrupt any running vcpus from that vm so that they go do vmwork and discover that they need to terminate.

Three independent kernel cleanups: - Remove the H2K_vmblock_t::main_context field and all its bookkeeping. Main and worker threads now share the H2K_thread_stop path uniformly; VM-wide teardown is driven solely by H2K_vm_stop (the exiting flag), so tracking which context was "main" is no longer needed. Drops the field from vmblock.h and its assignments in setup/create/stop. - Factor the duplicated per-context teardown tail (asid_dec -> context_clear -> push free list -> num_cpus--) into a shared H2K_free_context_locked() exported from stop.h. Collapses the four open-coded copies in reap_one_locked, vm_stop_locked, H2K_thread_stop, and resched's self-reap onto it. - Fold the exiting-VM self-reap in resched() into the normal if (me != NULL) path instead of a separate early-return branch. Removes the self_reap_locked helper along with its duplicated runlist_remove/dosched and the unreachable tail, leaving a single exit point. On a reap, me is set to NULL so the shared dosched(me) tail won't save the dying context. No behavior change. Verified: clean build + full v81/opt test suite (136 passed, 0 failed). Signed-off-by: Andrey Karpenko <andreyk@qti.qualcomm.com>

Erich Plondke (eplondke) · 2026-06-26T02:39:15Z

The tests here are really awesome. They found some great problems.

stzahi1 reviewed Jun 1, 2026

View reviewed changes

stzahi1 requested changes Jun 1, 2026

View reviewed changes

andreykarpenko-qc added 6 commits June 11, 2026 00:43

event/intpool/test_h2: remove dead code

6ac16fe

Signed-off-by: Andrey Karpenko <andreyk@qti.qualcomm.com>

andreykarpenko-qc force-pushed the kernel_thread_stop_reap_blocked branch from 3c19e13 to 6ac16fe Compare June 11, 2026 09:09

andreykarpenko-qc requested a review from Bryan Bayerdorffer (bryanb-h2) June 11, 2026 09:11

andreykarpenko-qc marked this pull request as ready for review June 11, 2026 09:12

Erich Plondke (eplondke) requested changes Jun 24, 2026

View reviewed changes

github-actions Bot added the untested Mark untested PRs label Jun 25, 2026

andreykarpenko-qc force-pushed the kernel_thread_stop_reap_blocked branch from 9f4f231 to 6653e29 Compare June 25, 2026 20:11

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Kernel thread stop reap blocked#39

Kernel thread stop reap blocked#39
andreykarpenko-qc wants to merge 7 commits into
masterfrom
kernel_thread_stop_reap_blocked

andreykarpenko-qc commented May 31, 2026

Uh oh!

stzahi1 Jun 1, 2026

Uh oh!

andreykarpenko-qc Jun 11, 2026

Uh oh!

Erich Plondke (eplondke) left a comment

Uh oh!

Erich Plondke (eplondke) Jun 24, 2026

Uh oh!

Erich Plondke (eplondke) Jun 24, 2026

Uh oh!

Erich Plondke (eplondke) Jun 24, 2026

Uh oh!

Erich Plondke (eplondke) Jun 24, 2026

Uh oh!

Bryan Bayerdorffer (bryanb-h2) Jun 25, 2026

Uh oh!

Erich Plondke (eplondke) Jun 24, 2026

Uh oh!

Bryan Bayerdorffer (bryanb-h2) Jun 25, 2026

Uh oh!

Bryan Bayerdorffer (bryanb-h2) commented Jun 25, 2026

Uh oh!

Erich Plondke (eplondke) commented Jun 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Uh oh!

Conversation

andreykarpenko-qc commented May 31, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Erich Plondke (eplondke) left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Bryan Bayerdorffer (bryanb-h2) commented Jun 25, 2026

Uh oh!

Erich Plondke (eplondke) commented Jun 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants