[Support] Join threads when stopping the ThreadPoolExecutor #166054

momo5502 · 2025-11-02T11:10:12Z

When building LLVM as DLL, the ThreadPoolExecutor can cause deadlocks on Windows.

The threads were previously only joined when the destructor of the ThreadPoolExecutor is called, not when it's stopped. Destruction happens when unloading the DLL, when the global destructors are called.

On Windows, std::thread uses FreeLibraryAndExitThread to cleanup a thread. This requires access to the loader lock, when a thread terminates.
However, when destroying the pool, the loader lock is also held, as the DLL is being unloaded.
If the threads did not end fast enough, the destructor would wait for them to join. At the same time, the threads would wait for the destructor to release the loader lock.

Joining the threads when stopping the pool fixes that, as it ensures the threads are stopped when calling llvm_shutdown, outside the loader lock.

llvmbot · 2025-11-02T11:10:42Z

@llvm/pr-subscribers-llvm-support

Author: Maurice Heumann (momo5502)

Changes

When building LLVM as DLL, the ThreadPoolExecutor can cause deadlocks on Windows.

The threads wre previously only joined when the destructor of the ThreadPoolExecutor is called. This happens when unloading the DLL, when the global destructors are called.

On Windows, std::threads use FreeLibraryAndExitThread to cleanup a thread. This requires access to the loader lock, when a thread terminates.
However, when destroying the pool, the loader lock is also held, as the DLL is being unloaded.
If the threads did not end fast enough, the destructor would wait for them to join. At the same time, the threads would wait for the destructor to release the loader lock.

Joining the threads when stopping the pool fixes that, as it ensures the threads are stopped when calling llvm_shutdown, outside the loader lock.

Full diff: https://github.com/llvm/llvm-project/pull/166054.diff

1 Files Affected:

(modified) llvm/lib/Support/Parallel.cpp (+7-5)

diff --git a/llvm/lib/Support/Parallel.cpp b/llvm/lib/Support/Parallel.cpp
index 8e0c724accb36..31b51536aaa9e 100644
--- a/llvm/lib/Support/Parallel.cpp
+++ b/llvm/lib/Support/Parallel.cpp
@@ -82,26 +82,28 @@ class ThreadPoolExecutor : public Executor {
   ThreadPoolExecutor() = delete;
 
   void stop() {
+    std::vector<std::thread> ThreadsToJoin;
+
     {
       std::lock_guard<std::mutex> Lock(Mutex);
-      if (Stop)
+      if (Stop && Threads.empty())
         return;
       Stop = true;
+      ThreadsToJoin.swap(Threads);
     }
     Cond.notify_all();
     ThreadsCreated.get_future().wait();
-  }
 
-  ~ThreadPoolExecutor() override {
-    stop();
     std::thread::id CurrentThreadId = std::this_thread::get_id();
-    for (std::thread &T : Threads)
+    for (std::thread &T : ThreadsToJoin)
       if (T.get_id() == CurrentThreadId)
         T.detach();
       else
         T.join();
   }
 
+  ~ThreadPoolExecutor() override { stop(); }
+
   struct Creator {
     static void *call() { return new ThreadPoolExecutor(strategy); }
   };

aganea · 2025-11-02T22:36:12Z

Are you able to craft a unit test that exposes the issue?

compnerd · 2025-11-03T00:47:19Z

I'm not sure I follow how this fixes anything. If the stop method is not invoked, would the behaviour not be identical to the previous state? If so, how does this fix that?

momo5502 · 2025-11-03T06:09:37Z

Are you able to craft a unit test that exposes the issue?

I think that should be doable, yes. I can try :D

momo5502 · 2025-11-03T06:13:00Z

I'm not sure I follow how this fixes anything. If the stop method is not invoked, would the behaviour not be identical to the previous state? If so, how does this fix that?

You're right, this doesn't fix anything if the stop method is not invoked. This fix obviously requires the thread pool to be stopped outside the loader lock. llvm_shutdown does that, as the managed static construct invokes stop. So it requires calling llvm_shutdown. Form what I've seen, this is usually the case, as InitLLVM wraps that.

nga888 · 2025-11-03T12:24:32Z

Just want to note that IIRC, the reason for only joining the threads in the destructor was to speed up "fast" no cleanup exit, i.e. would skip joining any threads. This used to be quite an overhead for systems with very high thread counts. Something to bear in mind.

nga888 · 2025-11-03T17:22:13Z

It's not ideal, but perhaps how the threads are joined should be dependent on whether the code is being built as a DLL? That way the performance benefit I mention above would still be available for those who build LLVM statically on Windows.

momo5502 · 2025-11-03T17:39:37Z

Just want to note that IIRC, the reason for only joining the threads in the destructor was to speed up "fast" no cleanup exit, i.e. would skip joining any threads. This used to be quite an overhead for systems with very high thread counts. Something to bear in mind.

I understand. However, I would argue that this is dangerous. On Windows, the OS kills all threads before calling the destructors of the main exe. So in case any thread held a lock, this might cause deadlocks as well, just like in the DLL case here.
The safest thing to do would always be to join the threads before calling the destructors.

momo5502 · 2025-11-03T17:43:55Z

It's not ideal, but perhaps how the threads are joined should be dependent on whether the code is being built as a DLL? That way the performance benefit I mention above would still be available for those who build LLVM statically on Windows.

Interesting idea, however, as mentioned previously, joining threads in the destructor is dangerous either way.
Maybe, at the very least, it might be possible to just expose a function that joins the threads. This can then be called manually, after llvm_shutdown has been called. Every consumer of LLVM can decide whether they want to call that or not.

However, I'll try to fix the tests first, because there is obviously something I broke, then I'll try to create a unit test that reproduces the issue.

momo5502 · 2025-11-04T12:42:54Z

Are you able to craft a unit test that exposes the issue?

Turns out that creating a unittest is not really feasible.

Reproducing this issue requires to keep the threads alive after the thread pool has been stopped and waiting until the global destructors are being called. doing that indeed reliably reproduces the issue. however, this fix wants to join the thread beforehand, which won't work.

Instead of explicitly waiting for the destructors, it would be possible to just sleep for a long time, but that also doesn't feel right, as this just adds to the CI times.

momo5502 · 2025-11-04T12:51:09Z

Just want to note that IIRC, the reason for only joining the threads in the destructor was to speed up "fast" no cleanup exit, i.e. would skip joining any threads. This used to be quite an overhead for systems with very high thread counts. Something to bear in mind.

So the fix works now, but as it joins the threads, shutdown will take longer.
I don't really see an easy way of having both the speed and safety. Except something like triggering the stop at the beginning of llvm_shutdown and joining the threads at the end, when everything inbetween is done. Not sure if that helps much and it's probably too big of a change.

If the speed is desired, I propose to expose some kind of function that, when called, tells the pool to join the threads when being stopped. Otherwise joining will be done in the destructor. That way, if one needs the safety, the function can be called.
That would be fine for me.

llvmbot added the llvm:support label Nov 2, 2025

momo5502 changed the title ~~[Support] Join threads when stopping ThreadPoolExecutor~~ [Support] Join threads when stopping the ThreadPoolExecutor Nov 2, 2025

momo5502 force-pushed the main branch 3 times, most recently from a3a95e6 to 8a3f2ef Compare November 2, 2025 20:15

aganea requested a review from compnerd November 2, 2025 22:36

momo5502 marked this pull request as draft November 3, 2025 06:08

momo5502 force-pushed the main branch from 8a3f2ef to 363f3b5 Compare November 3, 2025 18:59

[Support] Join threads when stopping ThreadPoolExecutor

9b807f3

momo5502 force-pushed the main branch from 363f3b5 to 9b807f3 Compare November 3, 2025 19:10

momo5502 marked this pull request as ready for review November 4, 2025 12:42

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[Support] Join threads when stopping the ThreadPoolExecutor #166054

[Support] Join threads when stopping the ThreadPoolExecutor #166054

momo5502 commented Nov 2, 2025 •

edited

Loading

Uh oh!

llvmbot commented Nov 2, 2025

Uh oh!

aganea commented Nov 2, 2025

Uh oh!

compnerd commented Nov 3, 2025

Uh oh!

momo5502 commented Nov 3, 2025

Uh oh!

momo5502 commented Nov 3, 2025 •

edited

Loading

Uh oh!

nga888 commented Nov 3, 2025

Uh oh!

nga888 commented Nov 3, 2025

Uh oh!

momo5502 commented Nov 3, 2025

Uh oh!

momo5502 commented Nov 3, 2025 •

edited

Loading

Uh oh!

momo5502 commented Nov 4, 2025 •

edited

Loading

Uh oh!

momo5502 commented Nov 4, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

[Support] Join threads when stopping the ThreadPoolExecutor #166054

Are you sure you want to change the base?

[Support] Join threads when stopping the ThreadPoolExecutor #166054

Conversation

momo5502 commented Nov 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Nov 2, 2025

Uh oh!

aganea commented Nov 2, 2025

Uh oh!

compnerd commented Nov 3, 2025

Uh oh!

momo5502 commented Nov 3, 2025

Uh oh!

momo5502 commented Nov 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

nga888 commented Nov 3, 2025

Uh oh!

nga888 commented Nov 3, 2025

Uh oh!

momo5502 commented Nov 3, 2025

Uh oh!

momo5502 commented Nov 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

momo5502 commented Nov 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

momo5502 commented Nov 4, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

momo5502 commented Nov 2, 2025 •

edited

Loading

momo5502 commented Nov 3, 2025 •

edited

Loading

momo5502 commented Nov 3, 2025 •

edited

Loading

momo5502 commented Nov 4, 2025 •

edited

Loading