Skip to content

Conversation

@momo5502
Copy link
Contributor

@momo5502 momo5502 commented Nov 2, 2025

When building LLVM as DLL, the ThreadPoolExecutor can cause deadlocks on Windows.

The threads were previously only joined when the destructor of the ThreadPoolExecutor is called, not when it's stopped. Destruction happens when unloading the DLL, when the global destructors are called.

On Windows, std::thread uses FreeLibraryAndExitThread to cleanup a thread. This requires access to the loader lock, when a thread terminates.
However, when destroying the pool, the loader lock is also held, as the DLL is being unloaded.
If the threads did not end fast enough, the destructor would wait for them to join. At the same time, the threads would wait for the destructor to release the loader lock.

Joining the threads when stopping the pool fixes that, as it ensures the threads are stopped when calling llvm_shutdown, outside the loader lock.

@momo5502 momo5502 changed the title [Support] Join threads when stopping ThreadPoolExecutor [Support] Join threads when stopping the ThreadPoolExecutor Nov 2, 2025
@llvmbot
Copy link
Member

llvmbot commented Nov 2, 2025

@llvm/pr-subscribers-llvm-support

Author: Maurice Heumann (momo5502)

Changes

When building LLVM as DLL, the ThreadPoolExecutor can cause deadlocks on Windows.

The threads wre previously only joined when the destructor of the ThreadPoolExecutor is called. This happens when unloading the DLL, when the global destructors are called.

On Windows, std::threads use FreeLibraryAndExitThread to cleanup a thread. This requires access to the loader lock, when a thread terminates.
However, when destroying the pool, the loader lock is also held, as the DLL is being unloaded.
If the threads did not end fast enough, the destructor would wait for them to join. At the same time, the threads would wait for the destructor to release the loader lock.

Joining the threads when stopping the pool fixes that, as it ensures the threads are stopped when calling llvm_shutdown, outside the loader lock.


Full diff: https://github.com/llvm/llvm-project/pull/166054.diff

1 Files Affected:

  • (modified) llvm/lib/Support/Parallel.cpp (+7-5)
diff --git a/llvm/lib/Support/Parallel.cpp b/llvm/lib/Support/Parallel.cpp
index 8e0c724accb36..31b51536aaa9e 100644
--- a/llvm/lib/Support/Parallel.cpp
+++ b/llvm/lib/Support/Parallel.cpp
@@ -82,26 +82,28 @@ class ThreadPoolExecutor : public Executor {
   ThreadPoolExecutor() = delete;
 
   void stop() {
+    std::vector<std::thread> ThreadsToJoin;
+
     {
       std::lock_guard<std::mutex> Lock(Mutex);
-      if (Stop)
+      if (Stop && Threads.empty())
         return;
       Stop = true;
+      ThreadsToJoin.swap(Threads);
     }
     Cond.notify_all();
     ThreadsCreated.get_future().wait();
-  }
 
-  ~ThreadPoolExecutor() override {
-    stop();
     std::thread::id CurrentThreadId = std::this_thread::get_id();
-    for (std::thread &T : Threads)
+    for (std::thread &T : ThreadsToJoin)
       if (T.get_id() == CurrentThreadId)
         T.detach();
       else
         T.join();
   }
 
+  ~ThreadPoolExecutor() override { stop(); }
+
   struct Creator {
     static void *call() { return new ThreadPoolExecutor(strategy); }
   };

@momo5502 momo5502 force-pushed the main branch 3 times, most recently from a3a95e6 to 8a3f2ef Compare November 2, 2025 20:15
@aganea
Copy link
Member

aganea commented Nov 2, 2025

Are you able to craft a unit test that exposes the issue?

@aganea aganea requested a review from compnerd November 2, 2025 22:36
@compnerd
Copy link
Member

compnerd commented Nov 3, 2025

I'm not sure I follow how this fixes anything. If the stop method is not invoked, would the behaviour not be identical to the previous state? If so, how does this fix that?

@momo5502 momo5502 marked this pull request as draft November 3, 2025 06:08
@momo5502
Copy link
Contributor Author

momo5502 commented Nov 3, 2025

Are you able to craft a unit test that exposes the issue?

I think that should be doable, yes. I can try :D

@momo5502
Copy link
Contributor Author

momo5502 commented Nov 3, 2025

I'm not sure I follow how this fixes anything. If the stop method is not invoked, would the behaviour not be identical to the previous state? If so, how does this fix that?

You're right, this doesn't fix anything if the stop method is not invoked. This fix obviously requires the thread pool to be stopped outside the loader lock. llvm_shutdown does that, as the managed static construct invokes stop. So it requires calling llvm_shutdown. Form what I've seen, this is usually the case, as InitLLVM wraps that.

@nga888
Copy link
Collaborator

nga888 commented Nov 3, 2025

Just want to note that IIRC, the reason for only joining the threads in the destructor was to speed up "fast" no cleanup exit, i.e. would skip joining any threads. This used to be quite an overhead for systems with very high thread counts. Something to bear in mind.

@nga888
Copy link
Collaborator

nga888 commented Nov 3, 2025

It's not ideal, but perhaps how the threads are joined should be dependent on whether the code is being built as a DLL? That way the performance benefit I mention above would still be available for those who build LLVM statically on Windows.

@momo5502
Copy link
Contributor Author

momo5502 commented Nov 3, 2025

Just want to note that IIRC, the reason for only joining the threads in the destructor was to speed up "fast" no cleanup exit, i.e. would skip joining any threads. This used to be quite an overhead for systems with very high thread counts. Something to bear in mind.

I understand. However, I would argue that this is dangerous. On Windows, the OS kills all threads before calling the destructors of the main exe. So in case any thread held a lock, this might cause deadlocks as well, just like in the DLL case here.
The safest thing to do would always be to join the threads before calling the destructors.

@momo5502
Copy link
Contributor Author

momo5502 commented Nov 3, 2025

It's not ideal, but perhaps how the threads are joined should be dependent on whether the code is being built as a DLL? That way the performance benefit I mention above would still be available for those who build LLVM statically on Windows.

Interesting idea, however, as mentioned previously, joining threads in the destructor is dangerous either way.
Maybe, at the very least, it might be possible to just expose a function that joins the threads. This can then be called manually, after llvm_shutdown has been called. Every consumer of LLVM can decide whether they want to call that or not.

However, I'll try to fix the tests first, because there is obviously something I broke, then I'll try to create a unit test that reproduces the issue.

@momo5502
Copy link
Contributor Author

momo5502 commented Nov 4, 2025

Are you able to craft a unit test that exposes the issue?

Turns out that creating a unittest is not really feasible.

Reproducing this issue requires to keep the threads alive after the thread pool has been stopped and waiting until the global destructors are being called. doing that indeed reliably reproduces the issue. however, this fix wants to join the thread beforehand, which won't work.

Instead of explicitly waiting for the destructors, it would be possible to just sleep for a long time, but that also doesn't feel right, as this just adds to the CI times.

@momo5502 momo5502 marked this pull request as ready for review November 4, 2025 12:42
@momo5502
Copy link
Contributor Author

momo5502 commented Nov 4, 2025

Just want to note that IIRC, the reason for only joining the threads in the destructor was to speed up "fast" no cleanup exit, i.e. would skip joining any threads. This used to be quite an overhead for systems with very high thread counts. Something to bear in mind.

So the fix works now, but as it joins the threads, shutdown will take longer.
I don't really see an easy way of having both the speed and safety. Except something like triggering the stop at the beginning of llvm_shutdown and joining the threads at the end, when everything inbetween is done. Not sure if that helps much and it's probably too big of a change.

If the speed is desired, I propose to expose some kind of function that, when called, tells the pool to join the threads when being stopped. Otherwise joining will be done in the destructor. That way, if one needs the safety, the function can be called.
That would be fine for me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants