-
Notifications
You must be signed in to change notification settings - Fork 15.1k
[Support] Join threads when stopping the ThreadPoolExecutor #166054
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
@llvm/pr-subscribers-llvm-support Author: Maurice Heumann (momo5502) ChangesWhen building LLVM as DLL, the The threads wre previously only joined when the destructor of the On Windows, Joining the threads when stopping the pool fixes that, as it ensures the threads are stopped when calling Full diff: https://github.com/llvm/llvm-project/pull/166054.diff 1 Files Affected:
diff --git a/llvm/lib/Support/Parallel.cpp b/llvm/lib/Support/Parallel.cpp
index 8e0c724accb36..31b51536aaa9e 100644
--- a/llvm/lib/Support/Parallel.cpp
+++ b/llvm/lib/Support/Parallel.cpp
@@ -82,26 +82,28 @@ class ThreadPoolExecutor : public Executor {
ThreadPoolExecutor() = delete;
void stop() {
+ std::vector<std::thread> ThreadsToJoin;
+
{
std::lock_guard<std::mutex> Lock(Mutex);
- if (Stop)
+ if (Stop && Threads.empty())
return;
Stop = true;
+ ThreadsToJoin.swap(Threads);
}
Cond.notify_all();
ThreadsCreated.get_future().wait();
- }
- ~ThreadPoolExecutor() override {
- stop();
std::thread::id CurrentThreadId = std::this_thread::get_id();
- for (std::thread &T : Threads)
+ for (std::thread &T : ThreadsToJoin)
if (T.get_id() == CurrentThreadId)
T.detach();
else
T.join();
}
+ ~ThreadPoolExecutor() override { stop(); }
+
struct Creator {
static void *call() { return new ThreadPoolExecutor(strategy); }
};
|
a3a95e6 to
8a3f2ef
Compare
|
Are you able to craft a unit test that exposes the issue? |
|
I'm not sure I follow how this fixes anything. If the |
I think that should be doable, yes. I can try :D |
You're right, this doesn't fix anything if the stop method is not invoked. This fix obviously requires the thread pool to be stopped outside the loader lock. llvm_shutdown does that, as the managed static construct invokes stop. So it requires calling llvm_shutdown. Form what I've seen, this is usually the case, as InitLLVM wraps that. |
|
Just want to note that IIRC, the reason for only joining the threads in the destructor was to speed up "fast" no cleanup exit, i.e. would skip joining any threads. This used to be quite an overhead for systems with very high thread counts. Something to bear in mind. |
|
It's not ideal, but perhaps how the threads are joined should be dependent on whether the code is being built as a DLL? That way the performance benefit I mention above would still be available for those who build LLVM statically on Windows. |
I understand. However, I would argue that this is dangerous. On Windows, the OS kills all threads before calling the destructors of the main exe. So in case any thread held a lock, this might cause deadlocks as well, just like in the DLL case here. |
Interesting idea, however, as mentioned previously, joining threads in the destructor is dangerous either way. However, I'll try to fix the tests first, because there is obviously something I broke, then I'll try to create a unit test that reproduces the issue. |
Turns out that creating a unittest is not really feasible. Reproducing this issue requires to keep the threads alive after the thread pool has been stopped and waiting until the global destructors are being called. doing that indeed reliably reproduces the issue. however, this fix wants to join the thread beforehand, which won't work. Instead of explicitly waiting for the destructors, it would be possible to just sleep for a long time, but that also doesn't feel right, as this just adds to the CI times. |
So the fix works now, but as it joins the threads, shutdown will take longer. If the speed is desired, I propose to expose some kind of function that, when called, tells the pool to join the threads when being stopped. Otherwise joining will be done in the destructor. That way, if one needs the safety, the function can be called. |
When building LLVM as DLL, the
ThreadPoolExecutorcan cause deadlocks on Windows.The threads were previously only joined when the destructor of the
ThreadPoolExecutoris called, not when it's stopped. Destruction happens when unloading the DLL, when the global destructors are called.On Windows,
std::threadusesFreeLibraryAndExitThreadto cleanup a thread. This requires access to the loader lock, when a thread terminates.However, when destroying the pool, the loader lock is also held, as the DLL is being unloaded.
If the threads did not end fast enough, the destructor would wait for them to join. At the same time, the threads would wait for the destructor to release the loader lock.
Joining the threads when stopping the pool fixes that, as it ensures the threads are stopped when calling
llvm_shutdown, outside the loader lock.