Conversation

@felixweilbach
Contributor

Summary:
`ThreadPool` is stored in a static variable at `extension/threadpool/threadpool.cpp:146`.

This means the `ThreadPool` destructor runs when the process exits or when a DLL containing this code is unloaded.
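
To illustrate the lifetime issue, here is a minimal sketch. It assumes a function-local static, which may not match the actual code at `threadpool.cpp:146`; `DemoPool` and `get_demo_pool` are hypothetical names, not ExecuTorch APIs.

```cpp
#include <cstdio>

// Hypothetical stand-in for a thread pool; not the ExecuTorch class.
struct DemoPool {
  DemoPool() { std::puts("DemoPool constructed"); }
  // Runs during static destruction: at process exit, or on Windows when the
  // DLL that owns the static is unloaded.
  ~DemoPool() { std::puts("DemoPool destroyed"); }
};

// Function-local static: constructed on first call and destroyed
// automatically at exit/unload. The caller cannot choose the moment.
DemoPool& get_demo_pool() {
  static DemoPool pool;
  return pool;
}
```

Every call to `get_demo_pool()` returns the same object, and nothing in this interface lets the caller tear it down before the runtime does.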

While working with ExecuTorch, I hit a deadlock when unloading our DLL (which contains ExecuTorch) at runtime. It was caused by `pthreadpool_destroy` (`pthreadpool/src/windows.c:366`) waiting forever on the worker threads.

Exactly why this happens is unclear to me. It is likely a race condition inside the Windows Parallel Loader (https://blogs.blackberry.com/en/2017/10/windows-10-parallel-loading-breakdown), as I could see its functions in the stack traces of the stuck worker threads after they returned from their main function.

I mitigated the issue on my side by calling `executorch::extension::threadpool::get_threadpool()->_unsafe_reset_threadpool(0);` before unloading the DLL.
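
The idea behind the workaround can be modeled roughly as follows. This is a sketch under assumptions: `ModelPool` is a hypothetical class, not the ExecuTorch or pthreadpool implementation, and it assumes that resetting to zero leaves no worker threads behind, which the real `_unsafe_reset_threadpool(0)` may or may not do in exactly this way.

```cpp
#include <atomic>
#include <chrono>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical model of a resizable pool. Workers idle until asked to stop.
class ModelPool {
 public:
  explicit ModelPool(std::size_t n) { start(n); }

  // The destructor must wait for every worker. In the scenario described
  // above, this kind of wait is what never finished during DLL unload.
  ~ModelPool() { stop(); }

  // Model of a reset: join the current workers on the caller's thread,
  // then start `n` fresh ones (none when n == 0).
  void reset(std::size_t n) {
    stop();
    start(n);
  }

  std::size_t worker_count() const { return workers_.size(); }

 private:
  void start(std::size_t n) {
    stop_requested_.store(false);
    for (std::size_t i = 0; i < n; ++i) {
      workers_.emplace_back([this] {
        while (!stop_requested_.load()) {
          std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
      });
    }
  }

  void stop() {
    stop_requested_.store(true);
    for (std::thread& t : workers_) {
      t.join();
    }
    workers_.clear();
  }

  std::atomic<bool> stop_requested_{false};
  std::vector<std::thread> workers_;
};
```

In this model, calling `reset(0)` from ordinary application code joins the workers at a point the caller controls, so the destructor that later runs during unload has nothing left to wait on.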

This is just a workaround. I think a proper fix would be to rework the `ThreadPool` singleton so that it can be terminated explicitly.
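
One possible shape for such a rework is sketched below. It is only an illustration of "singleton with explicit termination"; `Resource` and `ResourceHolder` are hypothetical names and this is not a proposal for the actual ExecuTorch API.

```cpp
#include <memory>
#include <mutex>

// Hypothetical resource standing in for a thread pool.
struct Resource {
  int id = 42;
};

// Singleton holder with an explicit shutdown() in addition to lazy creation.
class ResourceHolder {
 public:
  // Creates the resource on first use and returns a non-owning pointer.
  static Resource* get() {
    std::lock_guard<std::mutex> lock(mutex());
    if (!instance()) {
      instance() = std::make_unique<Resource>();
    }
    return instance().get();
  }

  // Destroys the resource now, at a point the caller controls (for example,
  // before unloading a DLL). Previously returned pointers become invalid.
  static void shutdown() {
    std::lock_guard<std::mutex> lock(mutex());
    instance().reset();
  }

  static bool is_alive() {
    std::lock_guard<std::mutex> lock(mutex());
    return instance() != nullptr;
  }

 private:
  static std::mutex& mutex() {
    static std::mutex m;
    return m;
  }
  static std::unique_ptr<Resource>& instance() {
    static std::unique_ptr<Resource> p;
    return p;
  }
};
```

After `shutdown()`, the static `unique_ptr` is empty, so its destructor at exit or unload is trivial. The trade-off is that callers must not keep pointers obtained from `get()` across a `shutdown()` call.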

Differential Revision: D89889628

@pytorch-bot

pytorch-bot bot commented Dec 30, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/16416

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 15bb16a with merge base dbf3c37:

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed label Dec 30, 2025
@meta-codesync
Contributor

meta-codesync bot commented Dec 30, 2025

@felixweilbach has exported this pull request. If you are a Meta employee, you can view the originating Diff in D89889628.

@github-actions

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with `release notes:`. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

felixweilbach added a commit to felixweilbach/executorch that referenced this pull request Dec 30, 2025

Differential Revision: D89889628
Contributor

@kimishpatel kimishpatel left a comment

Review automatically exported from Phabricator review in Meta.

felixweilbach added a commit to felixweilbach/executorch that referenced this pull request Jan 5, 2026

Reviewed By: kimishpatel

Differential Revision: D89889628
felixweilbach added a commit to felixweilbach/executorch that referenced this pull request Jan 13, 2026

Reviewed By: kimishpatel

Differential Revision: D89889628
@meta-codesync meta-codesync bot merged commit 2c59f85 into pytorch:main Jan 14, 2026
142 of 144 checks passed
Labels

CLA Signed, fb-exported, meta-exported

2 participants