
Fix capture deadlock when vkQueuePresentKHR unlocks a CPU wait #1708

Open

wants to merge 1 commit into base: `dev`

Conversation

marius-pelegrin-arm
Contributor

This bug can be observed when capturing the Khronos Vulkan-Samples demo "timeline_semaphore".

In a situation where you have two threads:

- one waiting on a synchronization primitive (`VkSemaphore`, `VkFence`, ...)
- another on which a call to `vkQueuePresentKHR` is necessary to signal that primitive

if the "CPU waiting call" (`vkWaitForFences`, `vkWaitSemaphores`, ...) is reached by thread 1 before `vkQueuePresentKHR` is reached by thread 2, the `api_call_mutex_` of the `CaptureManager` is held by the CPU waiting call. The call to `vkQueuePresentKHR` can then never acquire the mutex, because it always takes an exclusive lock, and we end up in a deadlock.

I don't see why the lock inside `vkQueuePresentKHR` should be exclusive, so I propose to solve this issue by simply using the same shared-lock scheme as all the other calls.


Change-Id: I3b8e9e9c569b578564fa3529d42800b586f54191
@ci-tester-lunarg

CI gfxreconstruct build queued with queue ID 249192.

@ci-tester-lunarg

CI gfxreconstruct build # 4760 passed.


@dgraves
Contributor

dgraves commented Sep 4, 2024

An additional change would be needed here to release the shared lock and acquire the exclusive lock when trimming is activated by a present call. Recently, I was looking at the exclusive locks in present due to a noticeable perf impact they have on some DX12 applications.

An earlier version of the code used a shared lock in present and then acquired the exclusive lock when activating trimming. The exclusive locks were moved to the present calls by 6bf1a6a and 500bb7c to avoid a deadlock issue, which looks to have been caused by two threads acquiring the shared lock and then attempting to acquire the exclusive lock without releasing the shared lock.

I have a local change to acquire the shared lock on present and then release it before acquiring the exclusive lock when activating trimming, which has been going through testing and should be ready for submission soon. This change wouldn't necessarily replace what you have here but could be added on top of it.

@bradgrantham-lunarg bradgrantham-lunarg added the P1 Prevents an important capture from being replayed label Oct 7, 2024