webgpu: vkAllocateMemory returned error value VK_ERROR_INVALID_EXTERNAL_HANDLE #1743

Open
jeremyg-lunarg opened this issue Sep 16, 2024 · 2 comments
Labels
P2 A high-priority code maintenance issue or a functional problem that is recoverable or not a crash. vulkan

Comments

@jeremyg-lunarg

Describe the replay bug:
This is a replay of WebGPU content running on Linux with the RADV driver. WebGPU renders into a swapchain image created by Chromium, which is then passed to a compositor in Chromium. Both components use Vulkan, each with its own VkInstance and VkDevice.

[gfxrecon] FATAL - API call at index: 4261 thread: 1 vkAllocateMemory returned error value VK_ERROR_INVALID_EXTERNAL_HANDLE that does not match the result from the capture file: VK_SUCCESS. Replay cannot continue.
Replay has encountered a fatal error and cannot continue: an external handle is not a valid handle of the specified type

It looks like vkGetMemoryFdKHR() returns a file descriptor of -1 at some point, but I may be misreading the output.

Note: Chromium might be doing graphics work in multiple processes. Judging by the gfxr-convert output, I think everything runs in one process and was captured, but I'm not 100% sure.

Verify before submission:

  • Was trimming enabled? no
  • Was replayer renamed if necessary? no
  • Was --sync used if title is known to need forced synchronization? no

Build Environment:
Please include the SHA and PR or branch name used in capture and also used to build the replayer.

1.3.290 SDK

To Reproduce
Steps to reproduce the behavior:

  1. Get the .gfxr file attached to the issue.
  2. Run gfxrecon-replay on gfxrecon_capture_frames_1_through_5000_20240916T105521.gfxr with no arguments.

Screenshots:
Does not run long enough for screenshots.

System environment:
Capture and replay on the same system running Ubuntu 24.04 with the RADV driver

Title configuration:
life branch of https://github.com/jeremyg-lunarg/webgpu-electron

With npm and node.js installed:
npm install
npm run start

Additional information (optional):

  • is there a SHA for which replayer is known to replay correctly? no
  • Is there an older trace that works? What SHA was used to build those capture DLLs? no
  • Does a newer capture work? no
  • Does the capture file replay correctly on a different GPU? no

@jeremyg-lunarg

I think I see what is happening. Here's the memory export to fd 75:

{
  "index": 4255,
  "function": {
    "name": "vkGetMemoryFdKHR",
    "thread": 1,
    "return": "VK_SUCCESS",
    "args": {
      "device": 7,
      "pGetFdInfo": {
        "sType": "VK_STRUCTURE_TYPE_MEMORY_GET_FD_INFO_KHR",
        "memory": 585,
        "handleType": "VK_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_FD_BIT",
        "pNext": null
      },
      "pFd": 75
    }
  }
},

Then the import uses fd 76:

{
  "index": 4261,
  "function": {
    "name": "vkAllocateMemory",
    "thread": 1,
    "return": "VK_SUCCESS",
    "args": {
      "device": 38,
      "pAllocateInfo": {
        "sType": "VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO",
        "allocationSize": 1048576,
        "memoryTypeIndex": 0,
        "pNext": {
          "sType": "VK_STRUCTURE_TYPE_IMPORT_MEMORY_FD_INFO_KHR",
          "handleType": "VK_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_FD_BIT",
          "fd": 76,
          "pNext": {
            "sType": "VK_STRUCTURE_TYPE_MEMORY_DEDICATED_ALLOCATE_INFO",
            "image": 586,
            "buffer": 0,
            "pNext": null
          }
        }
      },
      "pAllocator": null,
      "pMemory": 587
    }
  }
},

I think the problem is that during replay fd 75 is valid but fd 76 is not. The Chromium code includes this:

descriptor.memoryFD = dup(memory_fd_.get());

That is most likely what makes fd 76 point to the same external memory as fd 75.
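
To illustrate the dup() behaviour described above, here is a minimal POSIX sketch with no Vulkan involved. It uses a pipe as a stand-in for the exported memory fd, and the function names `fd_is_open` and `dup_aliases_original` are hypothetical helpers for this example, not Chromium or GFXReconstruct code. It shows that dup() returns a different descriptor number that refers to the same underlying object, and that closing the duplicate leaves the original open.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Returns 1 if fd names an open descriptor in this process, else 0. */
int fd_is_open(int fd)
{
    return fcntl(fd, F_GETFD) != -1 || errno != EBADF;
}

/*
 * Checks that a descriptor returned by dup() aliases the original:
 * bytes written through the duplicate arrive on the pipe that the
 * original write end belongs to, and closing the duplicate does not
 * close the original. Returns 0 if all checks hold, -1 otherwise.
 */
int dup_aliases_original(void)
{
    int pipe_fds[2];
    const char msg[] = "shared";
    char buf[sizeof msg] = {0};
    int ok = 0;

    if (pipe(pipe_fds) != 0)
        return -1;

    int original = pipe_fds[1];     /* plays the role of fd 75 */
    int duplicate = dup(original);  /* plays the role of fd 76 */
    if (duplicate >= 0) {
        ok = duplicate != original
          && write(duplicate, msg, sizeof msg) == (ssize_t)sizeof msg
          && read(pipe_fds[0], buf, sizeof buf) == (ssize_t)sizeof msg
          && strcmp(buf, msg) == 0;
        close(duplicate);
        /* The original must survive closing the duplicate. */
        ok = ok && fd_is_open(original) && !fd_is_open(duplicate);
    }
    close(original);
    close(pipe_fds[0]);
    return ok ? 0 : -1;
}
```

A process that never performs the dup() has no descriptor playing the role of fd 76, so `fd_is_open` would report it as closed. That would be consistent with the import failing at replay.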

I'm guessing that GFXReconstruct doesn't call dup() during replay, and it wouldn't need to unless it wanted to keep its own control of the fd (which Chromium apparently does). So getting this application to replay would seem to require recording dup(), and probably some other system calls, to know when this is happening.
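
As a rough sketch of what "recording dup()" could enable, the following shows a small table that maps capture-time fd values to replay-time fds and mirrors a recorded dup() at replay. Everything here is an assumption for illustration: the `FdMap` type and `fd_map_*` functions are hypothetical and are not part of GFXReconstruct, and it presumes the capture layer could intercept dup() in the first place.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

#define FD_MAP_CAPACITY 64

/* One capture-time fd value and the live fd that stands in for it at replay. */
typedef struct {
    int capture_fd;
    int replay_fd;
} FdMapEntry;

typedef struct {
    FdMapEntry entries[FD_MAP_CAPACITY];
    size_t count;
} FdMap;

void fd_map_init(FdMap *map)
{
    map->count = 0;
}

/* Returns the replay-time fd for capture_fd, or -1 if none is known. */
int fd_map_lookup(const FdMap *map, int capture_fd)
{
    for (size_t i = 0; i < map->count; ++i) {
        if (map->entries[i].capture_fd == capture_fd)
            return map->entries[i].replay_fd;
    }
    return -1;
}

/* Adds or overwrites a mapping (fd numbers get reused). Returns 0, or -1 if full. */
int fd_map_add(FdMap *map, int capture_fd, int replay_fd)
{
    for (size_t i = 0; i < map->count; ++i) {
        if (map->entries[i].capture_fd == capture_fd) {
            map->entries[i].replay_fd = replay_fd;
            return 0;
        }
    }
    if (map->count == FD_MAP_CAPACITY)
        return -1;
    map->entries[map->count].capture_fd = capture_fd;
    map->entries[map->count].replay_fd = replay_fd;
    map->count++;
    return 0;
}

/*
 * Mirrors a recorded dup(capture_old_fd) == capture_new_fd at replay time
 * by duplicating the matching replay fd. Returns the new replay fd, or -1.
 */
int fd_map_on_dup(FdMap *map, int capture_old_fd, int capture_new_fd)
{
    int replay_old = fd_map_lookup(map, capture_old_fd);
    if (replay_old < 0)
        return -1;
    int replay_new = dup(replay_old);
    if (replay_new < 0)
        return -1;
    if (fd_map_add(map, capture_new_fd, replay_new) != 0) {
        close(replay_new);
        return -1;
    }
    return replay_new;
}
```

Under these assumptions, the export at index 4255 would register capture fd 75, a recorded dup() would register capture fd 76, and the import at index 4261 would look up 76 to find a valid replay-time fd instead of passing the stale number through.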

@beau-lunarg beau-lunarg added the P2 A high-priority code maintenance issue or a functional problem that is recoverable or not a crash. label Sep 23, 2024