Editor Hangs when changing "content" script (on Linux). #855

Open
ricejasonf opened this issue Jun 2, 2024 · 11 comments

@ricejasonf

ricejasonf commented Jun 2, 2024

Hi, I am not certain that this is specific to Linux, but when I load different "content" scripts in the editor, the application sometimes hangs, and sometimes it won't even respond to signals (i.e., I have to kill -9 the process). I ran it in debug mode and found where it gets stuck.

7238     // Initiate stalling CPU when GPU is not yet finished with next frame:
7239     if (FRAMECOUNT >= BUFFERCOUNT)
7240     {
7241       const uint32_t bufferindex = GetBufferIndex();
7242       for (int queue = 0; queue < QUEUE_COUNT; ++queue)
7243       {
7244         if (frame_fence[bufferindex][queue] == VK_NULL_HANDLE)
7245           continue;
7246 
7247         res = vkWaitForFences(device, 1, &frame_fence[bufferindex][queue], VK_TRUE, 0xFFFFFFFFFFFFFFFF);
7248         assert(res == VK_SUCCESS);
7249 
7250         res = vkResetFences(device, 1, &frame_fence[bufferindex][queue]);
7251         assert(res == VK_SUCCESS);
7252       }
7253     }

The call to vkWaitForFences hangs. I am new to this API (and modern graphics in general), but I see that the timeout is very large. Is this the right way to handle "CPU stalling"? From what I have been reading, I think this could at least loop on VK_TIMEOUT with a reasonably small timeout. Also, here is the call stack from when I was able to stop the process:

* thread #1, name = 'WickedEngineEdi', stop reason = signal SIGSTOP
  * frame #0: 0x00007ffff791d9ed libc.so.6`__poll + 77
    frame #1: 0x00007fffda007cc3 libnvidia-glcore.so.550.78`___lldb_unnamed_symbol36082 + 147
    frame #2: 0x00007fffda422f59 libnvidia-glcore.so.550.78`___lldb_unnamed_symbol44349 + 73
    frame #3: 0x00007fffda407950 libnvidia-glcore.so.550.78`___lldb_unnamed_symbol44160 + 672
    frame #4: 0x00007fffda3239ae libnvidia-glcore.so.550.78`___lldb_unnamed_symbol42754 + 30
    frame #5: 0x0000555555c8668b WickedEngineEditor`wi::graphics::GraphicsDevice_Vulkan::SubmitCommandLists(this=0x000055555705a380) at wiGraphicsDevice_Vulkan.cpp:7247:26
    frame #6: 0x0000555555babf01 WickedEngineEditor`wi::Application::Run(this=0x00007fffff8d4990) at wiApplication.cpp:252:37
    frame #7: 0x00005555555b4661 WickedEngineEditor`sdl_loop(editor=0x00007fffff8d4990) at main_SDL2.cpp:16:19
    frame #8: 0x00005555555b4ce0 WickedEngineEditor`main(argc=1, argv=0x00007fffffffe818) at main_SDL2.cpp:162:23
    frame #9: 0x00007ffff7841d4a libc.so.6`___lldb_unnamed_symbol3264 + 122
    frame #10: 0x00007ffff7841e0c libc.so.6`__libc_start_main + 140
    frame #11: 0x00005555555b4285 WickedEngineEditor`_start + 37

I will play with this more next week, but I thought I would wait for some feedback on the intent behind the large timeout.

Thanks.

EDIT: It occurred to me that maybe it is stuck in some loop and it just happens to always break while the process is waiting on that line (7247).
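A minimal sketch of the "loop on VK_TIMEOUT with a small timeout" idea, written so it compiles without the Vulkan SDK. Everything here is hypothetical and not engine code: `WaitResult` stands in for the relevant VkResult values, `wait_once` stands in for a wrapper around vkWaitForFences with a per-call timeout in nanoseconds, and `slice_ns` / `budget_ns` are made-up parameter names. When the budget runs out, the caller would still need to stop or keep waiting rather than reuse the frame's resources.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <functional>

// Stand-in for the VkResult values that matter here (assumption: a real
// wrapper would map VK_SUCCESS, VK_TIMEOUT and any error code onto these).
enum class WaitResult { Success, Timeout, Error };

// Waits in bounded slices instead of one unbounded call.
// `wait_once(timeout_ns)` is a hypothetical callable that performs a single
// fence wait with the given timeout.
// Returns Success or Error as soon as wait_once reports one, or Timeout once
// the total budget has been used up.
WaitResult wait_in_slices(const std::function<WaitResult(uint64_t)>& wait_once,
                          uint64_t slice_ns, uint64_t budget_ns)
{
    assert(slice_ns > 0);
    uint64_t waited_ns = 0;
    while (waited_ns < budget_ns)
    {
        const WaitResult res = wait_once(slice_ns);
        if (res != WaitResult::Timeout)
            return res; // fence signaled, or a real error occurred

        waited_ns += slice_ns;
        // Between slices the process stays responsive and can log progress.
        std::fprintf(stderr, "still waiting on fence after %llu ms\n",
                     static_cast<unsigned long long>(waited_ns / 1000000));
    }
    return WaitResult::Timeout; // budget exhausted: report it, do not continue the frame
}
```

With a slice of, say, 100 ms and a budget of a few seconds, a stuck fence would surface as a logged timeout instead of a process that cannot be interrupted.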

@turanszkij
Owner

Hi, the "infinite" timeout is there for a purpose: it would be invalid to proceed while the GPU has not finished the frame we are waiting on. Could you make sure that you have updated graphics drivers?
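To illustrate why the wait cannot simply be skipped, here is a small sketch of a frame ring. It assumes, without having checked the engine source, that GetBufferIndex() behaves like FRAMECOUNT % BUFFERCOUNT; `kBufferCount` and the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumed number of in-flight frame slots (hypothetical value).
const uint32_t kBufferCount = 2;

// Assumption: the slot used by a frame is its frame counter modulo the slot count.
inline uint32_t buffer_index(uint64_t frame_count)
{
    return static_cast<uint32_t>(frame_count % kBufferCount);
}

// The first kBufferCount frames use fresh slots; every later frame reuses a
// slot that an earlier frame may still be using on the GPU.
inline bool must_wait_for_slot(uint64_t frame_count)
{
    return frame_count >= kBufferCount;
}

// The earlier frame whose slot (and per-frame resources) is about to be reused.
inline uint64_t previous_user_of_slot(uint64_t frame_count)
{
    assert(must_wait_for_slot(frame_count));
    return frame_count - kBufferCount;
}
```

Under these assumptions frame 5 lands in the same slot as frame 3, so recording frame 5 before frame 3's fence has signaled would overwrite resources the GPU may still be reading.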

@ricejasonf
Author

I did a full update and verified that I have the latest driver, and I was able to get it to freeze again immediately (loading scripts under "Content").

local/nvidia 550.78-7
    NVIDIA drivers for linux

https://archlinux.org/packages/extra/x86_64/nvidia/

@brakhane
Collaborator

brakhane commented Jun 3, 2024

@ricejasonf Wicked recently updated the dxcompiler to the May version, which seems to be broken on Linux (#856) and has caused all kinds of weird issues on various graphics drivers. It has been reverted to the previous version; can you update to master and give it another try?

@ricejasonf
Author

Sorry, but the problem still persists. It does not happen every time, but it still definitely freezes when loading a script.

@brakhane
Collaborator

brakhane commented Jun 3, 2024

Did you delete the shaders/spirv directory just to make sure no compiled shaders from the dxcompiler remain?

@ricejasonf
Author

I deleted the entire build directory. If that is where they are located, then yes. (I am on the Discord if that is easier for back and forth stuff.)

@ricejasonf
Author

I can confirm that it is in fact getting stuck in that vkWaitForFences call. Consider the following small alteration to the point of interest:

7247         while (true) {
7248           res = vkWaitForFences(device, 1, &frame_fence[bufferindex][queue],
7249                                 VK_TRUE, uint64_t{10000000000});
7250           if (res == VK_SUCCESS) break;
7251           assert(res == VK_SUCCESS);
7252         }

Attempting to reproduce the error results in hitting the assert after 10 seconds of blank screen.

WickedEngineEditor: /home/jason/Projects/WickedEngine/WickedEngine/wiGraphicsDevice_Vulkan.cpp:7251: virtual void wi::graphics::GraphicsDevice_Vulkan::SubmitCommandLists(): Assertion `res == VK_SUCCESS' failed.
Aborted (core dumped)

It would be nice to find the bug, but I think there is also an opportunity for graceful error handling here.

@ricejasonf
Author

I realized that this is a duplicate of #804.

@brakhane
Collaborator

brakhane commented Jun 4, 2024

Can you confirm that the hang always happens when queue is 3 (QUEUE_VIDEO_DECODE)? And never with any other value?

@ricejasonf
Author

I tried it several times and the value for queue was consistently 3. So, yes, that looks like the enum value for QUEUE_VIDEO_DECODE as you stated.

@ricejasonf
Author

When resizing the widget window for the entity component system, I can reproduce this very quickly by just wagging it back and forth. It is still always queue == 3.


3 participants