-
Notifications
You must be signed in to change notification settings - Fork 63
Shader build rework #1586
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Shader build rework #1586
Conversation
49956ed
to
a13896f
Compare
I rebased on #1570 on my end to test with various drivers and machines, and I got this with some configuration: /vid_restart
double free or corruption (out)
Thread 1 "daemon" received signal SIGABRT, Aborted.
Thread 1 (Thread 0x7f5ce83c1b00 (LWP 21747) "daemon"):
#0 __pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:44
#1 __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
#2 __GI___pthread_kill (threadid=<optimized out>, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3 0x00007f5cf444527e in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4 0x00007f5cf44288ff in __GI_abort () at ./stdlib/abort.c:79
#5 0x00007f5cf44297b6 in __libc_message_impl (fmt=fmt@entry=0x7f5cf45ce8d7 "%s\n") at ../sysdeps/posix/libc_fatal.c:134
#6 0x00007f5cf44a8ff5 in malloc_printerr (str=str@entry=0x7f5cf45d1ac0 "double free or corruption (out)") at ./malloc/malloc.c:5772
#7 0x00007f5cf44ab120 in _int_free_merge_chunk (av=0x7f5cf4603ac0 <main_arena>, p=0x7f5cbf7fd840, size=140037160078912) at ./malloc/malloc.c:4676
#8 0x00007f5cf44ab43a in _int_free (av=0x7f5cf4603ac0 <main_arena>, p=<optimized out>, have_lock=<optimized out>) at ./malloc/malloc.c:4646
#9 0x00007f5cf44addae in __GI___libc_free (mem=0x7f5cbf7fd850) at ./malloc/malloc.c:3398
#10 0x00005dadd58294b9 in Z_Free (ptr=<optimized out>, ptr=<optimized out>) at /src/engine/qcommon/qcommon.h:582
#11 GLShaderManager::FreeAll (this=<optimized out>) at /src/engine/renderer/gl_shader.cpp:250
#12 GLSL_ShutdownGPUShaders () at /src/engine/renderer/tr_shade.cpp:468
#13 0x00005dadd580d8ab in RE_Shutdown (destroyWindow=<optimized out>) at /src/engine/renderer/tr_init.cpp:1599
#14 0x00005dadd5764145 in CL_ShutdownRef () at /src/engine/client/cl_main.cpp:2275
#15 CL_Vid_Restart_f () at /src/engine/client/cl_main.cpp:1448
#16 0x00005dadd5875034 in Cmd::ExecuteCommand (command=..., parseCvars=false, env=0x0) at /src/engine/framework/CommandSystem.cpp:215
#17 0x00005dadd5875693 in Cmd::ExecuteCommandBuffer () at /usr/include/c++/13/bits/basic_string.h:1071
#18 0x00005dadd572964f in Com_Frame () at /src/engine/qcommon/common.cpp:919
#19 0x00005dadd572c4ad in Application::ClientApplication::Frame (this=0x5dadd61703c0 <Application::GetApp()::app>) at /src/engine/client/ClientApplication.cpp:110
#20 0x00005dadd571d4c6 in Application::Frame () at /src/engine/framework/Application.cpp:87
#21 main (argc=<optimized out>, argv=<optimized out>) at /src/engine/framework/System.cpp:1004 /devmap plat23
double free or corruption (out)
Thread 1 "daemon" received signal SIGABRT, Aborted.
Thread 1 (Thread 0x74cd75ad4b00 (LWP 17409) "daemon"):
#0 __pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:44
#1 __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
#2 __GI___pthread_kill (threadid=<optimized out>, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3 0x000074cd81a4527e in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4 0x000074cd81a288ff in __GI_abort () at ./stdlib/abort.c:79
#5 0x000074cd81a297b6 in __libc_message_impl (fmt=fmt@entry=0x74cd81bce8d7 "%s\n") at ../sysdeps/posix/libc_fatal.c:134
#6 0x000074cd81aa8ff5 in malloc_printerr (str=str@entry=0x74cd81bd1ac0 "double free or corruption (out)") at ./malloc/malloc.c:5772
#7 0x000074cd81aab120 in _int_free_merge_chunk (av=0x74cd81c03ac0 <main_arena>, p=0x74cd512b9840, size=128425940389440) at ./malloc/malloc.c:4676
#8 0x000074cd81aab43a in _int_free (av=0x74cd81c03ac0 <main_arena>, p=<optimized out>, have_lock=<optimized out>) at ./malloc/malloc.c:4646
#9 0x000074cd81aaddae in __GI___libc_free (mem=0x74cd512b9850) at ./malloc/malloc.c:3398
#10 0x0000572aafdbe4b9 in Z_Free (ptr=<optimized out>, ptr=<optimized out>) at /src/engine/qcommon/qcommon.h:582
#11 GLShaderManager::FreeAll (this=<optimized out>) at /src/engine/renderer/gl_shader.cpp:250
#12 GLSL_ShutdownGPUShaders () at /src/engine/renderer/tr_shade.cpp:468
#13 0x0000572aafd9ed05 in InitOpenGL () at /src/engine/renderer/tr_init.cpp:405
#14 0x0000572aafdaa051 in R_Init () at /src/engine/renderer/tr_init.cpp:1460
#15 RE_BeginRegistration (glconfigOut=0x572ab04fdc74 <cls+2802484>, glconfig2Out=0x572ab04fe8a8 <cls+2805608>) at /src/engine/renderer/tr_model.cpp:259
#16 0x0000572aafd053d5 in CL_InitRenderer () at /src/engine/client/cl_main.cpp:2104
#17 CL_StartHunkUsers () at /src/engine/client/cl_main.cpp:2176
#18 0x0000572aafcfdbea in CL_DownloadsComplete () at /src/engine/client/cl_download.cpp:105
#19 0x0000572aafd0b2c7 in CL_InitDownloads () at /src/engine/client/cl_download.cpp:237
#20 CL_ParseGamestate (msg=msg@entry=0x7ffcb27454f0) at /src/engine/client/cl_parse.cpp:460
#21 0x0000572aafd0b8a0 in CL_ParseServerMessage (msg=msg@entry=0x7ffcb27454f0) at /src/engine/client/cl_parse.cpp:570
#22 0x0000572aafd0ce16 in CL_PacketEvent (from=..., msg=0x7ffcb27454f0) at /src/engine/client/cl_main.cpp:1946
#23 0x0000572aafcbc1eb in Com_EventLoop () at /src/engine/qcommon/common.cpp:429
#24 0x0000572aafcbe6e5 in Com_Frame () at /src/engine/qcommon/common.cpp:948
#25 0x0000572aafcc14ad in Application::ClientApplication::Frame (this=0x572ab07053c0 <Application::GetApp()::app>) at /src/engine/client/ClientApplication.cpp:110
#26 0x0000572aafcb24c6 in Application::Frame () at /src/engine/framework/Application.cpp:87
#27 main (argc=<optimized out>, argv=<optimized out>) at /src/engine/framework/System.cpp:1004
|
Maybe just an incorrect rebase or something. Either way, we should deal with that after one of these pr's is merged, as they seem to work fine on their own. Any extra changes added with this pr's commits are out of scope of the pr. |
373a251
to
9775d3b
Compare
What's funny is that after the rebase it still works on machines where this non-rebased branch already work… I suspect that the bug is not reproducible on all drivers. For example if one driver zeroes allocated memory and another driver doesn't and at some point the allocated-but-unset memory is read, the resulting behavior would differ. Here that crashes when freeing some stuff so that would be something else, but maybe something similar related to freeing is happening. I doubt this is an incorrect rebase anyway, the code merged without anything conflict to solve, and #1570 doesn't touch that part of the code as far as I know. |
If I do this it doesn't crash: --- a/src/engine/qcommon/qcommon.h
+++ b/src/engine/qcommon/qcommon.h
@@ -579,6 +579,7 @@ inline ALLOCATOR char* CopyString(const char* str)
}
inline void Z_Free(void* ptr)
{
+ if (!ptr)
free(ptr);
} |
This branch can't work if |
Here is the proper fix: diff --git a/src/engine/renderer/gl_shader.cpp b/src/engine/renderer/gl_shader.cpp
index 88acd065e..bb808e7dd 100644
--- a/src/engine/renderer/gl_shader.cpp
+++ b/src/engine/renderer/gl_shader.cpp
@@ -288,6 +288,8 @@ void GLShaderManager::UpdateShaderProgramUniformLocations( GLShader* shader, Sha
uniform->UpdateShaderProgramUniformLocation( shaderProgram );
}
+ shaderProgram->uniformBlockIndexes = nullptr;
+
if( glConfig2.uniformBufferObjectAvailable ) {
// create buffer for storing uniform block indexes
shaderProgram->uniformBlockIndexes = ( GLuint* ) Z_Malloc( sizeof( GLuint ) * numUniformBlocks ); Otherwise it will crash when doing: if ( program.uniformBlockIndexes ) {
Z_Free( program.uniformBlockIndexes );
} When |
@VReaperV one can reproduce the bug on this branch by running the engine this way:
|
That doesn't really make sense? You're only doing |
Oh, it's because it's not set to |
Fixed now. |
I tested those drivers:
I tested switching presets, loading maps, doing Now that the null free is fixed, I don't see any issue anymore. |
Very good, thanks for testing that! |
More tests:
They all worked (some also require #1570 but it was included when testing). |
So it's now working well on a wide range hardware, with no issues. |
Yes, the only generation of PC hardware I haven't tested is the recent GSP-based Nvidia, I don't have one but I believe that's what you use. What I haven't tested is Arm hardware, but if some issue arises we can take care of that later. And give the diversity of tests I run I doubt this will happen anyway, especially since all the Arm hardware I can test uses Mesa, which was already tested in a dozen of different configurations with other hardware. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Thank you very much for doing this. That's the kind of work I cannot do myself without doing some months of research, so it would have been very unlikely to see this being done if that was not done by you.
You're saving the world!
Also make sure to display the full shader name rather than just the main shader.
Previously, the GLSL shader building was done in the following manner: 1. When the renderer is started, some of the shader source was initialised/processed. This included the main shader text, #insert's, and material system post-processing. 2. When `buildAll()` was called, all permutations with empty deform shaders were built. The deform shaders themselves are loaded either at the start (for the default, empty one), or when the map itself is being loaded. These permutations included adding macros to shaders in the form of `#ifndef #define #endif`, and attaching the default deform shader. When possible, shader program binaries were saved/loaded to/from disk. 3. UI shader and non-0 deform shaders were built on-demand. There were multiple issues with that system: 1. Shader caches were unreliable. This is likely because the checksum was calculated *before* macros and such were added to the shader text. 2. Every shader permutation was always compiled and linked every time. 3. Shader programs with non-0 deformVertexes were never cached. This commit changes the underlying system to use an `std::vector` of shader program descriptors, and another one for shader descriptors. The initialisation/building of shaders is now more generic for each type, and as a result each unique shader is only compiled once. The permutations are still linked each time (unless downloaded from cache), but this change lays the foundation to easily add shader pipelines, which would solve this problem. Shader cache binaries are now identified by each shader compiled into the corresponding shader program, which allows caching non-0 deformVertexes too. This also fixes the incorrect #line numbers emitted by #insert due to macros and such being added after #insert is processed. Also adds more information into how many shader/shader programs were built and how much time it took (there's currently a small discrepancy, because the output is shown at the end of `BuildAll()`, but is gathered every time `BuildPermutation()` or `InitShader()` is called).
91c7bd2
to
228b400
Compare
Fixes #1339
Previously, the GLSL shader building was done in the following manner:
buildAll()
was called, all permutations with empty deform shaders were built. The deform shaders themselves are loaded either at the start (for the default, empty one), or when the map itself is being loaded. These permutations included adding macros to shaders in the form of#ifndef #define #endif
, and attaching the default deform shader. When possible, shader program binaries were saved/loaded to/from disk.There were multiple issues with that system:
This commit changes the underlying system to use an
std::vector
of shader program descriptors, and another one for shader descriptors. The initialisation/building of shaders is now more generic for each type, and as a result each unique shader is only compiled once. The permutations are still linked each time (unless downloaded from cache), but this change lays the foundation to easily add shader pipelines, which would solve this problem.Shader cache binaries are now identified by each shader compiled into the corresponding shader program, which allows caching non-0 deformVertexes too.
This also fixes the incorrect #line numbers emitted by #insert due to macros and such being added after #insert is processed.
Also adds more information into how many shader/shader programs were built and how much time it took (there's currently a small discrepancy, because the output is shown at the end of
BuildAll()
, but is gathered every timeBuildPermutation()
orInitShader()
is called).