Skip to content

Shader build rework #1586

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Mar 12, 2025
Merged

Conversation

VReaperV
Copy link
Contributor

@VReaperV VReaperV commented Mar 10, 2025

Fixes #1339

Previously, the GLSL shader building was done in the following manner:

  1. When the renderer is started, some of the shader source was initialised/processed. This included the main shader text, #insert's, and material system post-processing.
  2. When buildAll() was called, all permutations with empty deform shaders were built. The deform shaders themselves are loaded either at the start (for the default, empty one), or when the map itself is being loaded. These permutations included adding macros to shaders in the form of #ifndef #define #endif, and attaching the default deform shader. When possible, shader program binaries were saved/loaded to/from disk.
  3. UI shader and non-0 deform shaders were built on-demand.

There were multiple issues with that system:

  1. Shader caches were unreliable. This is likely because the checksum was calculated before macros and such were added to the shader text.
  2. Every shader permutation was always compiled and linked every time.
  3. Shader programs with non-0 deformVertexes were never cached.

This commit changes the underlying system to use an std::vector of shader program descriptors, and another one for shader descriptors. The initialisation/building of shaders is now more generic for each type, and as a result each unique shader is only compiled once. The permutations are still linked each time (unless downloaded from cache), but this change lays the foundation to easily add shader pipelines, which would solve this problem.

Shader cache binaries are now identified by each shader compiled into the corresponding shader program, which allows caching non-0 deformVertexes too.

This also fixes the incorrect #line numbers emitted by #insert due to macros and such being added after #insert is processed.

Also adds more information into how many shader/shader programs were built and how much time it took (there's currently a small discrepancy, because the output is shown at the end of BuildAll(), but is gathered every time BuildPermutation() or InitShader() is called).

@VReaperV VReaperV added T-Improvement Improvement for an existing feature A-Renderer T-Cleanup labels Mar 10, 2025
@VReaperV VReaperV force-pushed the shader-build-rework branch 3 times, most recently from 49956ed to a13896f Compare March 11, 2025 05:58
@illwieckz
Copy link
Member

illwieckz commented Mar 11, 2025

I rebased on #1570 on my end to test with various drivers and machines, and I got this with some configuration:

/vid_restart
double free or corruption (out)
Thread 1 "daemon" received signal SIGABRT, Aborted.

Thread 1 (Thread 0x7f5ce83c1b00 (LWP 21747) "daemon"):
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=<optimized out>, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x00007f5cf444527e in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x00007f5cf44288ff in __GI_abort () at ./stdlib/abort.c:79
#5  0x00007f5cf44297b6 in __libc_message_impl (fmt=fmt@entry=0x7f5cf45ce8d7 "%s\n") at ../sysdeps/posix/libc_fatal.c:134
#6  0x00007f5cf44a8ff5 in malloc_printerr (str=str@entry=0x7f5cf45d1ac0 "double free or corruption (out)") at ./malloc/malloc.c:5772
#7  0x00007f5cf44ab120 in _int_free_merge_chunk (av=0x7f5cf4603ac0 <main_arena>, p=0x7f5cbf7fd840, size=140037160078912) at ./malloc/malloc.c:4676
#8  0x00007f5cf44ab43a in _int_free (av=0x7f5cf4603ac0 <main_arena>, p=<optimized out>, have_lock=<optimized out>) at ./malloc/malloc.c:4646
#9  0x00007f5cf44addae in __GI___libc_free (mem=0x7f5cbf7fd850) at ./malloc/malloc.c:3398
#10 0x00005dadd58294b9 in Z_Free (ptr=<optimized out>, ptr=<optimized out>) at /src/engine/qcommon/qcommon.h:582
#11 GLShaderManager::FreeAll (this=<optimized out>) at /src/engine/renderer/gl_shader.cpp:250
#12 GLSL_ShutdownGPUShaders () at /src/engine/renderer/tr_shade.cpp:468
#13 0x00005dadd580d8ab in RE_Shutdown (destroyWindow=<optimized out>) at /src/engine/renderer/tr_init.cpp:1599
#14 0x00005dadd5764145 in CL_ShutdownRef () at /src/engine/client/cl_main.cpp:2275
#15 CL_Vid_Restart_f () at /src/engine/client/cl_main.cpp:1448
#16 0x00005dadd5875034 in Cmd::ExecuteCommand (command=..., parseCvars=false, env=0x0) at /src/engine/framework/CommandSystem.cpp:215
#17 0x00005dadd5875693 in Cmd::ExecuteCommandBuffer () at /usr/include/c++/13/bits/basic_string.h:1071
#18 0x00005dadd572964f in Com_Frame () at /src/engine/qcommon/common.cpp:919
#19 0x00005dadd572c4ad in Application::ClientApplication::Frame (this=0x5dadd61703c0 <Application::GetApp()::app>) at /src/engine/client/ClientApplication.cpp:110
#20 0x00005dadd571d4c6 in Application::Frame () at /src/engine/framework/Application.cpp:87
#21 main (argc=<optimized out>, argv=<optimized out>) at /src/engine/framework/System.cpp:1004
/devmap plat23
double free or corruption (out)
Thread 1 "daemon" received signal SIGABRT, Aborted.

Thread 1 (Thread 0x74cd75ad4b00 (LWP 17409) "daemon"):
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=<optimized out>, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x000074cd81a4527e in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x000074cd81a288ff in __GI_abort () at ./stdlib/abort.c:79
#5  0x000074cd81a297b6 in __libc_message_impl (fmt=fmt@entry=0x74cd81bce8d7 "%s\n") at ../sysdeps/posix/libc_fatal.c:134
#6  0x000074cd81aa8ff5 in malloc_printerr (str=str@entry=0x74cd81bd1ac0 "double free or corruption (out)") at ./malloc/malloc.c:5772
#7  0x000074cd81aab120 in _int_free_merge_chunk (av=0x74cd81c03ac0 <main_arena>, p=0x74cd512b9840, size=128425940389440) at ./malloc/malloc.c:4676
#8  0x000074cd81aab43a in _int_free (av=0x74cd81c03ac0 <main_arena>, p=<optimized out>, have_lock=<optimized out>) at ./malloc/malloc.c:4646
#9  0x000074cd81aaddae in __GI___libc_free (mem=0x74cd512b9850) at ./malloc/malloc.c:3398
#10 0x0000572aafdbe4b9 in Z_Free (ptr=<optimized out>, ptr=<optimized out>) at /src/engine/qcommon/qcommon.h:582
#11 GLShaderManager::FreeAll (this=<optimized out>) at /src/engine/renderer/gl_shader.cpp:250
#12 GLSL_ShutdownGPUShaders () at /src/engine/renderer/tr_shade.cpp:468
#13 0x0000572aafd9ed05 in InitOpenGL () at /src/engine/renderer/tr_init.cpp:405
#14 0x0000572aafdaa051 in R_Init () at /src/engine/renderer/tr_init.cpp:1460
#15 RE_BeginRegistration (glconfigOut=0x572ab04fdc74 <cls+2802484>, glconfig2Out=0x572ab04fe8a8 <cls+2805608>) at /src/engine/renderer/tr_model.cpp:259
#16 0x0000572aafd053d5 in CL_InitRenderer () at /src/engine/client/cl_main.cpp:2104
#17 CL_StartHunkUsers () at /src/engine/client/cl_main.cpp:2176
#18 0x0000572aafcfdbea in CL_DownloadsComplete () at /src/engine/client/cl_download.cpp:105
#19 0x0000572aafd0b2c7 in CL_InitDownloads () at /src/engine/client/cl_download.cpp:237
#20 CL_ParseGamestate (msg=msg@entry=0x7ffcb27454f0) at /src/engine/client/cl_parse.cpp:460
#21 0x0000572aafd0b8a0 in CL_ParseServerMessage (msg=msg@entry=0x7ffcb27454f0) at /src/engine/client/cl_parse.cpp:570
#22 0x0000572aafd0ce16 in CL_PacketEvent (from=..., msg=0x7ffcb27454f0) at /src/engine/client/cl_main.cpp:1946
#23 0x0000572aafcbc1eb in Com_EventLoop () at /src/engine/qcommon/common.cpp:429
#24 0x0000572aafcbe6e5 in Com_Frame () at /src/engine/qcommon/common.cpp:948
#25 0x0000572aafcc14ad in Application::ClientApplication::Frame (this=0x572ab07053c0 <Application::GetApp()::app>) at /src/engine/client/ClientApplication.cpp:110
#26 0x0000572aafcb24c6 in Application::Frame () at /src/engine/framework/Application.cpp:87
#27 main (argc=<optimized out>, argv=<optimized out>) at /src/engine/framework/System.cpp:1004
  • Starting the game opens the main menu properly.
  • Starting with +devmap plat23 properly loads the map and renders it.
  • But while being on main menu, doing vid_restart or calling devmap triggers some crashes.

@VReaperV
Copy link
Contributor Author

I rebased on #1570 on my end to test with various drivers and machines, and I got this with some configuration:

Maybe just an incorrect rebase or something. Either way, we should deal with that after one of these pr's is merged, as they seem to work fine on their own. Any extra changes added with this pr's commits are out of scope of the pr.

@VReaperV VReaperV force-pushed the shader-build-rework branch from 373a251 to 9775d3b Compare March 11, 2025 17:49
@illwieckz
Copy link
Member

I rebased on #1570 on my end to test with various drivers and machines, and I got this with some configuration:

Maybe just an incorrect rebase or something. Either way, we should deal with that after one of these pr's is merged, as they seem to work fine on their own. Any extra changes added with this pr's commits are out of scope of the pr.

What's funny is that after the rebase it still works on machines where this non-rebased branch already work… I suspect that the bug is not reproducible on all drivers. For example if one driver zeroes allocated memory and another driver doesn't and at some point the allocated-but-unset memory is read, the resulting behavior would differ. Here that crashes when freeing some stuff so that would be something else, but maybe something similar related to freeing is happening.

I doubt this is an incorrect rebase anyway, the code merged without anything conflict to solve, and #1570 doesn't touch that part of the code as far as I know.

@illwieckz
Copy link
Member

illwieckz commented Mar 11, 2025

If I do this it doesn't crash:

--- a/src/engine/qcommon/qcommon.h
+++ b/src/engine/qcommon/qcommon.h
@@ -579,6 +579,7 @@ inline ALLOCATOR char* CopyString(const char* str)
 }
 inline void Z_Free(void* ptr)
 {
+  if (!ptr)
   free(ptr);
 }

@illwieckz
Copy link
Member

This branch can't work if GL_ARB_uniform_buffer_object is missing, as the shaderProgram->uniformBlockIndexes pointer is only set if that extension is there, but always freed.

@illwieckz
Copy link
Member

illwieckz commented Mar 11, 2025

Here is the proper fix:

diff --git a/src/engine/renderer/gl_shader.cpp b/src/engine/renderer/gl_shader.cpp
index 88acd065e..bb808e7dd 100644
--- a/src/engine/renderer/gl_shader.cpp
+++ b/src/engine/renderer/gl_shader.cpp
@@ -288,6 +288,8 @@ void GLShaderManager::UpdateShaderProgramUniformLocations( GLShader* shader, Sha
 		uniform->UpdateShaderProgramUniformLocation( shaderProgram );
 	}
 
+	shaderProgram->uniformBlockIndexes = nullptr;
+
 	if( glConfig2.uniformBufferObjectAvailable ) {
 		// create buffer for storing uniform block indexes
 		shaderProgram->uniformBlockIndexes = ( GLuint* ) Z_Malloc( sizeof( GLuint ) * numUniformBlocks );

Otherwise it will crash when doing:

		if ( program.uniformBlockIndexes ) {
			Z_Free( program.uniformBlockIndexes );
		}

When glConfig2.uniformBufferObjectAvailable is false.

@illwieckz
Copy link
Member

@VReaperV one can reproduce the bug on this branch by running the engine this way:

./daemon -set r_arb_uniform_buffer_object off

@VReaperV
Copy link
Contributor Author

If I do this it doesn't crash:

--- a/src/engine/qcommon/qcommon.h
+++ b/src/engine/qcommon/qcommon.h
@@ -579,6 +579,7 @@ inline ALLOCATOR char* CopyString(const char* str)
 }
 inline void Z_Free(void* ptr)
 {
+  if (!ptr)
   free(ptr);
 }

That doesn't really make sense? You're only doing free() if ptr is nullptr that way...

@VReaperV
Copy link
Contributor Author

VReaperV commented Mar 11, 2025

Here is the proper fix:

diff --git a/src/engine/renderer/gl_shader.cpp b/src/engine/renderer/gl_shader.cpp
index 88acd065e..bb808e7dd 100644
--- a/src/engine/renderer/gl_shader.cpp
+++ b/src/engine/renderer/gl_shader.cpp
@@ -288,6 +288,8 @@ void GLShaderManager::UpdateShaderProgramUniformLocations( GLShader* shader, Sha
 		uniform->UpdateShaderProgramUniformLocation( shaderProgram );
 	}
 
+	shaderProgram->uniformBlockIndexes = nullptr;
+
 	if( glConfig2.uniformBufferObjectAvailable ) {
 		// create buffer for storing uniform block indexes
 		shaderProgram->uniformBlockIndexes = ( GLuint* ) Z_Malloc( sizeof( GLuint ) * numUniformBlocks );

Otherwise it will crash when doing:

		if ( program.uniformBlockIndexes ) {
			Z_Free( program.uniformBlockIndexes );
		}

When glConfig2.uniformBufferObjectAvailable is false.

Oh, it's because it's not set to nullptr by default. I wonder why the previous code didn't crash though, since it also did a conditional Z_Malloc with no default value for the pointer?

@VReaperV
Copy link
Contributor Author

Here is the proper fix:

diff --git a/src/engine/renderer/gl_shader.cpp b/src/engine/renderer/gl_shader.cpp
index 88acd065e..bb808e7dd 100644
--- a/src/engine/renderer/gl_shader.cpp
+++ b/src/engine/renderer/gl_shader.cpp
@@ -288,6 +288,8 @@ void GLShaderManager::UpdateShaderProgramUniformLocations( GLShader* shader, Sha
 		uniform->UpdateShaderProgramUniformLocation( shaderProgram );
 	}
 
+	shaderProgram->uniformBlockIndexes = nullptr;
+
 	if( glConfig2.uniformBufferObjectAvailable ) {
 		// create buffer for storing uniform block indexes
 		shaderProgram->uniformBlockIndexes = ( GLuint* ) Z_Malloc( sizeof( GLuint ) * numUniformBlocks );

Otherwise it will crash when doing:

		if ( program.uniformBlockIndexes ) {
			Z_Free( program.uniformBlockIndexes );
		}

When glConfig2.uniformBufferObjectAvailable is false.

Fixed now.

@illwieckz
Copy link
Member

illwieckz commented Mar 11, 2025

I tested those drivers:

Brand Card Architecture Driver
ATI Radeon X550 HyperMemory RV370 Mesa 24.2.8 r300
AMD Radeon HD 6450 TeraScale 2 Mesa 24.2.8 r600
AMD Radeon RX Vega 8 GCN 5.0 Mesa 24.2.8 radeonsi
AMD Radeon PRO W7600 RDNA 3.0 Mesa 24.2.8 radeonsi
CPU - - Mesa 24.2.8 llvmpipe

I tested switching presets, loading maps, doing vid_restart

Now that the null free is fixed, I don't see any issue anymore.

@VReaperV
Copy link
Contributor Author

I tested those drivers:

Brand Card Architecture Driver
ATI Radeon X550 HyperMemory RV370 Mesa r300
AMD Radeon HD 6450 TeraScale 2 Mesa r600
AMD Radeon RX Vega 8 GCN 5.0 Mesa radeonsi
AMD Radeon Pro W7600 RDNA 3.0 Mesa radeonsi
CPU - - llvmpipe
I tested switching presets, loading maps, doing vid_restart

Now that the null free is fixed, I don't see any issue anymore.

Very good, thanks for testing that!

@illwieckz
Copy link
Member

illwieckz commented Mar 11, 2025

More tests:

Brand Card Architecture Driver
AMD Radeon PRO W7600 RDNA 3.0 ATI oglp 24.10
Nvidia GeForce 6150 LE Curie NV4E Mesa 24.2.8 nv40
Nvidia Quadro K1100M Kepler NVE7 Mesa 24.2.8 nvc0
Nvidia Quadro K1100M Kepler NVE7 Nvidia 470
Intel 965GM GMA Gen4 Mesa 24.2.8 crocus
Intel HD 4600 Gen7 GT2 Mesa 24.2.8 crocus
Intel HD 520 Gen9 GT2 Mesa 24.2.8 iris

They all worked (some also require #1570 but it was included when testing).

@VReaperV
Copy link
Contributor Author

So it's now working well on a wide range hardware, with no issues.

@illwieckz
Copy link
Member

illwieckz commented Mar 12, 2025

Yes, the only generation of PC hardware I haven't tested is the recent GSP-based Nvidia, I don't have one but I believe that's what you use. What I haven't tested is Arm hardware, but if some issue arises we can take care of that later. And give the diversity of tests I run I doubt this will happen anyway, especially since all the Arm hardware I can test uses Mesa, which was already tested in a dozen of different configurations with other hardware.

Copy link
Member

@illwieckz illwieckz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Thank you very much for doing this. That's the kind of work I cannot do myself without doing some months of research, so it would have been very unlikely to see this being done if that was not done by you.

You're saving the world!

Also make sure to display the full shader name rather than just the main shader.
Previously, the GLSL shader building was done in the following manner:
1. When the renderer is started, some of the shader source was initialised/processed. This included the main shader text, #insert's, and material system post-processing.
2. When `buildAll()` was called, all permutations with empty deform shaders were built. The deform shaders themselves are loaded either at the start (for the default, empty one), or when the map itself is being loaded. These permutations included adding macros to shaders in the form of `#ifndef #define #endif`, and attaching the default deform shader. When possible, shader program binaries were saved/loaded to/from disk.
3. UI shader and non-0 deform shaders were built on-demand.

There were multiple issues with that system:
1. Shader caches were unreliable. This is likely because the checksum was calculated *before* macros and such were added to the shader text.
2. Every shader permutation was always compiled and linked every time.
3. Shader programs with non-0 deformVertexes were never cached.

This commit changes the underlying system to use an `std::vector` of shader program descriptors, and another one for shader descriptors. The initialisation/building of shaders is now more generic for each type, and as a result each unique shader is only compiled once. The permutations are still linked each time (unless downloaded from cache), but this change lays the foundation to easily add shader pipelines, which would solve this problem.

Shader cache binaries are now identified by each shader compiled into the corresponding shader program, which allows caching non-0 deformVertexes too.

This also fixes the incorrect #line numbers emitted by #insert due to macros and such being added after #insert is processed.

Also adds more information into how many shader/shader programs were built and how much time it took (there's currently a small discrepancy, because the output is shown at the end of `BuildAll()`, but is gathered every time `BuildPermutation()` or `InitShader()` is called).
@VReaperV VReaperV force-pushed the shader-build-rework branch from 91c7bd2 to 228b400 Compare March 12, 2025 15:47
@VReaperV VReaperV merged commit fd758d7 into DaemonEngine:master Mar 12, 2025
9 checks passed
@VReaperV VReaperV deleted the shader-build-rework branch March 12, 2025 17:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-Renderer T-Cleanup T-Improvement Improvement for an existing feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Shaders with non-empty macros throw off #insert line numbers
2 participants