Global scan #665
base: master
Conversation
…but not working yet
[CI]: Can one of the admins verify this patch?
#ifndef NBL_BUILTIN_MAX_SCAN_LEVELS
#define NBL_BUILTIN_MAX_SCAN_LEVELS 7
#endif
do as NBL_CONSTEXPR instead of a #define
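A minimal sketch of the suggested form (the constant name below is illustrative, and NBL_CONSTEXPR is assumed to be available from Nabla's cpp_compat headers):

NBL_CONSTEXPR uint32_t MaxScanLevels = 7; // a typed constant visible to both HLSL and C++, instead of a preprocessor define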
also this better be formulated slightly differently, in terms of reduction passes, not reduction + downsweep
uint32_t topLevel;
uint32_t temporaryStorageOffset[NBL_BUILTIN_MAX_SCAN_LEVELS/2];
use uint16_t for this
uint32_t topLevel;
uint32_t temporaryStorageOffset[NBL_BUILTIN_MAX_SCAN_LEVELS/2];
};

Parameters_t getParameters();
no need for such a forward decl anymore
struct DefaultSchedulerParameters_t
{
    uint32_t finishedFlagOffset[NBL_BUILTIN_MAX_SCAN_LEVELS-1];
    uint32_t cumulativeWorkgroupCount[NBL_BUILTIN_MAX_SCAN_LEVELS];
};

DefaultSchedulerParameters_t getSchedulerParameters();
split out schedulers into separate headers
template<typename Storage_t>
void getData(
-    inout Storage_t data,
-    in uint levelInvocationIndex,
-    in uint localWorkgroupIndex,
-    in uint treeLevel,
-    in uint pseudoLevel
+    NBL_REF_ARG(Storage_t) data,
+    NBL_CONST_REF_ARG(uint32_t) levelInvocationIndex,
+    NBL_CONST_REF_ARG(uint32_t) localWorkgroupIndex,
+    NBL_CONST_REF_ARG(uint32_t) treeLevel,
+    NBL_CONST_REF_ARG(uint32_t) pseudoLevel
);
}
}
}
#define _NBL_HLSL_SCAN_GET_PADDED_DATA_DECLARED_
#endif

#ifndef _NBL_HLSL_SCAN_SET_DATA_DECLARED_
namespace nbl
{
namespace hlsl
{
namespace scan
{
template<typename Storage_t>
void setData(
    in Storage_t data,
    in uint levelInvocationIndex,
no more forward declarations, just let it be an accessor. Also the level const-ref-args can be rolled up into a single struct SDataIndex
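A hypothetical sketch of that rolled-up struct (the SDataIndex name comes from the comment above; the members are an assumption based on the current parameter list):

struct SDataIndex
{
    uint32_t levelInvocationIndex; // global invocation index within the level
    uint32_t localWorkgroupIndex;  // workgroup index within the level
    uint32_t treeLevel;            // current level in the scan tree
    uint32_t pseudoLevel;          // level used for temporary-storage indexing
};

getData and setData would then take a single NBL_CONST_REF_ARG(SDataIndex) instead of four separate const-ref-args.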
// Copyright (C) 2023 - DevSH Graphics Programming Sp. z O.O.
// This file is part of the "Nabla Engine".
// For conditions of distribution and use, see copyright notice in nabla.h

#ifndef _NBL_HLSL_SCAN_DESCRIPTORS_INCLUDED_
#define _NBL_HLSL_SCAN_DESCRIPTORS_INCLUDED_

-// choerent -> globallycoherent
#include "nbl/builtin/hlsl/scan/declarations.hlsl"
#include "nbl/builtin/hlsl/workgroup/basic.hlsl"

+// coherent -> globallycoherent

namespace nbl
{
namespace hlsl
{
namespace scan
{

template<uint32_t scratchElementCount=scratchSz> // (REVIEW): This should be externally defined. Maybe change the scratch buffer to RWByteAddressBuffer? Annoying to manage though...
struct Scratch
{
    uint32_t workgroupsStarted;
    uint32_t data[scratchElementCount];
};

[[vk::binding(0,0)]] RWStructuredBuffer<uint32_t /*Storage_t*/> scanBuffer; // (REVIEW): Make the type externalizable. Decide how (#define?)
[[vk::binding(1,0)]] RWStructuredBuffer<Scratch> globallycoherent scanScratchBuf; // (REVIEW): Check if globallycoherent can be used with the Vulkan Memory Model

template<typename Storage_t, bool isExclusive=false>
void getData(
    NBL_REF_ARG(Storage_t) data,
    NBL_CONST_REF_ARG(uint32_t) levelInvocationIndex,
    NBL_CONST_REF_ARG(uint32_t) localWorkgroupIndex,
    NBL_CONST_REF_ARG(uint32_t) treeLevel,
    NBL_CONST_REF_ARG(uint32_t) pseudoLevel
)
{
    const Parameters_t params = getParameters(); // defined differently for direct and indirect shaders

    uint32_t offset = levelInvocationIndex;
    const bool notFirstOrLastLevel = bool(pseudoLevel);
    if (notFirstOrLastLevel)
        offset += params.temporaryStorageOffset[pseudoLevel-1u];
remove this whole file, it should be userspace
// TODO: Can we make it a static variable?
groupshared uint32_t wgScratch[SharedScratchSz];

#include "nbl/builtin/hlsl/workgroup/arithmetic.hlsl"

template<uint16_t offset>
struct WGScratchProxy
{
    uint32_t get(const uint32_t ix)
    {
        return wgScratch[ix+offset];
    }
    void set(const uint32_t ix, const uint32_t value)
    {
        wgScratch[ix+offset] = value;
    }

    uint32_t atomicAdd(uint32_t ix, uint32_t val)
    {
        return glsl::atomicAdd(wgScratch[ix + offset], val);
    }

    void workgroupExecutionAndMemoryBarrier()
    {
        nbl::hlsl::glsl::barrier();
        //nbl::hlsl::glsl::memoryBarrierShared(); implied by the above
    }
};
static WGScratchProxy<0> accessor;
scratches are userspace
use accessors
/**
 * Required since we rely on SubgroupContiguousIndex instead of
 * gl_LocalInvocationIndex which means to match the global index
 * we can't use the gl_GlobalInvocationID but an index based on
 * SubgroupContiguousIndex.
 */
uint32_t globalIndex()
{
    return nbl::hlsl::glsl::gl_WorkGroupID().x*WORKGROUP_SIZE+nbl::hlsl::workgroup::SubgroupContiguousIndex();
}
we have this in a header already
struct ScanPushConstants
{
    nbl::hlsl::scan::Parameters_t scanParams;
    nbl::hlsl::scan::DefaultSchedulerParameters_t schedulerParams;
};

[[vk::push_constant]]
ScanPushConstants spc;
everything affecting the pipeline layout should be userspace
#ifndef _NBL_HLSL_MAIN_DEFINED_
-[numthreads(_NBL_HLSL_WORKGROUP_SIZE_, 1, 1)]
-void CSMain()
+[numthreads(WORKGROUP_SIZE,1,1)]
+void main()
{
-    if (bool(nbl::hlsl::scan::getIndirectElementCount()))
-        nbl::hlsl::scan::main();
+    if(bool(nbl::hlsl::scan::getIndirectElementCount())) {
+        // TODO call main from virtual_workgroup.hlsl
+    }
all this should be userspace, also there will be very little difference between direct and indirect
namespace nbl
{
namespace hlsl
{
namespace scan
{
-template<class Binop, class Storage_t>
-void virtualWorkgroup(in uint treeLevel, in uint localWorkgroupIndex)
+template<class Binop, typename Storage_t, bool isExclusive, uint16_t ItemCount, class Accessor, class device_capabilities=void>
why not template on the workgroup scan instead? Within you can alias and extract (see the sketch after this list):
- binop
- storage_t
- exclusive or not
- item count
- smem accessor
- device traits necessary
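A rough sketch of what that could look like, assuming the workgroup scan type exposes its configuration as member typedefs and constants (all names below are assumptions, not the actual Nabla API):

template<class WorkgroupScan, class Accessor>
void virtualWorkgroup(const uint32_t treeLevel, const uint32_t localWorkgroupIndex, NBL_REF_ARG(Accessor) accessor)
{
    // alias and extract everything from the single template parameter
    typedef typename WorkgroupScan::binop_t binop_t;              // binop
    typedef typename WorkgroupScan::storage_t storage_t;          // storage_t
    NBL_CONSTEXPR bool IsExclusive = WorkgroupScan::IsExclusive;  // exclusive or not
    NBL_CONSTEXPR uint16_t ItemCount = WorkgroupScan::ItemCount;  // item count
    // the smem accessor and the required device traits can be member types in the same way
}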
const uint32_t levelInvocationIndex = localWorkgroupIndex * glsl::gl_WorkGroupSize().x + SubgroupContiguousIndex();
const bool lastInvocationInGroup = SubgroupContiguousIndex() == (gl_WorkGroupSize().x - 1);
shouldn't ItemCount be used instead of glsl::gl_WorkGroupSize().x?
most definitely
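Applied to the first of the two lines above, that would read roughly as follows (a sketch, assuming ItemCount is the per-workgroup item count extracted from the template parameters):

const uint32_t levelInvocationIndex = localWorkgroupIndex * uint32_t(ItemCount) + SubgroupContiguousIndex();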
src/nbl/video/utilities/CScanner.cpp (outdated)
{
    core::smart_refctd_ptr<asset::ICPUShader> base = createBaseShader("nbl/builtin/hlsl/scan/direct.hlsl", dataType, op, scratchSz);
    return asset::CHLSLCompiler::createOverridenCopy(base.get(), "#define IS_EXCLUSIVE %s\n#define IS_SCAN %s\n", scanType == EST_EXCLUSIVE ? "true" : "false", "true");
}
same comment as for CReduce
src/nbl/video/utilities/CScanner.cpp (outdated)
asset::ICPUShader* CScanner::getDefaultShader(const E_SCAN_TYPE scanType, const E_DATA_TYPE dataType, const E_OPERATOR op, const uint32_t scratchSz)
{
    if (!m_shaders[scanType][dataType][op])
        m_shaders[scanType][dataType][op] = createShader(false,scanType,dataType,op,scratchSz);
    return m_shaders[scanType][dataType][op].get();
}

#if 0 // TODO: port
core::smart_refctd_ptr<asset::ICPUShader> CScanner::createShader(const bool indirect, const E_SCAN_TYPE scanType, const E_DATA_TYPE dataType, const E_OPERATOR op) const
IGPUShader* CScanner::getDefaultSpecializedShader(const E_SCAN_TYPE scanType, const E_DATA_TYPE dataType, const E_OPERATOR op, const uint32_t scratchSz)
{
    auto system = m_device->getPhysicalDevice()->getSystem();
    core::smart_refctd_ptr<const system::IFile> glsl;
    {
        auto loadBuiltinData = [&](const std::string _path) -> core::smart_refctd_ptr<const nbl::system::IFile>
        {
            nbl::system::ISystem::future_t<core::smart_refctd_ptr<nbl::system::IFile>> future;
            system->createFile(future, system::path(_path), core::bitflag(nbl::system::IFileBase::ECF_READ) | nbl::system::IFileBase::ECF_MAPPABLE);
            if (future.wait())
                return future.copy();
            return nullptr;
        };
    if (!m_specialized_shaders[scanType][dataType][op])
    {
        auto cpuShader = core::smart_refctd_ptr<asset::ICPUShader>(getDefaultShader(scanType,dataType,op,scratchSz));
        cpuShader->setFilePathHint("nbl/builtin/hlsl/scan/direct.hlsl");
        cpuShader->setShaderStage(asset::IShader::ESS_COMPUTE);

        auto gpushader = m_device->createShader(cpuShader.get());

        m_specialized_shaders[scanType][dataType][op] = gpushader;
    }
    return m_specialized_shaders[scanType][dataType][op].get();
}

if(indirect)
these 3 utilities are practically identical for CScan and CReduce, why not hoist them to CArithmeticOps?
void computeParameters(NBL_CONST_REF_ARG(uint32_t) elementCount, out Parameters_t _scanParams, out DefaultSchedulerParameters_t _schedulerParams)
{
The parameters are "married" to the scheduling logic, so make these methods of DefaultSchedulerParameters_t. This one is really just a constructor, so turn it into a static DefaultSchedulerParameters_t create(...) method.
Hint: whenever a function takes an inout or out struct T as its first parameter, it likely means that I wrote it in GLSL to be a pseudo-method (it would have a C fn-ptr signature matching a method). Bonus hint: whenever the first param is an in T, it would have been a const method in C++.
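A minimal sketch of the suggested shape (the body is only a placeholder):

struct DefaultSchedulerParameters_t
{
    uint32_t finishedFlagOffset[NBL_BUILTIN_MAX_SCAN_LEVELS-1];
    uint32_t cumulativeWorkgroupCount[NBL_BUILTIN_MAX_SCAN_LEVELS];

    // pseudo-constructor replacing the out-parameter style of computeParameters
    static DefaultSchedulerParameters_t create(const uint32_t elementCount)
    {
        DefaultSchedulerParameters_t retval;
        // fill finishedFlagOffset and cumulativeWorkgroupCount from elementCount here
        return retval;
    }
};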
 * The CScanner.h parameter computation calculates the number of virtual workgroups that will have to be launched for the Scan operation
 * (always based on the elementCount) as well as different offsets for the results of each step of the Scan operation, flag positions
 * that are used for synchronization etc.
The logic needs to change to "last one out closes the door", and the comments too. We will now only launch RoundUp(N/ItemsPerWorkgroup) virtual workgroups, not the geometric series.
 * Keep in mind that each virtual workgroup executes a single step of the whole algorithm, which is why we have the cumulativeWorkgroupCount.
 * The first virtual workgroups will work on the upsweep phase, the next on the downsweep phase.
Since we do "last workgroup takes over" we will now only launch RoundUp(N/ItemsPerWorkgroup) virtual workgroups, not the geometric series. So no need for the cumulativeWorkgroupCount.
 * |> finishedFlagOffset - an index in the scratch buffer where each virtual workgroup indicates that ALL its invocations have finished their work. This helps
 * synchronize between workgroups with while-loop spinning.
amend comments as needed
uint32_t cumulativeWorkgroupCount[NBL_BUILTIN_MAX_LEVELS];
uint32_t workgroupFinishFlagsOffset[NBL_BUILTIN_MAX_LEVELS];
uint32_t lastWorkgroupSetCountForLevel[NBL_BUILTIN_MAX_LEVELS];
not sure you need either cumulativeWorkgroupCount or lastWorkgroupSetCountForLevel for a "last one out closes the door" upsweep
    default:
        break;
#if NBL_BUILTIN_MAX_SCAN_LEVELS>7
const uint32_t workgroupSizeLog2 = firstbithigh(glsl::gl_WorkGroupSize().x);
modify it to not use Vulkan/SPIR-V environment builtins, that way we can use it from C++ too! (So take the workgroup size - or more accurately the items-per-workgroup count - from the outside.)
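For illustration, a builtin-free helper along these lines compiles as both HLSL and C++ (the function name and exact signature are assumptions):

// how many reduction levels does a scan of elementCount items need,
// when each virtual workgroup consumes (1<<itemsPerWorkgroupLog2) items?
uint32_t computeTopLevel(uint32_t elementCount, const uint16_t itemsPerWorkgroupLog2)
{
    const uint32_t itemsPerWorkgroup = uint32_t(1u) << itemsPerWorkgroupLog2;
    uint32_t topLevel = 0u;
    // keep reducing until one workgroup covers all remaining elements
    while (elementCount > itemsPerWorkgroup)
    {
        elementCount = (elementCount + itemsPerWorkgroup - 1u) >> itemsPerWorkgroupLog2;
        topLevel++;
    }
    return topLevel;
}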
#endif
// REVIEW: Putting topLevel second allows better alignment for packing of constant variables, assuming lastElement has length 4. (https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-packing-rules)
struct Parameters_t {
    uint32_t lastElement[NBL_BUILTIN_MAX_LEVELS/2+1];
add documentation about what lastElement is used for
        _schedulerParams.finishedFlagOffset[1] = 1u;

        _scanParams.temporaryStorageOffset[0] = 2u;
        break;
    case 2u:
        _schedulerParams.cumulativeWorkgroupCount[1] = _schedulerParams.cumulativeWorkgroupCount[0]+WorkgroupCount(1);
        _schedulerParams.cumulativeWorkgroupCount[2] = _schedulerParams.cumulativeWorkgroupCount[1]+1u;
        _schedulerParams.cumulativeWorkgroupCount[3] = _schedulerParams.cumulativeWorkgroupCount[2]+WorkgroupCount(1);
        _schedulerParams.cumulativeWorkgroupCount[4] = _schedulerParams.cumulativeWorkgroupCount[3]+WorkgroupCount(0);
        // climb up
        _schedulerParams.finishedFlagOffset[1] = WorkgroupCount(1);
        _schedulerParams.finishedFlagOffset[2] = _schedulerParams.finishedFlagOffset[1]+1u;
        // climb down
        _schedulerParams.finishedFlagOffset[3] = _schedulerParams.finishedFlagOffset[1]+2u;

        _scanParams.temporaryStorageOffset[0] = _schedulerParams.finishedFlagOffset[3]+WorkgroupCount(1);
        _scanParams.temporaryStorageOffset[1] = _scanParams.temporaryStorageOffset[0]+WorkgroupCount(0);
        break;
    case 3u:
        _schedulerParams.cumulativeWorkgroupCount[1] = _schedulerParams.cumulativeWorkgroupCount[0]+WorkgroupCount(1);
        _schedulerParams.cumulativeWorkgroupCount[2] = _schedulerParams.cumulativeWorkgroupCount[1]+WorkgroupCount(2);
        _schedulerParams.cumulativeWorkgroupCount[3] = _schedulerParams.cumulativeWorkgroupCount[2]+1u;
because due to forward progress guarantees you can only do upsweep or downsweep in a single dispatch, you don't need the "climb down".
You probably don't even need the cumulative workgroup counts, as the way you'd find out whether your current workgroup truly is the last one is by checking the return value of the finished-flag atomic counter, e.g. markFinished:

uint32_t howManyDone = flagsAccessor.atomicAdd(finishedOffset[level]+virtualWorkgroupIndexInThisLevel, 1u);
//uint32_t currentLastWorkgroup = currentLastElement>>ItemsPerWorkgroupLog2;
uint32_t nextLevelLastWorkgroup = currentLastWorkgroup>>ItemsPerWorkgroupLog2;
uint32_t virtualWorkgroupIndexInNextLevel = virtualWorkgroupIndexInThisLevel>>ItemsPerWorkgroupLog2;
// NOT: either last workgroup, or in the last bundle of workgroups and dynamically the last workgroup there
if (howManyDone!=(ItemsPerWorkgroup-1) && (virtualWorkgroupIndexInNextLevel!=nextLevelLastWorkgroup || (virtualWorkgroupIndexInThisLevel&(ItemsPerWorkgroup-1))!=howManyDone))
    return;
// last one takes over
currentLastWorkgroup = nextLevelLastWorkgroup;
btw I'd try to rewrite this as a loop instead of a weird switch that was only written like this to avoid recursion in GLSL
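For illustration, the upsweep bookkeeping could collapse into a loop along these lines (illustrative only; the offset layout is simplified relative to the switch above, and WorkgroupCount(level) is used as in the existing code):

uint32_t cumulative = 0u;
uint32_t flagOffset = 0u;
for (uint32_t level=0u; level<=topLevel; level++)
{
    cumulative += WorkgroupCount(level);
    _schedulerParams.cumulativeWorkgroupCount[level] = cumulative;
    if (level!=topLevel)
    {
        // one finished-flag counter per workgroup of the next level up
        _schedulerParams.finishedFlagOffset[level] = flagOffset;
        flagOffset += WorkgroupCount(level+1u);
    }
}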
/**
 * treeLevel - the current level in the Blelloch Scan
 * localWorkgroupIndex - the workgroup index the current invocation is a part of in the specific virtual dispatch.
 * For example, if we have dispatched 10 workgroups and the virtual workgroup number is 35, then the localWorkgroupIndex should be 5.
 */
template<class Accessor>
bool getWork(NBL_CONST_REF_ARG(DefaultSchedulerParameters_t) params, NBL_CONST_REF_ARG(uint32_t) topLevel, NBL_REF_ARG(uint32_t) treeLevel, NBL_REF_ARG(uint32_t) localWorkgroupIndex, NBL_REF_ARG(Accessor) sharedScratch)
{
you can't do any flagging via shared memory because you want inter-workgroup communication; it must be a BDA address!
even better a bda::__ptr<uint32_t>, cause then you'll have atomicAdd methods
    sharedScratch.workgroupExecutionAndMemoryBarrier();
    const uint32_t globalWorkgroupIndex = sharedScratch.get(0u);

    treeLevel = sharedScratch.get(1u);
    if(treeLevel>lastLevel)
        return true;

    localWorkgroupIndex = globalWorkgroupIndex;
    const bool dependentLevel = treeLevel != 0u;
    if(dependentLevel)
    {
        const uint32_t prevLevel = treeLevel - 1u;
        localWorkgroupIndex -= params.cumulativeWorkgroupCount[prevLevel];
        if(workgroup::SubgroupContiguousIndex() == 0u)
        {
            uint32_t dependentsCount = 1u;
            if(treeLevel <= topLevel)
            {
                dependentsCount = glsl::gl_WorkGroupSize().x;
                const bool lastWorkgroup = (globalWorkgroupIndex+1u)==params.cumulativeWorkgroupCount[treeLevel];
                if (lastWorkgroup)
                {
                    const Parameters_t scanParams = getParameters();
                    dependentsCount = scanParams.lastElement[treeLevel]+1u;
                    if (treeLevel<topLevel)
                    {
                        dependentsCount -= scanParams.lastElement[treeLevel+1u] * glsl::gl_WorkGroupSize().x;
                    }
                }
            }
            uint32_t dependentsFinishedFlagOffset = localWorkgroupIndex;
            if (treeLevel>topLevel) // !(prevLevel<topLevel) TODO: merge with `else` above?
                dependentsFinishedFlagOffset /= glsl::gl_WorkGroupSize().x;
            dependentsFinishedFlagOffset += params.finishedFlagOffset[prevLevel];
            while (scanScratchBuf[0].data[dependentsFinishedFlagOffset]!=dependentsCount)
                glsl::memoryBarrierBuffer();
I expect a rewrite of this into "last one out closes the door"
    if (treeLevel<topLevel)
    {
        // upsweep below the top: many workgroups feed one parent, so atomically bump the shared counter of this workgroup's bundle
        finishedFlagOffset += localWorkgroupIndex/glsl::gl_WorkGroupSize().x;
        glsl::atomicAdd(scanScratchBuf[0].data[finishedFlagOffset], 1u);
    }
    else if (treeLevel!=(topLevel<<1u))
    {
        // top level and downsweep (except the final level): each workgroup has its own flag, so a plain store suffices
        finishedFlagOffset += localWorkgroupIndex;
        scanScratchBuf[0].data[finishedFlagOffset] = 1u;
    }
comments about what and why is happening in the if statements
template<class Accessor>
bool getWork(NBL_REF_ARG(Accessor) accessor)
what's the accessor for? It needs a better name. Also why don't you store the accessor past the create as a member?
It should now work with all sizes of inputs.
// could do scanScratchBuf[0u].workgroupsStarted[SubgroupContiguousIndex()] = 0u but don't know how many invocations are live during this call
if(workgroup::SubgroupContiguousIndex() == 0u)
{
    for(uint32_t i = 0; i < params.topLevel; i++)
    {
        scanScratchBuf[0u].workgroupsStarted[i] = 0u;
    }
}
don't you know the workgroup size?
My bad, this is actually not used anymore. I think it was being used inside an inRange at some point? Not sure. However, we can't reset the workgroupsStarted buffer within the shader at the end of the Reduce step (which was the initial goal), because it's possible that there are still workgroups "alive" that won't be doing any work but will still increase the workgroupsStarted counter, and sometimes we end up doing the reset before those WGs exit, ending up with some 1 values instead of all 0s.
Description
Initial implementation of the Global Scan in HLSL
Testing
Not yet finished
TODO list:
Still need to test the HLSL code, as well as the port of the example code to the new API of the vulkan_1_3 branch