Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

master_june24: missing sanity checks that all channelids are the same in a "warp" (or at least inside each SIMD vector) #898

Closed
valassi opened this issue Jul 9, 2024 · 6 comments · Fixed by #882
Assignees
Milestone

Comments

@valassi
Copy link
Member

valassi commented Jul 9, 2024

Another issue introduced in#830 and being reviewed in #882

This is related to (almost a sub-component of) the SIMD issue #894 but I prefer to address it separately.

As clarified during the meeting today and #894 (comment) (thanks @roiser), the cudacpp implementation should assume that all channelids are the same in a warp. To avoid all unexpected behaviour due to hidden assumption, a sanity check (an asser essentially is needed). I think we had discussed this in the past, maybe I will dig that out. Anyway, this is a prerequisite to simplifying the SIMD implementation.

I file this as a separate issue also because it may be appropriate to expose these assumptions in the fortan-cudacpp bridge, so that they have the same assumptions. (Example: warp_size can be modified in the runcard, there must be a sanity check on allowed values).

@valassi valassi self-assigned this Jul 9, 2024
@valassi valassi changed the title master_june24: missing sanity checks that all channelids are the same in a "warp" master_june24: missing sanity checks that all channelids are the same in a "warp" (or at least inside each SIMD vector) Jul 9, 2024
@valassi
Copy link
Member Author

valassi commented Jul 9, 2024

PS On second thought, what I will do is add a sanity check that channelids are all the same inside a SIMD vector. For GPUs, this is not strictly necessary: the implementation can be made to be functional anyway, but it will be slower.

Example. Take hardware SIMD vectors of 4 elements and hardware GPU warps of 32 threads. If every channelid is different for those elements, then:

  • the GPU implementation will give correct results anyway (the channelid of each thread is only accesses in each thread), but there will be a (small) thread divergence
  • the SIMD implementation in master_june24: wrong/bugged and unnecessary SIMD handling of numerators and denominators  #894, conversely, must know this in advance: if it can assume 4 equa channelids, SIMD vectors can be used, otherwise explicit loops must be used... but this must be known at the time of writing the code...

@valassi
Copy link
Member Author

valassi commented Jul 9, 2024

Small complication/question: for mixed fptype the check must ensure that TWO double vectors have the same channelId? Or probably not, I must check.

@valassi
Copy link
Member Author

valassi commented Jul 9, 2024

Small complication/question: for mixed fptype the check must ensure that TWO double vectors have the same channelId? Or probably not, I must check.

As discussed in #892 (comment), I am moving to an implementation where calculate_wavefunction and sigmaKin have two differenbt interfaces, in line with what they do

  • calaculate_wavefunctions manages a single SIMD vector or single event, which all have a same calar channelId
  • sigmaKin does the loops over SIMD vectors and decodes channelids, and implements the check described here

With respect to FPTYPE=m, the point to notice is the following, eg take avx2 with 8 floats and 4 doubles

  • sigmakin does a loop over float vectors, so it checks that 8 events have the same channelid
  • calculatewavefunction then loops (iParity loop) over two double vectors (first 4 events, then another 4 events)
  • the single scalar channelid that sigmakin passes to calculate_wavefunctions is valid for 8 events, ie for both double vectors... and a single channelid argument is enough, again

@valassi
Copy link
Member Author

valassi commented Jul 9, 2024

NB: the complication here is that uint_v has 4 ints, while there is the need to check 8 ints, so probably this needs to be done with two separate vectors. I will check...

@oliviermattelaer
Copy link
Member

Concerning:

Small complication/question: for mixed fptype the check must ensure that TWO double vectors have the same channelId? Or probably not, I must check.

I do not see any problem if the two vector will have different channelID in the "m" type. On the contrary it is better to have different one. The point is that the color computation (only part done in float) does not depend of the channelId.
So they are no penalty to have different ChannelID here, and since the more flipping of value we have in channelID, the better it is to avoid potential bias ...

Obviously in that case, you can not have "nb_warp" set to 1 but this is the only limitation.
So I would say that we do not need dedicated check for the "m" case and on the contrary allow for two different value here.

valassi added a commit to valassi/madgraph4gpu that referenced this issue Jul 17, 2024
…use channels 1,2,3,1,2,3... for different events (previously it was 1 for all events)

NB1: the cuda test now fails, the reference file must be recreated
NB2: I expect the SIMD tests to fail using the CUDA reference, due to the different bugs in the current channelId implementation
NB3: eventually madgraph5#898 the implementation should enforce that all events in a warp use the same channelid
valassi added a commit to valassi/madgraph4gpu that referenced this issue Jul 17, 2024
…channel test madgraph5#896 to use channels 1,2,3,1,2,3... for different events (previously it was 1 for all events)

NB1: the cuda test now fails, the reference file must be recreated
NB2: I expect the SIMD tests to fail using the CUDA reference, due to the different bugs in the current channelId implementation
NB3: eventually madgraph5#898 the implementation should enforce that all events in a warp use the same channelid
valassi added a commit to valassi/madgraph4gpu that referenced this issue Jul 17, 2024
…ph5#894 is SIMD handling of numertaors and denominators

Now the tests madgraph5#898 succeed when using channels123123123...

for bck in none sse4 avx2 512y 512z cuda; do echo BACKEND=$bck; ./build.${bck}_d_inl0_hrd0/runTest_*.exe; done | egrep '(BACKEND| ME |r.ME>
BACKEND=none
BACKEND=sse4
BACKEND=avx2
BACKEND=512y
BACKEND=512z
BACKEND=cuda

Note: the debug printout gives
DEBUG: MEKB processed 512 events across 3 channels { 1 : 172, 2 : 170, 3 : 170 }
valassi added a commit to valassi/madgraph4gpu that referenced this issue Jul 17, 2024
…cks madgraph5#898 that channelids are the same inside each warp

As expected, the tests madgraph5#898 using 123123123 etc fail the assertions

for bck in none sse4 avx2 512y 512z cuda; do echo BACKEND=$bck; ./build.${bck}_d_inl0_hrd0/runTest_*.exe; done |& egrep '(BACKEND|Assert)'
BACKEND=none
BACKEND=sse4
runTest_cpp.exe: CPPProcess.cc:335: void mg5amcCpu::calculate_wavefunctions(int, const fptype*, const fptype*, mgOnGpu::fptype*, const unsigned int*, mgOnGpu::fptype*, mgOnGpu::fptype*, mg5amcCpu::fptype_sv*, int): Assertion `channelId == channelIds_sv[i]' failed.
BACKEND=avx2
runTest_cpp.exe: CPPProcess.cc:335: void mg5amcCpu::calculate_wavefunctions(int, const fptype*, const fptype*, mgOnGpu::fptype*, const unsigned int*, mgOnGpu::fptype*, mgOnGpu::fptype*, mg5amcCpu::fptype_sv*, int): Assertion `channelId == channelIds_sv[i]' failed.
BACKEND=512y
runTest_cpp.exe: CPPProcess.cc:335: void mg5amcCpu::calculate_wavefunctions(int, const fptype*, const fptype*, mgOnGpu::fptype*, const unsigned int*, mgOnGpu::fptype*, mgOnGpu::fptype*, mg5amcCpu::fptype_sv*, int): Assertion `channelId == channelIds_sv[i]' failed.
BACKEND=512z
runTest_cpp.exe: CPPProcess.cc:335: void mg5amcCpu::calculate_wavefunctions(int, const fptype*, const fptype*, mgOnGpu::fptype*, const unsigned int*, mgOnGpu::fptype*, mgOnGpu::fptype*, mg5amcCpu::fptype_sv*, int): Assertion `channelId == channelIds_sv[i]' failed.
BACKEND=cuda
valassi added a commit to valassi/madgraph4gpu that referenced this issue Jul 17, 2024
valassi added a commit to valassi/madgraph4gpu that referenced this issue Jul 17, 2024
…the fact madgraph5#898 that channelids are the same inside each warp

This also fixes all pending CID_ACCESS::ieventAccessRecordConst calls madgraph5#899 and madgraph5#911

Note: CPPProcess.cc is now again very close to upstream/master (most of master_june24 changes from madgraph5#830 have been undone)
valassi added a commit to valassi/madgraph4gpu that referenced this issue Jul 17, 2024
valassi added a commit to valassi/madgraph4gpu that referenced this issue Jul 17, 2024
…h5#898 for all processes (using the CODEGEN/recreateRefs.sh script)

Note: some processes fail (some of the fixes in upstream/master may be needed...)

[ RUN      ] SIGMA_SM_GG_TTXGG_GPU_MULTICHANNEL.compareMomAndME
DEBUG: MatrixElementKernelDevice::computeMatrixElements 0x1420650 T 256
runTest_cuda.exe: GpuRuntime.h:26: void assertGpu(cudaError_t, const char*, int, bool): Assertion `code == gpuSuccess' failed.

[ RUN      ] SIGMA_SM_GG_TTXGGG_GPU_MULTICHANNEL.compareMomAndME
DEBUG: MatrixElementKernelDevice::computeMatrixElements 0x3a1fe00 T 256
runTest_cuda.exe: GpuRuntime.h:26: void assertGpu(cudaError_t, const char*, int, bool): Assertion `code == gpuSuccess' failed.

[----------] 1 test from SIGMA_SM_GG_TTXGG_GPU_MULTICHANNEL
[ RUN      ] SIGMA_SM_GG_TTXGG_GPU_MULTICHANNEL.compareMomAndME
DEBUG: MatrixElementKernelDevice::computeMatrixElements 0x15e77f0 T 256
runTest_cuda.exe: GpuRuntime.h:26: void assertGpu(cudaError_t, const char*, int, bool): Assertion `code == gpuSuccess' failed.
@oliviermattelaer oliviermattelaer added this to the warp milestone Jul 18, 2024
@valassi
Copy link
Member Author

valassi commented Jul 19, 2024

The sanity check on th emain/first channelId SIMD vector is now implemented, essentially here (with many other things) ea2146b

Note: in #924 I added the sanity check that TWO SIMD vectors have the same channelId in mixed mode, essentially here valassi@9e77b2b

Concerning:

Small complication/question: for mixed fptype the check must ensure that TWO double vectors have the same channelId? Or probably not, I must check.

I do not see any problem if the two vector will have different channelID in the "m" type. On the contrary it is better to have different one. The point is that the color computation (only part done in float) does not depend of the channelId. So they are no penalty to have different ChannelID here, and since the more flipping of value we have in channelID, the better it is to avoid potential bias ...

Obviously in that case, you can not have "nb_warp" set to 1 but this is the only limitation. So I would say that we do not need dedicated check for the "m" case and on the contrary allow for two different value here.

Thanks Olivier.

See the extensive discussion I added in #911 (comment)

I was going to make many comments here about the mixed fptype, but in the end I made them here #926 (comment)

The sanity checks are done (in PR #882), both for one vector and (in mixed mode) for two vectors. This can be closed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants