Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

master_june24: CID_ACCESS::kernelAccessConst should be called only once (it is now called once per diagram) #895

Closed
valassi opened this issue Jul 8, 2024 · 2 comments
Assignees

Comments

@valassi
Copy link
Member

valassi commented Jul 8, 2024

Another issue introduced in #830 and being reviewed in #882.

This is a conceptual mistake (in the sense that it differs from the rest of the design of the code), and a (minor) performance penalty. It is also an unncessary increase of bloatware code.

CID_ACCESS::kernelAccessConst is now called once per diagram
https://github.com/valassi/madgraph4gpu/blob/8e312bc02d9d072615fcb1052b5db54754498517/epochX/cudacpp/gg_tt.mad/SubProcesses/P1_gg_ttx/CPPProcess.cc#L339

      // Amplitude(s) for diagram number 1
      FFV1_0<W_ACCESS, A_ACCESS, CD_ACCESS>( w_fp[3], w_fp[2], w_fp[4], COUPs[1], 1.0, &amp_fp[0] );
#ifdef MGONGPU_SUPPORTS_MULTICHANNEL
      if( channelIds != 0 )
      {
        channelids_sv = CID_ACCESS::kernelAccessConst( channelIds );
...

and https://github.com/valassi/madgraph4gpu/blob/8e312bc02d9d072615fcb1052b5db54754498517/epochX/cudacpp/gg_tt.mad/SubProcesses/P1_gg_ttx/CPPProcess.cc#L365


      // Amplitude(s) for diagram number 2
      FFV1_0<W_ACCESS, A_ACCESS, CD_ACCESS>( w_fp[3], w_fp[4], w_fp[1], COUPs[1], 1.0, &amp_fp[0] );
#ifdef MGONGPU_SUPPORTS_MULTICHANNEL
      if( channelIds != 0 )
      {
        channelids_sv = CID_ACCESS::kernelAccessConst( channelIds );

and so on

The channelid array is the same for all diagrams (which can be thousands!!!), so it can and should be retrieved only once. This is what is done for numerators and denominators
https://github.com/valassi/madgraph4gpu/blob/8e312bc02d9d072615fcb1052b5db54754498517/epochX/cudacpp/gg_tt.mad/SubProcesses/P1_gg_ttx/CPPProcess.cc#L320


#ifdef MGONGPU_SUPPORTS_MULTICHANNEL
      // Numerators and denominators for the current event (CUDA) or SIMD event page (C++)
      fptype_sv& numerators_sv = NUM_ACCESS::kernelAccess( numerators );
      fptype_sv& denominators_sv = DEN_ACCESS::kernelAccess( denominators );
#endif

Note in addition that the if checks on channelID should be changed to use nullptr, as described in #892

@valassi valassi self-assigned this Jul 8, 2024
@roiser
Copy link
Member

roiser commented Jul 8, 2024

I agree that a single access would probably slightly enhance the performance. I vaguely remember that I had done this but ran into a problem, but not 100 % sure anymore, pls give it a try as I understand that you want to make the code changes.

valassi added a commit to valassi/madgraph4gpu that referenced this issue Jul 17, 2024
valassi added a commit to valassi/madgraph4gpu that referenced this issue Jul 17, 2024
@valassi
Copy link
Member Author

valassi commented Jul 19, 2024

I fixed this in afea373 and 89c1a81.

(In any case calculate_wavefunction was later changed even more, to go back essentially to what it is in upstream/master, all other changes being unnecessary).

Closing, fixed in #882

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants