WIP: Add GPU aware MPI support in cannon algorithm #647
Can one of the admins verify this patch? |
Codecov Report
Coverage diff (develop vs. #647):
            develop   #647     +/-
Coverage    67.0%     66.8%    -0.2%
Files       105       105
Lines       29121     29188    +67
Hits        19521     19523    +2
Misses      9600      9665     +65
Yes, I will help with this. I think it should not be a blocker for the merge. By the way, is there a way to perform an atomic FP add on (certain) AMD GPUs using OpenCL? The CUDA-based code path seems to do that, but nothing I tried in OpenCL worked for me: C11 atomics, legacy built-ins, or even cheating by guessing a function prototype and calling it. I found some documentation on inline assembly in OpenCL, but I am a bit hesitant to adopt it (it would mean reading all the architecture-specific documentation for AMD GPUs). For NVIDIA, by the way, I am using PTX inline assembly in OpenCL. |
Regarding the failing check of file headers, I propose to put copyright lines in front of the LICENSE file like this, and to stick with generic
Otherwise we are back to the business of duplicating author notes as part of the source code. I think authorship is recorded as part of the repository's metadata and no one is "left behind". As an extension, I also propose to drop the AUTHORS file. A general overhaul is to have a file extension (sorry, the proposal is not exactly related and can perhaps be a separate PR) |
Thank you so much.
I will find out more details and inform you. |
I like this proposal of having a separate LICENSE file. If all DBCSR maintainers agree, I can make the required changes (keep only the DBCSR Developers Group copyright line in each file, and add a new license file with the entire text of the license and copyright lines above that.) |
@hfp, OpenCL does not seem to provide atomic add operations for floating-point data. You may have to emulate them with a compare-and-swap loop instead. I found this blog showing an example, though it is not recent. |
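To illustrate the compare-and-swap suggestion, here is a minimal host-side sketch in C11, not OpenCL kernel code and not taken from DBCSR. The helper names (`double_to_bits`, `bits_to_double`, `atomic_add_f64_cas`) are hypothetical. The idea is to keep the double's bit pattern in an integer cell, compute the sum on a private copy, and publish it only if the cell has not changed in the meantime. An OpenCL version would presumably follow the same loop using an integer compare-exchange built-in such as `atomic_cmpxchg`, subject to what the target device supports.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Copy the bits of a double into a 64-bit integer. memcpy is used
 * instead of a pointer cast to stay within strict-aliasing rules. */
static uint64_t double_to_bits(double value) {
  uint64_t bits;
  memcpy(&bits, &value, sizeof(bits));
  return bits;
}

/* Inverse of double_to_bits. */
static double bits_to_double(uint64_t bits) {
  double value;
  memcpy(&value, &bits, sizeof(value));
  return value;
}

/* Hypothetical helper: atomically adds `increment` to the double whose
 * bit pattern is stored in *target, and returns the previous value. */
static double atomic_add_f64_cas(_Atomic uint64_t *target, double increment) {
  assert(target != NULL);
  uint64_t expected = atomic_load(target);
  uint64_t desired;
  do {
    /* Compute the new value from the last observed contents. */
    desired = double_to_bits(bits_to_double(expected) + increment);
    /* If another thread changed *target, the exchange fails, `expected`
     * is refreshed with the current contents, and the loop retries. */
  } while (!atomic_compare_exchange_weak(target, &expected, desired));
  return bits_to_double(expected);
}
```

A caller would initialize the cell with `atomic_init(&cell, double_to_bits(0.0))` and read the result back with `bits_to_double(atomic_load(&cell))`. The loop can retry many times under heavy contention, so it is likely slower than a native atomic FP add where one exists.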
* Added c_calculate_norms prototype to ACC/LIBSMM interface/header.
* Stub implementation for OpenCL.
* Adjusted rules to compile calculate_norms.cpp as CUDA translation unit.
* Separated CFLAGS and DFLAGS. Allow unsupported host-compiler (nvcc).
* Makefile/OpenCL: improved warning level.
* Fixed potential "dereferencing type-punned pointer will break strict-aliasing rules" warnings.
* Fixed including header file in calculate_norms.cpp.
Please allow me to share my suggestions for this PR:
In my integration test (#649), I changed how some header files are included, from the following:
#if defined(__CUDA)
# include "acc/cuda/acc_cuda.h"
#elif defined(__HIP)
# include "acc/hip/acc_hip.h"
#endif
#include "libsmm_acc_init.h"
... to the following:
#if defined(__CUDA)
# include "../cuda/acc_cuda.h"
#elif defined(__HIP)
# include "../hip/acc_hip.h"
#endif
#include "libsmm_acc_init.h" |
@hfp, I merged the master into this branch and changed the path to those header files as you suggested. |
@gsitaram Thanks for this PR and sorry for the late reply (I would like to thank @hfp for his contribution too). I'm putting here some of the topics/comments:
I will add some specific comments in the files. |
@alazzaro, please check whether the previous commit reflects our discussion about avoiding the g2g path for all data types other than real_8. |
I think Gina was right with the previous location of |
a04f96b to d368160
63a4d0d to 8610601
1ba0f0b to 6261a60
for more information, see https://pre-commit.ci
thanks @gsitaram ! |
Add GPU-aware MPI support in the Cannon algorithm, with norms calculation on the GPU
If the CUDA or HIP backend is enabled, the --WITH-DBCSR-G2G CMake option enables the following:
Requirements:
Need help with the following:
c_calculate_norms" error. @hfp, could you help me fix it for the OpenCL backend?