Code stalling in HYPRE_ADSSolve in v2.29.0 with CUDA 11.6.0 #981
Comments
Can you try CUDA 12 and see if this issue persists? We've worked with the CUDA team on this function across various versions and for various issues. Thanks @v-dobrev
@liruipeng, CUDA 12.0.0 seems to work fine.
Can we close this issue? Thanks!
It is up to you guys -- whether you want to fix the issue with CUDA 11.6.0 or recommend that users not use that version. I don't know if other CUDA and hypre versions are affected.
@liruipeng What do you recommend? From this thread, it seems we should go with the second option, correct?
There are always bugs/issues in different versions of TPLs. We can't do "fixes" in hypre. The users just need to try a different version that fixes it.
While testing MFEM 4.6 (built with CUDA) with hypre 2.29.0 (built with CUDA) using CUDA 11.6.0 on Lassen, I noticed that a few of the MFEM examples stall. This seems to happen inside calls to HYPRE_ADSSolve. If I use either hypre 2.28.0 or an older CUDA (10.1.243), there are no issues.
Digging a little deeper with TotalView, it looks like the issue happens in the function hypre_CSRMatrixTriLowerUpperSolveCusparse, specifically in this call:
hypre/src/seq_mv/csr_matop_device.c
Lines 2667 to 2671 in 8f6bdc6
Steps to reproduce the issue:
cuda/11.6.0 and gcc/7.3.1.
hypre directory: see the METIS section here: https://mfem.org/building/#parallel-mpi-version-of-mfem (METIS download link: https://github.com/mfem/tpls/raw/gh-pages/metis-4.0.3.tar.gz).
hypre directory from https://github.com/mfem/mfem.git (or git@github.com:mfem/mfem.git)
make pcudebug CUDA_ARCH=sm_70 -j 40
examples directory, build ex4p: make ex4p.
lrun -n 4 ./ex4p -no-vis -m ../data/fichera.mesh -- this should stall indefinitely.
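One way to turn the final run step into an automated check is to run it under a time limit, so that a stall is reported instead of blocking the session. The sketch below is an illustration and not part of the original report: the helper name check_stall and the 600-second limit are hypothetical, and it assumes GNU coreutils timeout is available. Whether timeout cleanly terminates every rank launched through lrun is also an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the report): run a command under a time limit
# and print one word describing the outcome.
# Assumes GNU coreutils `timeout`, which exits with status 124 when the
# time limit is reached.
check_stall() {
    local limit="$1"
    shift
    # Send the command's own stdout to stderr so that the only thing this
    # function writes to stdout is the status word.
    timeout "$limit" "$@" 1>&2
    local status=$?
    if [ "$status" -eq 124 ]; then
        echo "STALLED"
    elif [ "$status" -eq 0 ]; then
        echo "OK"
    else
        echo "FAILED($status)"
    fi
}

# Example invocation; the 600 s limit is an arbitrary assumption:
# check_stall 600 lrun -n 4 ./ex4p -no-vis -m ../data/fichera.mesh
```

With the affected combination (hypre 2.29.0 and CUDA 11.6.0) the expectation from the report is that the run never returns, which this helper would report as STALLED once the limit expires.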