$ module li

Currently Loaded Modules:
  1) craype-x86-milan
  2) libfabric/1.15.2.0
  3) craype-network-ofi
  4) xpmem/2.6.2-2.5_2.38__gd067c3f.shasta
  5) PrgEnv-gnu/8.5.0
  6) cray-dsmml/0.2.2
  7) cray-libsci/23.12.5
  8) cray-mpich/8.1.28
  9) craype/2.7.30
 10) gcc-native/12.3
 11) perftools-base/23.12.0
 12) cpe/23.12
 13) cudatoolkit/12.2
 14) craype-accel-nvidia80
 15) gpu/1.0
$ cat doConfigPerlKk.sh
bdir=$PWD/build-kokkos
cmake -S kokkos -B $bdir \
  -DCMAKE_BUILD_TYPE=Release \
  -DBUILD_SHARED_LIBS=ON \
  -DCRAYPE_LINK_TYPE=dynamic \
  -DCMAKE_CXX_COMPILER=$PWD/kokkos/bin/nvcc_wrapper \
  -DKokkos_ARCH_AMPERE80=ON \
  -DKokkos_ENABLE_SERIAL=ON \
  -DKokkos_ENABLE_OPENMP=off \
  -DKokkos_ENABLE_CUDA=on \
  -DKokkos_ENABLE_CUDA_LAMBDA=on \
  -DKokkos_ENABLE_DEBUG=off \
  -DCMAKE_INSTALL_PREFIX=$bdir/install
$ cat doConfigPerlOmegah.sh
#!/bin/bash -ex
usage="Usage: $0 <mpi=on|off> <cudaAware=on|off>"
[[ $# -ne 2 ]] && echo $usage && exit 1
mpi=$1
[[ $mpi != "on" && $mpi != "off" ]] && echo $usage && exit 1
cudaAware=$2
[[ $cudaAware != "on" && $cudaAware != "off" ]] && echo $usage && exit 1
bdir=$PWD/build-omegah-mpi${mpi}-cudaAware${cudaAware}
cmake -S omega_h -B $bdir \
  -DCMAKE_INSTALL_PREFIX=$bdir/install \
  -DCMAKE_BUILD_TYPE=Release \
  -DBUILD_SHARED_LIBS=on \
  -DOmega_h_USE_Kokkos=on \
  -DOmega_h_CUDA_ARCH=80 \
  -DOmega_h_USE_MPI=$mpi \
  -DOmega_h_USE_CUDA_AWARE_MPI=$cudaAware \
  -DBUILD_TESTING=on \
  -DCMAKE_CXX_COMPILER=CC
Download the Omega_h delta wing meshes: https://zenodo.org/records/10672130
$ cat submitP2.sh
sbatch --nodes 1 --qos regular --time 00:10:00 --constraint gpu --gpus 4 --account=PROJECT_NAME ./runP2.sh
$ cat runP2.sh
#!/bin/bash
bin_cudaAwareOff=/pscratch/sd/c/cwsmith/omegahDeltaWingAdapt/twoGpus/build-omegah-mpion-cudaAwareoff/src
bin_cudaAwareOn=/pscratch/sd/c/cwsmith/omegahDeltaWingAdapt/twoGpus/build-omegah-mpion-cudaAwareon/src
mesh=/pscratch/sd/c/cwsmith/omegahDeltaWingAdapt/twoGpus/deltaWing_500kMetric_p2.osh

cmd="$bin_cudaAwareOff/ugawg_hsc_oshmeshload --osh-pool $mesh"
export MPICH_GPU_SUPPORT_ENABLED=0
set -x
srun -n 2 $cmd &> log2p_cudaAwareOff
set +x

cmd="$bin_cudaAwareOn/ugawg_hsc_oshmeshload --osh-pool $mesh"
export MPICH_GPU_SUPPORT_ENABLED=1
set -x
srun -n 2 $cmd &> log2p_cudaAwareOn
set +x
$ cat log2p_cudaAwareOn
(GTL DEBUG: 0) cuIpcGetMemHandle: invalid argument, CUDA_ERROR_INVALID_VALUE, line no 148
MPICH ERROR [Rank 0] [job id 22622708.1] [Wed Mar 6 07:48:56 2024] [nid002241] - Abort(606713346) (rank 0 in comm 0): Fatal error in PMPI_Isend: Invalid count, error stack:
PMPI_Isend(161)......................: MPI_Isend(buf=0x623196f88, count=2382, MPI_INT, dest=1, tag=42, comm=0xc4000000, request=0x23c3f34) failed
MPID_Isend(584)......................:
MPIDI_isend_unsafe(136)..............:
MPIDI_SHM_mpi_isend(323).............:
MPIDI_CRAY_Common_lmt_isend(84)......:
MPIDI_CRAY_Common_lmt_export_mem(103):
(unknown)(): Invalid count
aborting job:
Fatal error in PMPI_Isend: Invalid count, error stack:
PMPI_Isend(161)......................: MPI_Isend(buf=0x623196f88, count=2382, MPI_INT, dest=1, tag=42, comm=0xc4000000, request=0x23c3f34) failed
MPID_Isend(584)......................:
MPIDI_isend_unsafe(136)..............:
MPIDI_SHM_mpi_isend(323).............:
MPIDI_CRAY_Common_lmt_isend(84)......:
MPIDI_CRAY_Common_lmt_export_mem(103):
(unknown)(): Invalid count
Kokkos::Cuda ERROR: Failed to call Kokkos::Cuda::finalize()
srun: error: nid002241: task 0: Exited with exit code 255
srun: Terminating StepId=22622708.1
slurmstepd: error: *** STEP 22622708.1 ON nid002241 CANCELLED AT 2024-03-06T15:48:58 ***
srun: error: nid002241: task 1: Terminated
srun: Force Terminated StepId=22622708.1
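To compare the two runs side by side, it may help to pull the first GTL or MPICH error line out of each log. The following is only a sketch: the helper name `first_mpi_error` is hypothetical, and the log file names are assumed to be the ones written by runP2.sh.

```shell
# Hypothetical helper (not part of Omega_h or Cray MPICH):
# print the first GTL or MPICH error line found in a log file.
first_mpi_error() {
  grep -m 1 -E 'GTL DEBUG|MPICH ERROR' "$1"
}

# Assumed log names, matching the redirects in runP2.sh.
for log in log2p_cudaAwareOff log2p_cudaAwareOn; do
  if [ -f "$log" ]; then
    printf '%s: %s\n' "$log" \
      "$(first_mpi_error "$log" || echo 'no GTL/MPICH error found')"
  fi
done
```

If the run with MPICH_GPU_SUPPORT_ENABLED=0 reports no such line while the run with MPICH_GPU_SUPPORT_ENABLED=1 reports the cuIpcGetMemHandle failure, that would suggest the problem is confined to the CUDA-aware path.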