analyse intel_transport_recv.h at line 1160: cma_read_nbytes == size assert #612
Blocked: unable to install IMPI 2021.11. We don't know where to get it from.
It's here: http://anpfclxlin02.an.intel.com/rscohn1/
The problem with the assert in intel_transport_send.h at line 2012 is solved in IMPI 2021.11 (tested on devcloud, with IMPI 2021.11 installed in the home dir).
2021.11 will be published on 11/17.
With I_MPI_OFFLOAD=0, running mpirun -n 2 ./build/benchmarks/gbench/mhp/mhp-bench --sycl --benchmark_filter=Sort_DR fails with: Assertion failed in file ../../src/mpid/ch4/shm/posix/eager/include/intel_transport_recv.h at line 1175: cma_read_nbytes == size. However, with I_MPI_OFFLOAD=1 (which should be used with IMPI on GPU), the Sort benchmark runs successfully (devcloud, single server, multi-GPU). IMPI 2021.11 private install.
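For quick reference, the two invocations side by side (commands taken from the report above; the binary path assumes the build directory used there):

```sh
# Fails with the cma_read_nbytes == size assertion (host-side CMA path):
I_MPI_OFFLOAD=0 mpirun -n 2 ./build/benchmarks/gbench/mhp/mhp-bench --sycl --benchmark_filter=Sort_DR

# Runs successfully with GPU offload support enabled (recommended for IMPI on GPU):
I_MPI_OFFLOAD=1 mpirun -n 2 ./build/benchmarks/gbench/mhp/mhp-bench --sycl --benchmark_filter=Sort_DR
```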
This is how I set I_MPI_OFFLOAD for the device-memory tests: distributed-ranges/CMakeLists.txt, line 216 at commit 6ad80e7.
I was told that for the 2021.11 release we can set I_MPI_OFFLOAD=1 all the time and it will not cause an error. I will get rid of this function and set I_MPI_OFFLOAD=1 in the CI script. |
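A minimal sketch of that change in the CI script (the build and test commands are placeholders, not the actual CI configuration):

```sh
#!/usr/bin/env bash
# Hypothetical CI snippet: enable Intel MPI GPU offload support unconditionally,
# as recommended for the IMPI 2021.11 release.
export I_MPI_OFFLOAD=1

# Placeholder build-and-test steps; the real CI script may differ.
cmake -B build
cmake --build build -j
ctest --test-dir build -R mhp-sycl-sort-tests
```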
Update: Bug in MPI Jira: https://jira.devtools.intel.com/browse/IMPI-4619
When running on devcloud
ctest -R mhp-sycl-sort-tests-3
on branch https://github.com/lslusarczyk/distributed-ranges/tree/mateusz_sort_expose_mpi_assert
we hit the cma_read_nbytes == size assertion from the issue title.
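A sketch of the reproduction steps, assuming a standard CMake build of that branch (the configure and build options are illustrative, not taken from the issue):

```sh
# Check out the branch that exposes the assert.
git clone -b mateusz_sort_expose_mpi_assert https://github.com/lslusarczyk/distributed-ranges.git
cd distributed-ranges

# Illustrative build; the actual options used on devcloud may differ.
cmake -B build
cmake --build build -j

# Run the failing test.
ctest --test-dir build -R mhp-sycl-sort-tests-3
```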
Some links to useful Intel MPI documentation, tips, and hacks:
Intel® MPI for GPU Clusters - article
https://www.intel.com/content/www/us/en/docs/oneapi/optimization-guide-gpu/2023-2/intel-mpi-for-gpu-clusters.html
Environment variables influencing how GPU support works (a short sketch follows these links):
https://www.intel.com/content/www/us/en/docs/mpi-library/developer-reference-linux/2021-10/gpu-support.html
https://www.intel.com/content/www/us/en/docs/mpi-library/developer-reference-linux/2021-10/gpu-buffers-support.html
https://www.intel.com/content/www/us/en/docs/mpi-library/developer-reference-linux/2021-10/gpu-pinning.html
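As a rough summary of those pages, a hedged sketch of the variables most relevant here (example values only; the linked 2021.10/2021.11 documentation is authoritative):

```sh
# Enable (1) or disable (0) GPU offload support in Intel MPI.
export I_MPI_OFFLOAD=1

# Topology detection library used for GPU support (Level Zero).
export I_MPI_OFFLOAD_TOPOLIB=level_zero

# Verbose MPI diagnostics, useful when debugging GPU-buffer issues.
export I_MPI_DEBUG=3
```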
Still, I found a tip pointing to a solution here:
https://community.intel.com/t5/Intel-oneAPI-HPC-Toolkit/intel-mpi-error-line-1334-cma-read-nbytes-size/m-p/1329220
export I_MPI_SHM_CMA=0 helped in some cases (though the behaviour does not seem to be fully deterministic; it may depend on which devcloud node is assigned for execution).
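Applied to the failing invocation above, the workaround would look roughly like this (shown with I_MPI_OFFLOAD=0, the case where the assert was observed; this is an illustration, not a guaranteed fix):

```sh
# Disable the CMA-based shared-memory read path as a workaround
# for the cma_read_nbytes == size assert.
I_MPI_SHM_CMA=0 I_MPI_OFFLOAD=0 mpirun -n 2 \
  ./build/benchmarks/gbench/mhp/mhp-bench --sycl --benchmark_filter=Sort_DR
```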
People had similar problems in the past:
https://community.intel.com/t5/Intel-oneAPI-HPC-Toolkit/Intel-oneAPI-2021-4-SHM-Issue/m-p/1324805
With these env vars set, you may also encounter other errors.
Still, the simple workaround of copying memory from the device to the host is counterproductive, as IMPI supports GPU-to-GPU communication
(see https://www.intel.com/content/www/us/en/docs/mpi-library/developer-reference-linux/2021-10/gpu-buffers-support.html#SECTION_3F5D70BDEFF84E3A84325A319BA53536).