Error with nccl_mpi_all_reduce on multinode system #97
@mpatwary, can you help with this?
It looks like you are using the right command, and I think the problem is unrelated to nccl_mpi_all_reduce. Do the other MPI implementations, like ring_all_reduce and osu_allreduce, run well? I suspect the problem could be the setup. Does any other MPI code run well on your system?
Hi @mpatwary, my system has 2 nodes, each with 4 P100 GPUs (8 GPUs total), connected using InfiniBand. I was wondering how mpirun communicates between the nodes to implement the distributed benchmark. ring_all_reduce and osu_allreduce throw errors when I compile the DeepBench benchmarks:
Compilation:
Normal outputs and errors:
I have recompiled and run it again with 4 and 8 GPUs, but now I get the error below:
bin/nccl_mpi_all_reduce: error while loading shared libraries: libmpi.so.40: cannot open shared object file: No such file or directory
mpirun detected that one or more processes exited with non-zero status, thus causing the job to be terminated. The first process to do so was:
Process name: [[41026,1],0]
Looks like the code is not getting the path to the MPI lib directory. You can try exporting that, e.g. as in the sketch below.
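A minimal sketch (the install prefix is a placeholder; point it at whatever directory actually contains libmpi.so.40 on your system):

export LD_LIBRARY_PATH=/path/to/openmpi/lib:$LD_LIBRARY_PATH
mpirun --allow-run-as-root -np 8 bin/nccl_mpi_all_reduce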
@mpatwary, thanks for your prompt reply. I exported that and got other errors. My system has 2 nodes, each with 4 P100 GPUs (8 GPUs total), connected using InfiniBand; I was wondering how mpirun communicates between the nodes to implement the distributed benchmark. It looks like the command mpirun --allow-run-as-root -np 8 bin/nccl_mpi_all_reduce only considers the host node; my understanding is that mpirun should receive the -H flag with the IB addresses of both servers (I tried this option but got errors too). Can you share the command line you have used to run DeepBench nccl_mpi_all_reduce on a multi-node, multi-GPU system? Here is the error I am getting using just the 4 GPUs of the host server:
Primary job terminated normally, but 1 process returned a non-zero exit code.
mpirun detected that one or more processes exited with non-zero status, thus causing the job to be terminated. The first process to do so was:
Process name: [[36721,1],0]
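For reference, a typical Open MPI multi-node launch looks roughly like this (a sketch, not a command confirmed for DeepBench; node1 and node2 are placeholder hostnames, :4 requests 4 slots per node, and -x forwards the named environment variables to the remote ranks):

mpirun --allow-run-as-root -np 8 -H node1:4,node2:4 -x LD_LIBRARY_PATH -x PATH bin/nccl_mpi_all_reduce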
I have a problem here as well. The normal single version works fine, and all other MPI applications are working. But I get this here:
NCCL MPI AllReduce
Hi all,
What is the command line to run nccl_mpi_all_reduce on a multi-node system (2 nodes with 4 GPUs each)? I am getting the error below when running this command:
WARNING: There is at least one non-excluded OpenFabrics device found, but there are no active ports detected (or Open MPI was unable to use them). This is most certainly not what you wanted. Check your cables, subnet manager configuration, etc. The openib BTL will be ignored for this job.
Local host: C4-1
terminate called after throwing an instance of 'std::runtime_error'
what(): Failed to set cuda device
When running only with 4 ranks, I get this output:
WARNING: There is at least one non-excluded OpenFabrics device found,
but there are no active ports detected (or Open MPI was unable to use
them). This is most certainly not what you wanted. Check your
cables, subnet manager configuration, etc. The openib BTL will be
ignored for this job.
Local host: C4-1
NCCL MPI AllReduce
Num Ranks: 4
[C4130-1:04094] 3 more processes have sent help message help-mpi-btl-openib.txt / no active ports found
[C4130-1:04094] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
100000 400000 0.148489 0.148565
3097600 12390400 2.63694 2.63695
4194304 16777216 3.57147 3.57148
6553600 26214400 5.59742 5.59744
16777217 67108868 81.9391 81.9396
38360000 153440000 32.6457 32.6462
Thanks
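(Side note on the recurring warning: it just means Open MPI found an InfiniBand device with no usable active port and will fall back to another transport. If that is expected, the openib BTL can be excluded explicitly; this is standard Open MPI MCA syntax, with the rank count here only as an example:

mpirun --allow-run-as-root -np 4 --mca btl ^openib bin/nccl_mpi_all_reduce

Setting --mca orte_base_help_aggregate 0, as the log itself suggests, shows every rank's copy of the help message instead of aggregating them.)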