Problem :
MCR-DL's get_rank(group) does not return the rank within the given group; it returns the world rank instead.
Description :
MCR-DL/mcr_dl/mpi.py
Line 82 in 870751f
MCR-DL/mcr_dl/ops/csrc/comm/mpi.cpp
Line 76 in 870751f
MCR-DL/mcr_dl/nccl.py
Line 85 in 870751f
MCR-DL/mcr_dl/ops/csrc/comm/nccl.cpp
Line 79 in 870751f
Each of these takes a group argument but ignores it when retrieving the rank, and returns the world rank instead. Implementation support is required to return the rank within the given group.
This is necessary to support MCR-DL with Megatron-LM, since Megatron-LM also relies on such calls. (refer: https://github.com/OSU-Nowlab/Megatron-LM/blob/5f9c870f9f24b482509699d206a9dbb00958f6fc/megatron/model/transformer.py#L1563)
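To make the expected behaviour concrete, here is a minimal sketch of the semantics being requested. It is not MCR-DL code: `get_rank_in_group` and its `group_ranks` parameter are hypothetical names used only for illustration, and the `-1` return for non-members is an assumption borrowed from how `torch.distributed.get_rank` behaves.

```python
from typing import Optional, Sequence


def get_rank_in_group(world_rank: int,
                      group_ranks: Optional[Sequence[int]] = None) -> int:
    """Hypothetical helper: map a world rank to its rank inside a group.

    group_ranks lists the world ranks that belong to the group, in group
    order. None stands for the default (world) group.
    """
    if group_ranks is None:
        # No group given: the world rank is the answer.
        return world_rank
    try:
        # The position of this process within the group is its group rank.
        return list(group_ranks).index(world_rank)
    except ValueError:
        # Assumption: return -1 for a process that is not a group member,
        # mirroring torch.distributed.get_rank.
        return -1


# Illustrative group made of world ranks 4 and 5.
example_group = [4, 5]
print(get_rank_in_group(5, example_group))  # 1 (rank within the group)
print(get_rank_in_group(5))                 # 5 (world rank)
```

With the current behaviour described above, both calls would effectively yield the world rank (5), which is what breaks callers that expect the group-local value.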
Testing :