When training the Qwen 2.5 7B Math model using the example script below, the training process consistently hangs at step 0, with GPU utilization dropping to 0%. This issue occurs when using 8 A100 GPUs on a single node. However, if `use_remove_padding` is set to `False`, the training proceeds without any problems. What might be the issue?