Using the CUDA backend for distributed training, it always errors out when maxSimultaneousGames is set higher than 4000. NVIDIA H100s and H200s can easily handle more than 4000 simultaneous games. Can this limit be raised in the next version?
EDIT: This is using v1.15.3
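For reference, here's roughly what the relevant part of my contribute config looks like. This is a minimal sketch with hypothetical values, assuming the keys from the stock contribute_example.cfg:

```
# Minimal sketch of a contribute config (hypothetical values),
# assuming the keys from the stock contribute_example.cfg.
serverUrl = https://katagotraining.org/
username = myusername   # hypothetical account name
password = ...          # elided
# Setting this above 4000 reliably triggers the error on v1.15.3:
maxSimultaneousGames = 4096
```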
Thanks for the feedback. What does the benchmark command report as the optimal number of threads for you? And is this a setup with multiple GPUs, or just one?
The benchmark tool says 320 is the optimal number of threads, but GPU utilization never goes above 10% and GPU memory usage barely hits 8 GB. That leaves a lot of performance on the table that just flat out isn't being used. This is an 8-GPU setup, but on a single GPU the benchmark tool also reports 320 as the optimal number.