PowerEdge XE9680 H100 Support #611
Status update: we've upgraded the NVIDIA GPU Operator to v22.9.2 and the NVIDIA GPU Driver to v525.85.12.
After the installation, we restarted the nodes and waited for all the [...]. We used the following Deployment to execute the benchmark in parallel on all GPUs in the node:
The benchmark showed a significant performance improvement!
It's safe to say that the Driver upgrade was essential to achieve better and more stable performance. We're looking forward to upgrading the NVIDIA GPU Operator to later versions and progressing towards the R535 Driver family.
UPDATE:
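As a rough illustration of running a workload on every GPU of a node at the same time, here is a minimal Python sketch that pins one process per GPU through the `CUDA_VISIBLE_DEVICES` environment variable. This is not the Deployment or the benchmark mentioned in this comment: the helper names `run_on_gpu` and `run_on_all_gpus`, the GPU count of 8, and the placeholder command are all assumptions made for the example.

```python
import os
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_on_gpu(gpu_index, command):
    """Run `command` with a single GPU visible; return (gpu_index, exit code, stdout)."""
    # CUDA applications only see the devices listed in CUDA_VISIBLE_DEVICES.
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu_index))
    result = subprocess.run(command, env=env, capture_output=True, text=True)
    return gpu_index, result.returncode, result.stdout.strip()


def run_on_all_gpus(gpu_indices, command):
    """Start one copy of `command` per GPU concurrently and collect results in order."""
    gpu_indices = list(gpu_indices)
    with ThreadPoolExecutor(max_workers=len(gpu_indices)) as pool:
        return list(pool.map(lambda i: run_on_gpu(i, command), gpu_indices))


if __name__ == "__main__":
    # Hypothetical placeholder: swap in the real benchmark command line.
    # It only echoes the GPU index the process was pinned to.
    placeholder = [
        sys.executable,
        "-c",
        "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])",
    ]
    # Assumption: 8 GPUs per node; adjust to the actual node configuration.
    for gpu, code, out in run_on_all_gpus(range(8), placeholder):
        print(f"GPU {gpu}: exit={code} output={out}")
```

Each worker thread just waits on its own subprocess, so the processes run concurrently and the per-GPU results can be compared side by side once they finish.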
Hi, we're maintaining an OpenShift v4.10 cluster, and recently provisioned Dell PowerEdge XE9680 servers as GPU nodes.
We are working with NVIDIA GPU Operator v22.9.1 for now (we are aware it is EOL), and the GPUs seem to be exposed and usable. Nonetheless, we are not seeing the GPU performance we were expecting.
These servers are based on NVIDIA HGX H100 architecture, and according to the NVIDIA GPU Operator v22.9.2 release notes:
Does that mean upgrading the operator and the driver to this version could improve the reduced performance?
Could you please elaborate on the improvements of this driver version?
In addition, which benchmarking tools would you recommend to test these GPUs?