1. Quick Debug Information
- OS: CentOS 7.9
- Kernel: Linux a800-master 3.10.0-1160.95.1.el7.x86_64 SMP Mon Jul 24 13:59:37 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
- Container runtime: Docker
- Kubernetes version: 1.20.2
- GPU Operator version: 22.9.1
2. Issue or feature description
When attempting to use the GDRDMA (GPUDirect RDMA) feature, I followed the deployment instructions described in the GPU Operator documentation. I have already installed the OFED driver on my physical machine (in non-containerized form), so I set the parameters `--set driver.rdma.enabled=true --set driver.rdma.useHostMofed=true`. However, the driver daemonset pod fails with an error.
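For reference, a minimal sketch of the install command; the release name, chart reference, and namespace are placeholders, and only the two `--set` values are the ones quoted above:

```sh
# Sketch only: release name, chart, and namespace are placeholders,
# not taken from the actual deployment in this report.
helm install gpu-operator nvidia/gpu-operator \
  --namespace gpu-operator --create-namespace \
  --set driver.rdma.enabled=true \
  --set driver.rdma.useHostMofed=true
```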
Here is the pod status:
4. Information to attach (optional if deemed irrelevant)

- Kubernetes driver pod logs (see the collection sketch below)
- Kubernetes DaemonSet status: `kubectl get ds -n OPERATOR_NAMESPACE`

Full debug bundle already sent to [email protected]
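A sketch of how the items above might be collected, assuming the operator runs in a namespace named `gpu-operator` (substitute the actual OPERATOR_NAMESPACE):

```sh
# DaemonSet status in the operator namespace (placeholder name).
kubectl get ds -n gpu-operator

# Driver pod logs; app=nvidia-driver-daemonset is the label the GPU
# Operator applies to its driver pods (verify with --show-labels).
kubectl logs -n gpu-operator -l app=nvidia-driver-daemonset --all-containers
```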
@ReyRen from the debug bundle provided, it looks like the driver pod logs are truncated. Can you get the logs from the `nvidia-driver-ctr` container within the driver pod? It looks like the NVIDIA driver install is not going through. Attaching logs from dmesg will also help.
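For example, along these lines (the pod name and namespace below are placeholders; list the driver pods first to find the real name):

```sh
# Find the driver pod name, then pull logs from its nvidia-driver-ctr
# container (pod name below is a placeholder).
kubectl get pods -n gpu-operator
kubectl logs -n gpu-operator nvidia-driver-daemonset-xxxxx -c nvidia-driver-ctr

# Kernel messages from the node where the driver pod is scheduled.
dmesg | tail -n 200
```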