K8s Flavor/Version(e.g. K8s, OCP, Rancher, GKE, EKS): AKS
GPU Operator Version: any, tried latest as well as 23.6.0
2. Issue or feature description
What the title says. I'm trying to set custom tolerations via daemonsets.tolerations as mentioned in the docs. I've tried all sorts
of syntaxes, but the daemonsets do not get the tolerations applied.
@lmyslinski NFD is deployed as a dependent chart from the operator chart. You need to set these tolerations under node-feature-discovery.master and node-feature-discovery.worker as well. Here are the defaults with the subchart.
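A minimal sketch of what that could look like as a values file, assuming the spot toleration from the issue (key kubernetes.azure.com/scalesetpriority, value spot, effect NoSchedule). The file name matches the one used in the issue. The explicit operator: Equal is an assumption; it is also the Kubernetes default when a value is given. Helm replaces list values instead of merging them, so any chart or subchart default tolerations that are still needed would have to be repeated in these lists.

```shell
# Write a values file that sets the same toleration for the operator-managed
# daemonsets and for the NFD subchart (master and worker).
cat > gpu-operator-values.yaml <<'EOF'
daemonsets:
  tolerations:
    - key: kubernetes.azure.com/scalesetpriority
      operator: Equal   # assumption: explicit form of the default operator
      value: spot
      effect: NoSchedule
node-feature-discovery:
  master:
    tolerations:
      - key: kubernetes.azure.com/scalesetpriority
        operator: Equal
        value: spot
        effect: NoSchedule
  worker:
    tolerations:
      - key: kubernetes.azure.com/scalesetpriority
        operator: Equal
        value: spot
        effect: NoSchedule
EOF
```

The file can then be passed with the same command as in the issue: helm upgrade -i gpu-operator -n gpu-operator --create-namespace nvidia/gpu-operator -f gpu-operator-values.yaml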
Solved a similar case.
As described above, it is enough to pass daemonsets.tolerations[0].* and node-feature-discovery.worker.tolerations[0].* to Helm.
Do not forget to add a resource limit for nvidia.com/gpu to your deployment's container, so that the pod waits for the NVIDIA components to be ready.
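As a rough illustration of that last point, the sketch below writes a Deployment manifest whose container requests one nvidia.com/gpu. The names gpu-workload and gpu-workload.yaml and the image registry.example.com/my-gpu-app:latest are hypothetical placeholders, and the spot toleration is only needed if the workload itself should run on the tainted spot nodes. A pod with this limit stays Pending until a node advertises the nvidia.com/gpu resource.

```shell
# Write an example Deployment that requests a GPU (all names are hypothetical).
cat > gpu-workload.yaml <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gpu-workload
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gpu-workload
  template:
    metadata:
      labels:
        app: gpu-workload
    spec:
      tolerations:
        # Assumption: the workload should also be allowed on spot nodes.
        - key: kubernetes.azure.com/scalesetpriority
          operator: Equal
          value: spot
          effect: NoSchedule
      containers:
        - name: app
          image: registry.example.com/my-gpu-app:latest  # placeholder image
          resources:
            limits:
              nvidia.com/gpu: 1  # schedulable only once a node exposes GPUs
EOF
```

It could then be applied with kubectl apply -f gpu-workload.yaml.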
1. Quick Debug Information
23.6.0
2. Issue or feature description
What the title says. I'm trying to set custom tolerations via daemonsets.tolerations as mentioned in the docs. I've tried all sorts of syntaxes, but the daemonsets do not get the tolerations applied.
Syntax via yaml file:
helm upgrade -i gpu-operator -n gpu-operator --create-namespace nvidia/gpu-operator -f gpu-operator-values.yaml
Yaml file:
Syntax via --set:
helm upgrade -i gpu-operator -n gpu-operator --create-namespace nvidia/gpu-operator --set 'daemonsets.tolerations[0].effect=NoSchedule,daemonsets.tolerations[0].key=kubernetes.azure.com/scalesetpriority,daemonsets.tolerations[0].value=spot'
In either case, I cannot see the updated toleration values in the NFD daemonset:
kubectl get ds gpu-operator-node-feature-discovery-worker -n gpu-operator -o json | jq '.spec.template.spec.tolerations'
Slightly related to @shivamerla's answer at #529
Happy to provide more details if needed. Is there anything I'm missing here?