Add NFD rule for Gaudi resource driver #69

Merged 5 commits on Jan 4, 2025
12 changes: 10 additions & 2 deletions charts/intel-gaudi-resource-driver/Chart.yaml
@@ -3,5 +3,13 @@ name: intel-gaudi-resource-driver
 description: A Helm chart for a Dynamic Resource Allocation (DRA) Intel Gaudi Resource Driver

 type: application
-version: 0.2.0
-appVersion: "v0.2.0"
+version: 0.3.0
+appVersion: "v0.3.0"
+home: https://github.com/intel/helm-charts
+
+dependencies:
+  - name: node-feature-discovery
+    alias: nfd
+    version: "0.16.6"
+    condition: nfd.enabled
+    repository: https://kubernetes-sigs.github.io/node-feature-discovery/charts
6 changes: 4 additions & 2 deletions charts/intel-gaudi-resource-driver/README.md
@@ -16,7 +16,9 @@ helm repo update
 You can execute `helm search repo intel` command to see pulled charts [optional].

 ## Install Helm Chart
+When installing, update the dependencies:
 ```
+helm dependency update
 helm install intel-gaudi-resource-driver intel/intel-gaudi-resource-driver
 ```
 ## Upgrade Chart
@@ -43,7 +45,7 @@ You may also run `helm show values` on this chart's dependencies for additional
 | image.repository | string | `intel` |
 | image.name | string | `"intel-gaudi-resource-driver"` |
 | image.pullPolicy | string | `"IfNotPresent"` |
-| image.tag | string | `"v0.2.0"` |
+| image.tag | string | `"v0.3.0"` |

 > [!Note]
-> When upgrading, CRDs from previous version need to be removed manually because Helm supports neither upgrading nor deleting CRDs, see: https://github.com/helm/community/blob/main/hips/hip-0011.md
+> If you change the image tag to be used in Helm chart deployment, ensure that the version of the container image is consistent with deployment YAMLs - they might change between releases.
@@ -1,4 +1,4 @@
-apiVersion: resource.k8s.io/v1alpha3
+apiVersion: resource.k8s.io/v1beta1
 kind: DeviceClass
 metadata:
   name: gaudi.intel.com
16 changes: 16 additions & 0 deletions charts/intel-gaudi-resource-driver/templates/nfd.yaml
@@ -0,0 +1,16 @@
+{{- if .Values.nfd.enabled }}
+apiVersion: nfd.k8s-sigs.io/v1alpha1
+kind: NodeFeatureRule
+metadata:
+  name: intel-gaudi-device-rule
+spec:
+  rules:
+    - name: "intel.gaudi"
+      labels:
+        "intel.feature.node.kubernetes.io/gaudi": "true"
+      matchFeatures:
+        - feature: pci.device
+          matchExpressions:
+            vendor: {op: In, value: ["1da3"]}
+            device: {op: In, value: ["1020", "1030"]}
+{{- end }}
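For orientation: once NFD evaluates this rule on a node that exposes a PCI device with vendor ID `1da3` and device ID `1020` or `1030`, the node would be expected to carry the label below. This is an illustrative excerpt of a Node object, not output captured from a cluster, and the node name is hypothetical.

```yaml
# Illustrative Node excerpt (assumption: NFD is running and the rule matched).
apiVersion: v1
kind: Node
metadata:
  name: gaudi-node-1  # hypothetical node name
  labels:
    intel.feature.node.kubernetes.io/gaudi: "true"
```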
@@ -73,10 +73,15 @@ spec:
       tolerations:
         {{- toYaml . | nindent 8 }}
       {{- end }}
+      {{- if .Values.nfd.enabled }}
+      nodeSelector:
+        intel.feature.node.kubernetes.io/gaudi: "true"
+      {{- else }}
       {{- with .Values.kubeletPlugin.nodeSelector }}
       nodeSelector:
         {{- toYaml . | nindent 8 }}
       {{- end }}
+      {{- end }}
       {{- with .Values.kubeletPlugin.affinity }}
       affinity:
         {{- toYaml . | nindent 8 }}
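The effect of the new conditional can be sketched by looking at what the template would render in each case. The excerpts below are hand-written illustrations rather than captured `helm template` output, and the `example.com/accelerator` label is a hypothetical value for `kubeletPlugin.nodeSelector`.

```yaml
# Case 1: nfd.enabled=true
# The selector is fixed to the NFD label; kubeletPlugin.nodeSelector is not consulted.
nodeSelector:
  intel.feature.node.kubernetes.io/gaudi: "true"
---
# Case 2: nfd.enabled=false, kubeletPlugin.nodeSelector set to a hypothetical label
nodeSelector:
  example.com/accelerator: "gaudi"
```

With `nfd.enabled=false` and an empty `kubeletPlugin.nodeSelector`, the `with` block is skipped and no `nodeSelector` is rendered at all.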
@@ -7,7 +7,7 @@ spec:
   matchConstraints:
     resourceRules:
     - apiGroups: ["resource.k8s.io"]
-      apiVersions: ["v1alpha3"]
+      apiVersions: ["v1beta1"]
       operations: ["CREATE", "UPDATE", "DELETE"]
       resources: ["resourceslices"]
   matchConditions:
20 changes: 17 additions & 3 deletions charts/intel-gaudi-resource-driver/values.yaml
@@ -9,7 +9,7 @@ image:
   repository: intel
   name: intel-gaudi-resource-driver
   pullPolicy: IfNotPresent
-  tag: "v0.2.0"
+  tag: "v0.3.0"

 serviceAccount:
   create: true
@@ -19,13 +19,27 @@ serviceAccount:

 kubeletPlugin:
   podAnnotations: {}
+  nodeSelector: {}
+    # label used when nfd.enabled is true
+    #intel.feature.node.kubernetes.io/gaudi: "true"
   tolerations:
   - key: node-role.kubernetes.io/master
     operator: Exists
     effect: NoSchedule
   - key: node-role.kubernetes.io/control-plane
     operator: Exists
     effect: NoSchedule
-  nodeSelector: {}
-    #node-role.kubernetes.io/control-plane: ""
+  # Refer to the official documentation for Node Feature Discovery (NFD)
+  # regarding node tainting:
+  # https://nfd.sigs.k8s.io/usage/customization-guide#node-tainting
+  - key: "intel.feature.node.kubernetes.io/gaudi"
+    operator: "Exists"
+    effect: "NoSchedule"
   affinity: {}
+
+nfd:
+  enabled: false # change to true to install NFD to the cluster
+  nameOverride: intel-gaudi-nfd
+  # TODO: this deprecated NFD option will be replaced in NFD v0.17 with "featureGates.NodeFeatureAPI" (added in v0.16):
+  # https://kubernetes-sigs.github.io/node-feature-discovery/v0.16/deployment/helm.html#general-parameters
+  enableNodeFeatureApi: true
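A minimal values override for opting in to the bundled NFD subchart might look like the sketch below. The file name `my-values.yaml` is hypothetical; the keys are the ones introduced in this chart's `values.yaml`.

```yaml
# my-values.yaml (hypothetical file name)
nfd:
  enabled: true  # satisfies the dependency condition and renders the NodeFeatureRule
```

It could be passed at install time with `helm install intel-gaudi-resource-driver intel/intel-gaudi-resource-driver -f my-values.yaml`. Because the dependency is aliased as `nfd`, other keys placed under `nfd:` are also handed to the NFD subchart.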