Commit 61a366c

allow new node types without changing helm chart (#47)
Signed-off-by: Dmitry Shmulevich <[email protected]>
1 parent 98977eb commit 61a366c

3 files changed: +48 -26 lines changed


charts/virtual-nodes/templates/nodes.yaml

Lines changed: 1 addition & 12 deletions
@@ -58,19 +58,8 @@
 {{- $resources = set $resources "ephemeral-storage" "30Ti" }}
 {{- $params = set $params "resources" $resources }}
 
-{{/*
-# cpu.x86
-*/}}
-{{- else if eq $node.type "cpu.x86" }}
-{{- $resources := deepCopy $defaultResources }}
-{{- $resources = set $resources "cpu" 48 }}
-{{- $resources = set $resources "memory" "196692052Ki" }}
-{{- $resources = set $resources "ephemeral-storage" "2537570228Ki" }}
-{{- $params = set $params "resources" $resources }}
-
 {{- else }}
-{{- $error := printf "Unsupported node type '%s'" $node.type }}
-{{- fail $error }}
+{{- $params = set $params "resources" $node.resources }}
 {{- end }}
 
 {{- $count := ($node.count | int) }}

charts/virtual-nodes/values-example.yaml

Lines changed: 7 additions & 0 deletions
@@ -45,3 +45,10 @@ nodes:
         type: KernelDeadlock
   - type: cpu.x86
     count: 2
+    resources:
+      hugepages-1Gi: 0
+      hugepages-2Mi: 0
+      pods: 110
+      cpu: 48
+      memory: 196692052Ki
+      ephemeral-storage: 2537570228Ki
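
With the fallback branch in `nodes.yaml` now reading `$node.resources`, a node type that the template has no dedicated branch for should only need an entry in the values file. A minimal sketch, assuming a hypothetical type name `cpu.arm64` and illustrative capacity figures (neither appears in this commit):

```yaml
nodes:
  # Hypothetical node type: the chart has no dedicated branch for it,
  # so its capacity is taken from `resources` exactly as written here.
  - type: cpu.arm64
    count: 4
    resources:
      hugepages-1Gi: 0
      hugepages-2Mi: 0
      pods: 110
      cpu: 64                  # illustrative value
      memory: 268435456Ki      # illustrative value (256 GiB)
      ephemeral-storage: 1Ti   # illustrative value
```

Because the template no longer calls `fail` for unknown types, a misspelled pre-defined type name would presumably fall through to this branch as well, so it is worth double-checking the `type` string whenever `resources` is omitted.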

docs/deployment.md

Lines changed: 40 additions & 14 deletions
@@ -54,31 +54,57 @@ kubectl apply -f charts/overrides/kwok/pod-complete.yml
 
 ## Setting up virtual nodes
 
-There are two ways to set up virtual nodes in the cluster, both of which require [Helm v3](https://helm.sh/docs/intro/install/) to be installed on your machine.
-
-### 1. Using the `helm` command
+Virtual nodes are configured by setting the following node attributes: `type`, `count`, `annotations`, `labels`, `resources`, and `conditions`. The `type` and `count` attributes are mandatory, while the rest are optional.
 
-Run the `helm install` command and provide the `values.yaml` file that specifies the types and quantities of nodes you wish to create. For example, see the [values-example.yaml](../charts/virtual-nodes/values-example.yaml) file.
-Currently, the system includes the following node types:
+There are three pre-defined node types:
 - [dgxa100.40g](https://docs.nvidia.com/dgx/dgxa100-user-guide/introduction-to-dgxa100.html#hardware-overview)
 - [dgxa100.80g](https://docs.nvidia.com/dgx/dgxa100-user-guide/introduction-to-dgxa100.html#hardware-overview)
 - [dgxh100.80g](https://docs.nvidia.com/dgx/dgxh100-user-guide/introduction-to-dgxh100.html#hardware-overview)
-- cpu.x86
 
-To deploy the nodes defined in `values-example.yaml`, use the following command:
-```bash
-helm upgrade --install virtual-nodes charts/virtual-nodes -f charts/virtual-nodes/values-example.yaml
+For these types, the resource attributes are already configured, but you can still modify `count`, `annotations`, `labels`, and `conditions`. For example:
+```yaml
+- type: dgxa100.80g
+  count: 2
+  annotations: {}
+  labels:
+    nvidia.com/gpu.count: "8"
+    nvidia.com/gpu.product: NVIDIA-A100-SXM4-80GB
+  conditions:
+    - message: kernel has no deadlock
+      reason: KernelHasNoDeadlock
+      status: "False"
+      type: KernelDeadlock
 ```
 
-### 2. Using the Task Specification
+For other node types, it is recommended to provide resource capacity. For example:
+```yaml
+- type: cpu.x86
+  count: 2
+  resources:
+    hugepages-1Gi: 0
+    hugepages-2Mi: 0
+    pods: 110
+    cpu: 48
+    memory: 196692052Ki
+    ephemeral-storage: 2537570228Ki
+```
+
+There are two ways to set up virtual nodes in the cluster, both of which require [Helm v3](https://helm.sh/docs/intro/install/) to be installed on your machine.
 
-Set up virtual nodes within the `Configure` task in the task specification file. For this example, refer to [test-custom-resource.yml](../resources/tests/test-custom-resource.yml#L11-L19).
+- Using the `helm` command:
 
-### Enhancing Node Configurations
+Run the `helm install` command and provide the `values.yaml` file that specifies the types and quantities of nodes you wish to create. For example, see the [values-example.yaml](../charts/virtual-nodes/values-example.yaml) file.
+
+To deploy the nodes defined in `values-example.yaml`, use the following command:
+```bash
+helm upgrade --install virtual-nodes charts/virtual-nodes -f charts/virtual-nodes/values-example.yaml
+```
 
-In both methods, you can enhance node configurations by adding annotations, labels, and conditions.
+- Using the task specification:
 
-To introduce additional node types, update the `values.yaml` file or the `Configure` task used for node configuration with the node information (such as type, count, etc.), and include a parameters section in the [nodes.yaml](../charts/virtual-nodes/templates/nodes.yaml) file.
+Set up virtual nodes within the `Configure` task in the task specification file.
+
+For this example, refer to [test-custom-resource.yml](../resources/tests/test-custom-resource.yml#L11-L19).
 
 > :warning: **Warning:** Ensure you deploy virtual nodes as the final step before launching `knavigator`. If you deploy any components after virtual nodes are created, the pods for these components might be assigned to virtual nodes, which could break their functionality.
 