Commit 9f1effb

KEP-2170: Add validation to Torch numProcPerNode field (kubeflow/trainer#2409)
Signed-off-by: Antonin Stefanutti <[email protected]>
1 parent 6d2629e commit 9f1effb

4 files changed: 8 additions, 12 deletions

docs/TrainerV1alpha1TorchMLPolicySource.md

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@ TorchMLPolicySource represents a PyTorch runtime configuration.
 Name | Type | Description | Notes
 ------------ | ------------- | ------------- | -------------
 **elastic_policy** | [**TrainerV1alpha1TorchElasticPolicy**](TrainerV1alpha1TorchElasticPolicy.md) | | [optional]
-**num_proc_per_node** | **str** | Number of processes per node. This value is inserted into the &#x60;--nproc-per-node&#x60; argument of the &#x60;torchrun&#x60; CLI. Supported values: &#x60;auto&#x60;, &#x60;cpu&#x60;, &#x60;gpu&#x60;, or int value. Defaults to &#x60;auto&#x60;. | [optional]
+**num_proc_per_node** | [**K8sIoApimachineryPkgUtilIntstrIntOrString**](K8sIoApimachineryPkgUtilIntstrIntOrString.md) | | [optional]

 [[Back to Model list]](../README.md#documentation-for-models) [[Back to API list]](../README.md#documentation-for-api-endpoints) [[Back to README]](../README.md)

docs/TrainerV1alpha1Trainer.md

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@ Name | Type | Description | Notes
 **env** | [**list[V1EnvVar]**](V1EnvVar.md) | List of environment variables to set in the training container. These values will be merged with the TrainingRuntime&#39;s trainer environments. | [optional]
 **image** | **str** | Docker image for the training container. | [optional]
 **num_nodes** | **int** | Number of training nodes. | [optional]
-**num_proc_per_node** | **str** | Number of processes/workers/slots on every training node. For the Torch runtime: &#x60;auto&#x60;, &#x60;cpu&#x60;, &#x60;gpu&#x60;, or int value can be set. For the MPI runtime only int value can be set. | [optional]
+**num_proc_per_node** | [**K8sIoApimachineryPkgUtilIntstrIntOrString**](K8sIoApimachineryPkgUtilIntstrIntOrString.md) | | [optional]
 **resources_per_node** | [**V1ResourceRequirements**](V1ResourceRequirements.md) | | [optional]

 [[Back to Model list]](../README.md#documentation-for-models) [[Back to API list]](../README.md#documentation-for-api-endpoints) [[Back to README]](../README.md)
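
The descriptions removed from the two docs files above spell out the accepted values for `num_proc_per_node`: `auto`, `cpu`, `gpu`, or an int for the Torch runtime, and an int only for the MPI runtime. A minimal client-side sketch of that rule follows. It is illustrative only and is not the validation this commit adds. The function name, the `runtime` parameter, and the "at least 1" bound are assumptions that do not appear in the diff.

```python
from typing import Union

# Hypothetical names for illustration; none of these come from kubeflow/trainer.
TORCH_KEYWORDS = frozenset({"auto", "cpu", "gpu"})


def check_num_proc_per_node(
    value: Union[int, str], runtime: str = "torch"
) -> Union[int, str]:
    """Return value unchanged if it matches the documented rule, else raise."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, str)):
        raise TypeError(f"expected int or str, got {type(value).__name__}")

    if isinstance(value, int):
        # Assumption: a process count below 1 is never meaningful.
        if value < 1:
            raise ValueError("num_proc_per_node must be at least 1")
        return value

    # String values are only described for the Torch runtime.
    if runtime == "torch" and value in TORCH_KEYWORDS:
        return value

    raise ValueError(
        f"unsupported num_proc_per_node {value!r} for runtime {runtime!r}"
    )
```

For example, `check_num_proc_per_node("gpu")` passes under the default Torch runtime, while `check_num_proc_per_node("gpu", runtime="mpi")` raises `ValueError`.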

kubeflow/trainer/models/trainer_v1alpha1_torch_ml_policy_source.py

Lines changed: 3 additions & 5 deletions
@@ -34,7 +34,7 @@ class TrainerV1alpha1TorchMLPolicySource(object):
     """
     openapi_types = {
         'elastic_policy': 'TrainerV1alpha1TorchElasticPolicy',
-        'num_proc_per_node': 'str'
+        'num_proc_per_node': 'K8sIoApimachineryPkgUtilIntstrIntOrString'
     }

     attribute_map = {
@@ -82,21 +82,19 @@ def elastic_policy(self, elastic_policy):
     def num_proc_per_node(self):
         """Gets the num_proc_per_node of this TrainerV1alpha1TorchMLPolicySource. # noqa: E501

-        Number of processes per node. This value is inserted into the `--nproc-per-node` argument of the `torchrun` CLI. Supported values: `auto`, `cpu`, `gpu`, or int value. Defaults to `auto`. # noqa: E501

         :return: The num_proc_per_node of this TrainerV1alpha1TorchMLPolicySource. # noqa: E501
-        :rtype: str
+        :rtype: K8sIoApimachineryPkgUtilIntstrIntOrString
         """
         return self._num_proc_per_node

     @num_proc_per_node.setter
     def num_proc_per_node(self, num_proc_per_node):
         """Sets the num_proc_per_node of this TrainerV1alpha1TorchMLPolicySource.

-        Number of processes per node. This value is inserted into the `--nproc-per-node` argument of the `torchrun` CLI. Supported values: `auto`, `cpu`, `gpu`, or int value. Defaults to `auto`. # noqa: E501

         :param num_proc_per_node: The num_proc_per_node of this TrainerV1alpha1TorchMLPolicySource. # noqa: E501
-        :type: str
+        :type: K8sIoApimachineryPkgUtilIntstrIntOrString
         """

         self._num_proc_per_node = num_proc_per_node

kubeflow/trainer/models/trainer_v1alpha1_trainer.py

Lines changed: 3 additions & 5 deletions
@@ -38,7 +38,7 @@ class TrainerV1alpha1Trainer(object):
         'env': 'list[V1EnvVar]',
         'image': 'str',
         'num_nodes': 'int',
-        'num_proc_per_node': 'str',
+        'num_proc_per_node': 'K8sIoApimachineryPkgUtilIntstrIntOrString',
         'resources_per_node': 'V1ResourceRequirements'
     }

@@ -201,21 +201,19 @@ def num_nodes(self, num_nodes):
     def num_proc_per_node(self):
         """Gets the num_proc_per_node of this TrainerV1alpha1Trainer. # noqa: E501

-        Number of processes/workers/slots on every training node. For the Torch runtime: `auto`, `cpu`, `gpu`, or int value can be set. For the MPI runtime only int value can be set. # noqa: E501

         :return: The num_proc_per_node of this TrainerV1alpha1Trainer. # noqa: E501
-        :rtype: str
+        :rtype: K8sIoApimachineryPkgUtilIntstrIntOrString
         """
         return self._num_proc_per_node

     @num_proc_per_node.setter
     def num_proc_per_node(self, num_proc_per_node):
         """Sets the num_proc_per_node of this TrainerV1alpha1Trainer.

-        Number of processes/workers/slots on every training node. For the Torch runtime: `auto`, `cpu`, `gpu`, or int value can be set. For the MPI runtime only int value can be set. # noqa: E501

         :param num_proc_per_node: The num_proc_per_node of this TrainerV1alpha1Trainer. # noqa: E501
-        :type: str
+        :type: K8sIoApimachineryPkgUtilIntstrIntOrString
         """

         self._num_proc_per_node = num_proc_per_node
