
Add DRA driver for IMEX #1143

Draft: cdesiniotis wants to merge 3 commits into main from dra-driver-imex

Conversation

cdesiniotis (Contributor)

No description provided.

cdesiniotis requested a review from klueska on November 28, 2024 at 00:35
@@ -841,6 +843,60 @@ type SandboxDevicePluginSpec struct {
Env []EnvVar `json:"env,omitempty"`
}

// DRADriverSpec defines the properties for the NVIDIA DRA Driver deployment
// TODO: add 'controller' and 'kubeletPlugin' structs to allow for per-component configuration
Member:

One question: Should we expose controller and kubeletPlugin as concepts to the user? These seem like internal details and including them here couples the operator and the DRA driver implementation more tightly.

cdesiniotis (Contributor, Author), Dec 3, 2024:

I don't believe we should at this point in time. But I can see how having the ability to configure the controller / kubeletPlugin independently could be useful. For example, one may need to bump the CPU / memory resources for the controller (and not the plugin) to account for larger clusters. Open to continuing the discussion on this and enumerating the list of fields we want to expose in ClusterPolicy.
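
A rough sketch of how such a per-component split could look if we ever expose it (hypothetical field and type names, not part of this PR):

// DRADriverSpec with a hypothetical per-component split, sketched only for
// discussion; none of these fields exist in the current ClusterPolicy API.
type DRADriverSpec struct {
	// Controller holds configuration specific to the DRA controller Deployment.
	Controller *DRADriverComponentSpec `json:"controller,omitempty"`
	// KubeletPlugin holds configuration specific to the kubelet plugin DaemonSet.
	KubeletPlugin *DRADriverComponentSpec `json:"kubeletPlugin,omitempty"`
}

// DRADriverComponentSpec carries the per-component knobs mentioned above.
type DRADriverComponentSpec struct {
	// Resources allows bumping CPU / memory for one component only,
	// e.g. the controller on large clusters.
	Resources *ResourceRequirements `json:"resources,omitempty"`
	// Env allows overriding environment variables for this component only.
	Env []EnvVar `json:"env,omitempty"`
}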

metadata:
name: nvidia-dra-driver
rules:
# TODO: restrict RBAC for DRA driver
Member:

Has this been done upstream yet?

cdesiniotis (Contributor, Author):

I see @guptaNswati is working on restricting the RBAC upstream: NVIDIA/k8s-dra-driver#219. I will pull in those changes once they are finalized.

- name: controller
image: "FILLED BY THE OPERATOR"
imagePullPolicy: IfNotPresent
command: ["nvidia-dra-controller", "-v", "6"]
Member:

Should we be setting the verbosity here?

set -o allexport
cat /run/nvidia/validations/driver-ready
. /run/nvidia/validations/driver-ready
# TODO: add an alias for DRIVER_ROOT_CTR_PATH in the k8s-dra-driver and remove the below export

Comment on lines 71 to 72
- name: DEVICE_CLASSES
value: imex
Member:

Is it expected that the DEVICE_CLASSES for the controller and the plugin match?

Contributor:

We can set this to any of gpu, mig, and imex, and the plugin will automatically pick it up:

  - name: DEVICE_CLASSES
    value: {{ .Values.deviceClasses | join "," }}

Here, it seems we are only doing IMEX by default.

Comment on lines 63 to 64
- name: MASK_NVIDIA_DRIVER_PARAMS
value: "false"
Member:

Question: Can a user still override this if required?

cdesiniotis (Contributor, Author):

Yes. The user can override this variable in ClusterPolicy with draDriver.env.
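
For example, something along these lines in ClusterPolicy would override the default (a sketch only; it assumes the draDriver field name proposed in this PR and the nvidia.com/v1 API group used by ClusterPolicy):

apiVersion: nvidia.com/v1
kind: ClusterPolicy
metadata:
  name: cluster-policy
spec:
  draDriver:
    enabled: true
    env:
      # Overrides the value hard-coded in the DaemonSet manifest.
      - name: MASK_NVIDIA_DRIVER_PARAMS
        value: "true"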

- mountPath: /var/lib/kubelet/plugins_registry
name: plugins-registry
- mountPath: /var/lib/kubelet/plugins
mountPropagation: Bidirectional
Member:

Question: Is bidirectional mount propagation really needed in the plugins folder?


@@ -147,6 +148,8 @@ const (
NvidiaCtrRuntimeCDIPrefixesEnvName = "NVIDIA_CONTAINER_RUNTIME_MODES_CDI_ANNOTATION_PREFIXES"
// CDIEnabledEnvName is the name of the envvar used to enable CDI in the operands
CDIEnabledEnvName = "CDI_ENABLED"
// NvidiaCTKHookPathEnvName is the name of the envvar specifying the path to the 'nvidia-ctk' binary
Member:

Comment is incorrect.

Member:

Note we shouldn't need the nvidia-ctk path in addition to the nvidia-cdi-hook path. They can be used interchangeably.

cdesiniotis (Contributor, Author):

Updated the comment. Will remove this once NVIDIA/k8s-dra-driver#210 is complete.

@@ -1539,6 +1543,55 @@ func TransformSandboxDevicePlugin(obj *appsv1.DaemonSet, config *gpuv1.ClusterPo
return nil
}

// TransformDRADriverPlugin transforms nvidia-dra-driver-plugin daemonset with required config as per ClusterPolicy
func TransformDRADriverPlugin(obj *appsv1.DaemonSet, config *gpuv1.ClusterPolicySpec, n ClusterPolicyController) error {
Member:

Out of scope for this PR: How much merit is there in refactoring these Transform* functions to strip out the common logic that is performed for all containers?

cdesiniotis (Contributor, Author):

There is definitely some merit. We currently apply some common transforms for all DaemonSets here before calling the individual Transform* functions:

// apply common Daemonset configuration that is applicable to all
err := applyCommonDaemonsetConfig(obj, &n.singleton.Spec)
if err != nil {
logger.Error(err, "Failed to apply common Daemonset transformation", "resource", obj.Name)
return err
}
// transform the host-root and host-dev-char volumes if a custom host root is configured with the operator
transformForHostRoot(obj, n.singleton.Spec.HostPaths.RootFS)
// transform the driver-root volume if a custom driver install dir is configured with the operator
transformForDriverInstallDir(obj, n.singleton.Spec.HostPaths.DriverInstallDir)

We could strip out more, but I believe the main issue is that each Transform* function is reading configuration from a different data type. E.g. TransformDRADriverPlugin reads from a struct of type DRADriverSpec while TransformDriver reads from a struct of type DriverSpec.
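
For discussion, one hedged sketch of how the shared container logic could be factored out despite the differing spec types (hypothetical interface and method names; nothing below exists in the codebase today):

// commonSpec is a hypothetical interface capturing the per-component fields
// that every Transform* function currently reads from its own spec type.
type commonSpec interface {
	ImagePath(spec *gpuv1.ClusterPolicySpec) (string, error)
	ImagePullPolicy() corev1.PullPolicy
	EnvVars() []gpuv1.EnvVar
}

// transformCommonContainer applies the transformations shared by all component
// containers, leaving component-specific logic to the individual Transform* functions.
func transformCommonContainer(ctr *corev1.Container, spec *gpuv1.ClusterPolicySpec, component commonSpec) error {
	image, err := component.ImagePath(spec)
	if err != nil {
		return err
	}
	ctr.Image = image
	ctr.ImagePullPolicy = component.ImagePullPolicy()
	for _, env := range component.EnvVars() {
		setContainerEnv(ctr, env.Name, env.Value)
	}
	return nil
}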

}

if config.Toolkit.IsEnabled() {
setContainerEnv(&(obj.Spec.Template.Spec.Containers[0]), NvidiaCTKPathEnvName, filepath.Join(config.Toolkit.InstallDir, "toolkit/nvidia-ctk"))
Member:

Let's rather update the driver to also use the nvidia-cdi-hook path. The following should be sufficient:

Suggested change:
- setContainerEnv(&(obj.Spec.Template.Spec.Containers[0]), NvidiaCTKPathEnvName, filepath.Join(config.Toolkit.InstallDir, "toolkit/nvidia-ctk"))
+ setContainerEnv(&(obj.Spec.Template.Spec.Containers[0]), NvidiaCTKPathEnvName, filepath.Join(config.Toolkit.InstallDir, "toolkit/nvidia-cdi-hook"))

Member:

Created NVIDIA/k8s-dra-driver#210 as a follow-up.

return nil
}

func transformDeployment(obj *appsv1.Deployment, n ClusterPolicyController) error {
Member:

Question: Why is this not a function defined on a ClusterPolicyController?

cdesiniotis (Contributor, Author):

No particular reason other than following the precedent set by the similar transform functions in this file. Is there a good reason for defining this as a method instead?

imagePullPolicy: IfNotPresent
command: ["nvidia-dra-controller", "-v", "6"]
env:
- name: DEVICE_CLASSES
Contributor:

Are we only doing IMEX, not MIG? We also need some error handling for the case when the IMEX daemon is not running; right now we always expect the IMEX daemon to be running.

cdesiniotis (Contributor, Author):

The scope of this PR is to only enable IMEX.

I am not sure what type of error handling you are envisioning, but I believe we need to ensure that the DRA driver daemonset only ever gets scheduled on nodes that are in an IMEX domain. As is, my PR deploys the DRA driver daemonset on all GPU nodes, regardless of whether they are in an IMEX domain. I can look into updating the nodeAffinity to leverage the IMEX domain label that GFD adds.

cdesiniotis (Contributor, Author):

I have updated the nodeAffinity such that the IMEX DRA driver kubelet plugin only gets scheduled on nodes labeled with nvidia.com/gpu.imex-domain.
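
For reference, a minimal sketch of what that nodeAffinity could look like (assuming GFD applies the nvidia.com/gpu.imex-domain label; the exact manifest in the PR may differ):

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            # Only schedule the kubelet plugin on nodes that GFD has labeled
            # as belonging to an IMEX domain.
            - key: nvidia.com/gpu.imex-domain
              operator: Exists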


assets/state-dra-driver/0400_deviceclass-imex.yaml (outdated thread, resolved)
type: DirectoryOrCreate
- name: driver-install-dir
hostPath:
path: "/run/nvidia/driver"
Contributor:

We are still doing host-managed drivers for IMEX, so we need to override this path to / when --set driver.enabled=false is used.

cdesiniotis (Contributor, Author):

We mount both /run/nvidia/driver and the host root / into the container to account for both the driver container and host-managed driver scenarios. In the case of host-managed drivers, we set the DRIVER_CTR_ROOT_PATH envvar to /host (the path inside the container) in the container entrypoint script.
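
Roughly, the idea in the entrypoint looks like the following sketch (variable names other than DRIVER_CTR_ROOT_PATH, and the mount paths marked below, are assumptions rather than verbatim values from the PR):

set -o allexport
# The validation file written by the operator validator exports the driver
# location details (assumed to include DRIVER_ROOT).
. /run/nvidia/validations/driver-ready

if [ "${DRIVER_ROOT:-/}" = "/" ]; then
  # Host-managed driver: the host root / is mounted into this container at /host.
  export DRIVER_CTR_ROOT_PATH="/host"
else
  # Driver container: /run/nvidia/driver is mounted into this container
  # (the container mount path shown here is hypothetical).
  export DRIVER_CTR_ROOT_PATH="/driver-root"
fi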

@@ -294,6 +294,17 @@ devicePlugin:
# MPS root path on the host
root: "/run/nvidia/mps"

draDriver:
enabled: true
repository: ghcr.io/nvidia
Contributor:

Should we mirror it to nvcr.io, similar to the other images, and maybe retag with semver versioning?

cdesiniotis (Contributor, Author):

Yes, this is a placeholder for now. We will update this once we publish the DRA driver image to nvcr.io.

// +operator-sdk:gen-csv:customresourcedefinitions.specDescriptors=true
// +operator-sdk:gen-csv:customresourcedefinitions.specDescriptors.displayName="Resource Requirements"
// +operator-sdk:gen-csv:customresourcedefinitions.specDescriptors.x-descriptors="urn:alm:descriptor:com.tectonic.ui:advanced,urn:alm:descriptor:com.tectonic.ui:resourceRequirements"
Resources *ResourceRequirements `json:"resources,omitempty"`
cdesiniotis (Contributor, Author), Dec 3, 2024:

Question -- should we include the resources, args, and env fields at this point in time?

cdesiniotis (Contributor, Author), Dec 19, 2024:

Since the IMEX DRA Driver consists of a controller and a kubeletPlugin, it feels as if I should update this so that users can configure each component independently:

controller:
  resources: {}
  env: []
  
kubeletPlugin:
  resources: {}
  env: []

Signed-off-by: Christopher Desiniotis <[email protected]>
cdesiniotis force-pushed the dra-driver-imex branch 3 times, most recently from 76894a2 to 6bbbd94, on December 16, 2024 at 22:18
@@ -54,7 +54,7 @@ type ClusterPolicySpec struct {
// DevicePlugin component spec
DevicePlugin DevicePluginSpec `json:"devicePlugin"`
// DRADriver component spec
DRADriver DRADriverSpec `json:"draDriver"`
IMEXDRADriver IMEXDRADriverSpec `json:"imexDRADriver"`
Contributor:

This approach seems better.
