-
Notifications
You must be signed in to change notification settings - Fork 288
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Generate CDI specification including additional GIDs #630
base: main
Are you sure you want to change the base?
Conversation
Note: This updates the default spec version to v0.7.0 since the DRM devices generally require additional GIDs. As such, this should probably be an opt-in (or at least an opt-out) feature. |
74441f4
to
285ea44
Compare
@@ -169,6 +174,11 @@ func (m command) build() *cli.Command { | |||
Usage: "Specify a pattern the CSV mount specifications.", | |||
Destination: &opts.csv.ignorePatterns, | |||
}, | |||
&cli.BoolFlag{ | |||
Name: "--allow-additional-gids", | |||
Usage: "Allow the use of the additionalGIDs field for generated CDI specifications. Note this will generate a v0.7.0 CDI specification.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which versions of containerd / cri-o (and other runtimes for that matter) support this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would have to check. Note that this is not enabled by default when generating the CDI spec and only for internal representations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- containerd
- (>= 1.7.16) - containerd/containerd@7a2f49f
- (>= 1.8.0, >= 2.0.0) - containerd/containerd@1b62224
- docker (>= 26.1.0) - moby/moby@745e235
- podman (>= 5.1.0) - containers/podman@a40cf31
- crio (>= 1.30.0) - cri-o/cri-o@fd9aa76
internal/config/features.go
Outdated
MOFED *feature `toml:"mofed,omitempty"` | ||
NVSWITCH *feature `toml:"nvswitch,omitempty"` | ||
GDRCopy *feature `toml:"gdrcopy,omitempty"` | ||
NoAdditionalGIDs *feature `toml:"no-additional-gids,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this reversed (i.e. no
rather than allow
) from the flag passed in?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because this has different default behaviour. For the NVIDIA Contianer Runtime I want to generated an internal representation including the additionalGIDs
container edits for device nodes. This allows users to opt out if there is an issue.
I suppose we could make the argument that if we backport this (which we may want to due to the Jetpack feature request) we should keep the existing behaviour and not make these modifications either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, after having thought this through again, I have disabled this by default and switched the feature flag to --allow-additional-gids
.
internal/edits/device.go
Outdated
deviceNode, err := d.toSpec() | ||
if err != nil { | ||
return nil, err | ||
} | ||
|
||
var additionalGIDs []uint32 | ||
if allowAdditionalGIDs { | ||
if requiredGID, _ := d.getRequiredGID(); requiredGID != 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we log the error instead of swallowing it ? Or we could just have the function not return an error struct
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to not return an error.
This change refactors the creation of contianer edits to make provision for injecting options that affect the generated CDI specifications. Signed-off-by: Evan Lezar <[email protected]>
285ea44
to
141772c
Compare
7ecd467
to
fab0c6d
Compare
This change adds support for injecting additional GIDs using the internal CDI representations. (Applicable for Tegra-based systems and /dev/dri devices) This is disabled by default, but can be opted in to by setting the features.allow-additional-gids feature flag. This can also be done by running sudo nvidia-ctk config --in-place --set features.allow-additional-gids Signed-off-by: Evan Lezar <[email protected]>
Signed-off-by: Evan Lezar <[email protected]>
fab0c6d
to
5d55813
Compare
This change adds logic to include the
AdditionalGIDs
required for a device node in a generated CDI specification.When a device node requires specific group membership for access, a non-root user running in a container (e.g. using the
-u
Docker command line argument) may not have access to the device.in the v0.7.0 CDI specification, support was added for specifying additional GIDs in the CDI specification and these changes extract the required information from the device nodes on the host if available.
This is disabled by default.
It can be enabled for internal CDI representations such as those used for Tegra-based systems or
/dev/dri
devices by setting:in the config.toml file.
This can also be opted in to on a per-container basis by setting
NVIDIA_ALLOW_ADDITIONAL_GIDS=enabled
.The
nvidia-ctk
command can be used to toggle this feature by running:It can be enabled when running
nvidia-ctk cdi generate
by specifying the--allow-additional-gids
command line argument: