-
Notifications
You must be signed in to change notification settings - Fork 300
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add option in toolkit container to enable CDI in runtime #838
base: main
Are you sure you want to change the base?
Conversation
182d161
to
0aaafb4
Compare
tools/container/runtime/runtime.go
Outdated
@@ -89,6 +90,13 @@ func Flags(opts *Options) []cli.Flag { | |||
EnvVars: []string{"NVIDIA_RUNTIME_SET_AS_DEFAULT", "CONTAINERD_SET_AS_DEFAULT", "DOCKER_SET_AS_DEFAULT"}, | |||
Hidden: true, | |||
}, | |||
&cli.BoolFlag{ | |||
Name: "runtime-enable-cdi", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let the flag be passed as --enable-cdi
. The envar and the Option variable can have the Runtime/RUNTIME
prefix. This is consistent with the other flags like socket
, restart-mode
etc
Name: "runtime-enable-cdi", | |
Name: "enable-cdi", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's try to be consistent with the option for the nvidia-ctk runtime configure
command.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is that we already have a cdi-enabled
/ enable-cdi
flag defined in the toolkit options which triggers the generation of a CDI spec:
nvidia-container-toolkit/tools/container/toolkit/toolkit.go
Lines 173 to 179 in 8d869ac
&cli.BoolFlag{ | |
Name: "cdi-enabled", | |
Aliases: []string{"enable-cdi"}, | |
Usage: "enable the generation of a CDI specification", | |
Destination: &opts.cdiEnabled, | |
EnvVars: []string{"CDI_ENABLED", "ENABLE_CDI"}, | |
}, |
Any ideas on what the name of the new flag should be?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the clarification.
One question I would have is whether we need a separate flag? Would it not make sense for the same flag / envvar (e.g. CDI_ENABLED
) to be use to also trigger enabling CDI in the runtime? Do older versions of containerd complain about invalid config options?
If we feel we want finer control, then I'm ok to use RUNTIME_ENABLE_CDI
or ENABLE_CDI_IN_RUNTIME
to be used. (maybe the latter).
tools/container/runtime/runtime.go
Outdated
@@ -89,6 +90,13 @@ func Flags(opts *Options) []cli.Flag { | |||
EnvVars: []string{"NVIDIA_RUNTIME_SET_AS_DEFAULT", "CONTAINERD_SET_AS_DEFAULT", "DOCKER_SET_AS_DEFAULT"}, | |||
Hidden: true, | |||
}, | |||
&cli.BoolFlag{ | |||
Name: "runtime-enable-cdi", | |||
Usage: "Enable CDI in the configured runtime", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Usage: "Enable CDI in the configured runtime", | |
Usage: "Enable Container Device Interface (CDI) in the configured runtime", |
@@ -163,3 +163,7 @@ func (c *ConfigV1) GetRuntimeConfig(name string) (engine.RuntimeConfig, error) { | |||
tree: runtimeData, | |||
}, nil | |||
} | |||
|
|||
func (c *ConfigV1) EnableCDI() { | |||
c.Set("enable_cdi", true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should be able to cast c
to a Config
type and call EnableCDI
here instead of reimplementing it.
pkg/config/engine/api.go
Outdated
@@ -24,6 +24,7 @@ type Interface interface { | |||
RemoveRuntime(string) error | |||
Save(string) (int64, error) | |||
GetRuntimeConfig(string) (RuntimeConfig, error) | |||
EnableCDI() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think Set
is only used to enable CDI. Let's remove that from the interface if that's the case.
tools/container/container.go
Outdated
SetAsDefault bool | ||
RestartMode string | ||
HostRootMount string | ||
RuntimeEnableCDI bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't agree with the Runtime*
prefix here. Runtime
in this context refers to the NVIDIA runtime. I would just use EnableCDI
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor comments.
0aaafb4
to
d4b8bc5
Compare
if o.RuntimeEnableCDI { | ||
cfg.Set("features", map[string]bool{"cdi": true}) | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that we're already enabling CDI in the Configure
call, so there is no need to do it here too.
if o.RuntimeEnableCDI { | |
cfg.Set("features", map[string]bool{"cdi": true}) | |
} |
tools/container/runtime/runtime.go
Outdated
if opts.RuntimeEnableCDI { | ||
opts.RuntimeEnableCDI = false | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already handle the case that EnableCDI
is a no-op in the case of CRI-O.
if opts.RuntimeEnableCDI { | |
opts.RuntimeEnableCDI = false | |
} |
pkg/config/engine/docker/docker.go
Outdated
@@ -103,6 +103,11 @@ func (c Config) DefaultRuntime() string { | |||
return r | |||
} | |||
|
|||
// EnableCDI sets features.cdi to true in the docker config. | |||
func (c *Config) EnableCDI() { | |||
c.Set("features", map[string]bool{"cdi": true}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of scope of this PR: I've just thought about the fact that we would overrwrite existing features here.
d4b8bc5
to
edeeaa4
Compare
edeeaa4
to
67d6718
Compare
This change adds an EnableCDI method to the container engine config files and Updates the 'nvidia-ctk runtime configure' command to use this new method. Signed-off-by: Christopher Desiniotis <[email protected]>
Signed-off-by: Evan Lezar <[email protected]>
Signed-off-by: Christopher Desiniotis <[email protected]>
Signed-off-by: Evan Lezar <[email protected]>
67d6718
to
2b417c1
Compare
@klueska as discussed today, I have rebased this PR. |
These changes add an option to enable CDI in container engines (containerd, crio, docker) from the toolkit contaienr.
This can be triggered through the
--enable-cdi-in-runtime
command line option orRUNTIME_ENABLE_CDI
environment variable.