Consider separating `nvidia-ctk hook` from other functions? #435
Comments
Any feedback on this?
This does sound feasible. We would have to provide a migration path for where `nvidia-ctk` is used directly. @klueska what are your thoughts?
The binary is not all that big, and the functionality handled in the hooks part is quite small. I was thinking we could duplicate it: leave it in both `nvidia-ctk` and the new binary for a while.
Hi, coming back to this. I still have interest, and am willing to get it done. Let me know?
Just to make sure I understand -- the proposal is to:

With the primary reason for this being that (at least today) none of the hooks rely on having an NVIDIA driver installed in order to run, and the secondary reason being readability (i.e. pointing to a specific ...). This seems reasonable to me -- with the second argument actually being the stronger one from my perspective, mostly because I don't want to forbid us from creating a hook at some point in the future that does rely on calling into e.g. NVML if necessary.
1. Yes, although I have no opinion on the name; whatever works is fine with me.
2. Yes.
3. Yes. I would extract it all into a Go pkg, and then both can include it and avoid much duplication.
That is a good point. I think it might be easier and cleaner, if we ever get there, to have ...

I will see if I can find time in the next week or two to put together a draft PR.
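The package extraction discussed in the thread could look roughly like the following sketch. All names here are hypothetical and do not reflect the actual nvidia-container-toolkit source layout; the point is only that a single dispatch function, placed in a shared internal package, could be wrapped by both `nvidia-ctk hook` and a new `nvidia-cdi-hook` binary during a duplication/migration period.

```go
// Hypothetical sketch of a shared hook package; names do not
// reflect the real nvidia-container-toolkit source tree.
package main

import (
	"fmt"
	"os"
)

// dispatchHook routes a hook subcommand name to its implementation.
// In a real extraction this would live in a shared internal package
// imported by both binaries, rather than in main.
func dispatchHook(name string, args []string) (string, error) {
	switch name {
	case "chmod", "create-symlinks", "update-ldcache":
		// Real implementations would act on the container rootfs here;
		// this stub just reports what would be run.
		return fmt.Sprintf("would run %q with args %v", name, args), nil
	default:
		return "", fmt.Errorf("unknown hook %q", name)
	}
}

func main() {
	// Both `nvidia-ctk hook <name>` and a split `nvidia-cdi-hook <name>`
	// could reduce to a call like this:
	out, err := dispatchHook("update-ldcache", os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(out)
}
```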
As I work on that open PR - which is ready in theory, although it has issues that are in ...

Would it make sense for these things to be "basic" CDI services? The CDI spec is here; it includes container edit things like ...
@deitch extending the CDI specification to handle these operations as first-class concepts may make sense. The reason these are currently run as hooks is that they depend on the container root, which is not available at the point of CDI spec generation. Could you create an issue against the CDI repository, and I can raise this feature request at a future working group meeting.
Hi Evan, certainly. Just did it, right after you posted your response; see this CDI issue. That doesn't eliminate the need for the PR, which I would like to get through: until the spec extension gets approved (if it gets approved), gets in, and high-level container runtimes like containerd support it, it could be a long time. Do you mind commenting on #474, if you have any understanding of why ...
NVIDIA Container Toolkit 0.16.0 changed the hook arguments in the Container Device Interface specification generated by it [1]. Having the unknown hook arguments show up in the debug logs makes it easier to understand what happened. [1] NVIDIA Container Toolkit commit 179d8655f9b5fce6, NVIDIA/nvidia-container-toolkit@179d8655f9b5fce6, NVIDIA/nvidia-container-toolkit#435, containers#1543
NVIDIA Container Toolkit 0.16.0 changed the hook arguments in the Container Device Interface specification generated by it [1]. Fallout from 649d02f. [1] NVIDIA Container Toolkit commit 179d8655f9b5fce6, NVIDIA/nvidia-container-toolkit@179d8655f9b5fce6, NVIDIA/nvidia-container-toolkit#435, containers#1544
Would the team consider separating `nvidia-ctk hook` from other functions? The rationale is as follows.

The `hook` behaviours - currently `chmod`, `create-symlinks`, and `update-ldcache` - run standard behaviours. They do not depend on anything GPU-specific, on `libnvidia-ml.so`, or on CUDA in any way. The action of generating a CDI spec for a particular device (preparation) and the act of performing `hook` functions (runtime) are distinct, and can be executed at different times. I might even use a corporate or other non-JP OS which is not Ubuntu-based, generate my CDI inside a container, save the CDI YAML, and then have a different process (which does not have access to the correct versions of glibc or `libnvidia-ml.so` or other dependencies) run the hooks.

This would allow me to separate those two functionalities. It also would make the `hook` tool much simpler to build.

One final advantage is understandability. When I run `nvidia-ctk cdi generate`, I have the option `--nvidia-ctk-path <path>`, which really means "what path to `nvidia-ctk` should I place in the CDI YAML?" This can be confusing: aren't I executing `nvidia-ctk` right now? What does it mean to change my path to... myself? If we separate it into, e.g., `nvidia-cdi-hook`, then it becomes easier to understand: `nvidia-ctk cdi generate --nvidia-cdi-hook-path /path/to/nvidia-cdi-hook`.

I would be fine opening a PR for it, but I'm not going to waste the time if the maintainers don't want it.