[WIP] Include lspci in vgpu-manager image #386
base: main
Conversation
In disconnected environments `dnf install` cannot be used without mirroring RPM repositories. As the vGPU manager now requires the `lspci` command:

* Include the RPMs in the container image
* If the `lspci` command is not found, install it from the RPMs (see the sketch below)

Signed-off-by: Vitaliy Emporopulo <[email protected]>
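A minimal sketch of that runtime fallback, assuming the RPMs are staged under `/driver/rpms/pciutils` as in the Dockerfile hunk discussed below (the surrounding script is hypothetical, not code from this PR):

```sh
# Fall back to the bundled RPMs only when lspci is not already available,
# so nothing changes for environments that have pciutils preinstalled.
if ! command -v lspci >/dev/null 2>&1; then
    rpm -ivh /driver/rpms/pciutils/*.rpm
fi
```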
This update uses a UBI8 image to download the pciutils RPMs. Could this introduce a potential issue when trying to install them on the driver toolkit container?
```dockerfile
RUN mkdir -p /driver/rpms/pciutils
WORKDIR /driver/rpms/pciutils

RUN dnf download --resolve pciutils && dnf clean all
```
Why not just install the rpm package instead of downloading it and running it later?
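For contrast, the direct install the reviewer is suggesting would presumably be a single layer (a sketch, not code from this PR):

```dockerfile
# Install pciutils into the image directly instead of staging the RPMs.
RUN dnf install -y pciutils && dnf clean all
```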
Do we want to use pciutils in the DTK container? If so, we should consider baking pciutils into DTK.
Yes, currently lspci is needed at runtime by the sriov-manage script that we invoke in the DTK container. Vitaly asked the DTK team if they could include the pciutils package in the DTK image, but they said no.
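For context, the invocation in question looks roughly like this; the sriov-manage path and flag follow NVIDIA's vGPU documentation, but treat the exact form as an assumption here:

```sh
# Runs inside the DTK container after the lspci guard sketched earlier;
# sriov-manage shells out to lspci, which is why it must exist at runtime.
/usr/lib/nvidia/sriov-manage -e ALL   # enable SR-IOV on all supported GPUs
```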
It's also probably a good idea to create a rhel9 version of the vgpu-manager container which will source packages from ubi9 repositories. gpu-operator will no longer support RHEL8 based OCP clusters going forward.
This PR is based on work I did for a client who did not have access to an RPM mirror to install pciutils.

The preferred fix(es) would be for the NVIDIA operator to allow the toolkit image to be specified via the ClusterPolicy resource, and/or for the operator to provide the toolkit image with pciutils already installed.
It means the DTK and vgpu-manager images will have to be updated manually during every cluster upgrade. I like @tariq1890's idea to bump the base image. Other options I can think of:
The multistage build is how I got things working for the customer. Below is the diff of those changes.
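The diff itself was not preserved in this thread; a minimal sketch of such a multi-stage build, using the RPM path from this PR but otherwise hypothetical stage and image names, might look like:

```dockerfile
ARG BASE_IMAGE
# Stage 1: use UBI8 only to fetch pciutils and its dependencies as RPMs.
FROM registry.access.redhat.com/ubi8/ubi AS rpm-fetcher
WORKDIR /driver/rpms/pciutils
RUN dnf download --resolve pciutils && dnf clean all

# Stage 2: carry the staged RPMs into the vgpu-manager image so they can be
# installed later without network access.
FROM ${BASE_IMAGE}
COPY --from=rpm-fetcher /driver/rpms/pciutils /driver/rpms/pciutils
```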
Just to clarify, what I meant was adding a new directory for rhel9. After thinking about this a bit more, here is the approach I propose:

I believe the above is cleaner, as we don't need to resort to the hack of downloading an RPM, placing it in a shared directory, and installing it in a container that was never meant to have it or run it. Either we implement the above, or we convince the DTK developers to include pciutils.
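As a sketch of that rhel9 direction (image tag and package availability are assumptions, not taken from this PR):

```dockerfile
# Hypothetical rhel9 variant: pciutils comes from the ubi9 repositories at
# build time, so no RPM staging or runtime install step is needed.
FROM registry.access.redhat.com/ubi9/ubi
RUN dnf install -y pciutils && dnf clean all
```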