-
Notifications
You must be signed in to change notification settings - Fork 34
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
new branch with linux header changes #65
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @JunAr7112 for the progress on this. I left some comments. I mainly have some questions around how to accurately determine whether we need to install these packages or not. Ideally we would have a robust way to determine this and avoid using / introducing new flags / environment variables for this feature.
if [ ! -d "/usr/src/linux-headers-$(uname -r)/" ]; then | ||
echo "Installing Linux kernel headers..." | ||
apt-get -qq install --no-install-recommends linux-headers-${KERNEL_VERSION} > /dev/null | ||
fi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An observation -- if we mount /usr/src
from the host AND /usr/src/linux-headers-<kernel-version>
does not exist, meaning the linux headers are not available on the host, then we end up installing the headers both in the container AND on the host at /usr/src/linux-headers-<kernel-version>
. I am not sure if this is desirable.
apt-get -qq download linux-image-${KERNEL_VERSION} && dpkg -x linux-image*.deb . | ||
{ apt-get -qq download linux-modules-${KERNEL_VERSION} && dpkg -x linux-modules*.deb . || true; } 2> /dev/null |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to the linux-headers
package, shouldn't we have a conditional here along the lines of "if linux-image / linux-modules packages are not install, then install them"? I am uncertain on the best way to determine whether we need to install them or not...
local tmp_dir=$(mktemp -d) | ||
|
||
trap "rm -rf ${tmp_dir}" EXIT | ||
cd ${tmp_dir} | ||
|
||
if [ "${PERSIST_DRIVER}" = false ]; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally we should not be relying on the PERSIST_DRIVER
environment variable as that feature is not introduced in this PR.
local tmp_dir=$(mktemp -d) | ||
|
||
trap "rm -rf ${tmp_dir}" EXIT | ||
cd ${tmp_dir} | ||
|
||
if [ "${PERSIST_DRIVER}" = false ]; then | ||
rm -rf /lib/modules/${KERNEL_VERSION} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure why cleaning up this directory is required on every driver container instantiation. @shivamerla @tariq1890 would you happen to know why? If we can remove this, then we no longer need this conditional.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On container startup, if we do not mount /lib/modules
or /usr/src
explicitly from the host, both of these directories will be empty inside of the container. As an example:
$ kubectl exec -n gpu-operator -it nvidia-driver-daemonset-7qwn5 -- sh
# ls -ltr /lib/modules/
ls: cannot access '/lib/modules/': No such file or directory
# ls -ltr /usr/src/
total 0
Based on this observation, what are your thoughts on my below proposal?
- Remove the following command from our script as it appears to not be needed:
rm -rf /lib/modules/${KERNEL_VERSION}
- Only install the linux-modules package if
/lib/modules/${KERNEL_VERSION}
directory does not exist.
mv lib/modules/${KERNEL_VERSION}/modules.* /lib/modules/${KERNEL_VERSION} | ||
mv lib/modules/${KERNEL_VERSION}/kernel /lib/modules/${KERNEL_VERSION} | ||
depmod ${KERNEL_VERSION} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note, these commands won't work as they require that the linux-image and linux-modules deb packages were downloaded and extracted locally first.
ls -1 boot/vmlinuz-* | sed 's/\/boot\/vmlinuz-//g' - > version | ||
if [ -z "$(<version)" ]; then | ||
echo "Could not locate Linux kernel version string" >&2 | ||
return 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shivamerla after reviewing this again, do we even really require to install the linux-image-$KERNEL_VERSION
package? It seems like we only use it to construct the kernel-version string. However, the kernel version string is already assumed to be set correctly in the KERNEL_VERSION environment variable before we ever reach this point in the script.
Based on my understanding, I think we can remove this code block entirely in all cases and never install the linux-image-$KERNEL_VERSION
package.
if [ "${PERSIST_DRIVER}" = false ]; then | ||
mv lib/modules/${KERNEL_VERSION}/modules.* /lib/modules/${KERNEL_VERSION} | ||
mv lib/modules/${KERNEL_VERSION}/kernel /lib/modules/${KERNEL_VERSION} | ||
depmod ${KERNEL_VERSION} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on my comment https://github.com/NVIDIA/gpu-driver-container/pull/65/files#r1681872260 I would recommend the following change:
if [ "${PERSIST_DRIVER}" = false ]; then | |
mv lib/modules/${KERNEL_VERSION}/modules.* /lib/modules/${KERNEL_VERSION} | |
mv lib/modules/${KERNEL_VERSION}/kernel /lib/modules/${KERNEL_VERSION} | |
depmod ${KERNEL_VERSION} | |
if [ ! -d "/lib/modules/${KERNEL_VERSION}" ]; then | |
{ apt-get -qq download linux-modules-${KERNEL_VERSION} && dpkg -x linux-modules*.deb . || true; } 2> /dev/null | |
mv lib/modules/${KERNEL_VERSION}/modules.* /lib/modules/${KERNEL_VERSION} | |
mv lib/modules/${KERNEL_VERSION}/kernel /lib/modules/${KERNEL_VERSION} | |
depmod ${KERNEL_VERSION} |
new branch with linux header changes