Can I specify the runtime used for the container runv creates behind the VM? #676

Open · telala opened this issue Apr 10, 2018 · 15 comments

telala commented Apr 10, 2018

runv is an OCI-compatible runtime, and we use it as the runtime when starting a runv container:
docker run --rm -it busybox sh

I want to know: can we specify the runtime for the container behind the VM, such as the nvidia docker runtime?
@gnawux @bergwolf

bergwolf (Member) commented

@telala runv itself is an OCI runtime that sits at the same layer as docker's runc and the nvidia docker runtime. Any of them can be selected through the docker run --runtime <oci-runtime> parameter, but they cannot be specified at the same time, because a container can only be backed by one OCI runtime.

Actually, nvidia patched docker's runc to include its own prestart hooks in the nvidia docker runtime. It might be possible to integrate these hooks into runv (plus adding GPU passthrough support), but it cannot be done through configuration alone, sorry.
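
For example, assuming both runtimes are registered with the docker daemon (e.g. under the "runtimes" key in /etc/docker/daemon.json), the choice is made once per container:

  docker run --runtime runv   --rm -it busybox sh    # container backed by a runv VM
  docker run --runtime nvidia --rm -it busybox sh    # container with nvidia's prestart hooks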

telala (Author) commented Apr 10, 2018

@bergwolf runc and the nvidia docker runtime are at the same level when using docker,
but runv and the nvidia docker runtime are NOT at the same level: runv is used on the host, while nvidia docker would be used in the container inside the guest.

It becomes clearer if we separate what runv does into two steps:

  1. create a KVM guest.
  2. create a container in the KVM guest. So the question should be: can we specify a runtime for this container?

@bergwolf Is my understanding right?

bergwolf (Member) commented

@telala It is actually the job of hyperstart to create containers in the guest. There is no docker stack in the guest, and thus no concept of a runtime there either.

On the host, there can be a docker software stack, and runv sits in the same position as runc and the nvidia docker runtime.

telala (Author) commented Apr 13, 2018

@bergwolf I changed the runv code to pass through a GPU. Now I can see the GPU device with the lspci command in the container, but there is no GPU node under /dev in the container.
I think it is because there is no GPU driver in the guest.
But how can the driver for the GPU be installed?

I think the GPU driver should be installed in the guest,
but I do not know how to do that with hyperstart.
Can you give me some advice?
Thanks very much.

bergwolf (Member) commented

@telala Do you mean the GPU kernel driver? You need to include it in the initrd image, by either building it into your guest kernel or shipping it as a loadable kernel module.
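
For the loadable-module route, a rough sketch (the /lib/modules layout here is an assumption about what modules.tar contains, and 4.12.4 is just an example version):

  # stage the prebuilt module into a /lib/modules tree and repack modules.tar
  mkdir -p staging/lib/modules/4.12.4/extra
  cp nvidia.ko staging/lib/modules/4.12.4/extra/
  depmod -b staging 4.12.4               # regenerate modules.dep for that tree
  tar -C staging -cf modules.tar lib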

telala (Author) commented Apr 13, 2018

Yes, the GPU kernel driver. Since the nvidia driver is closed source, I copied the nvidia.ko from a machine that already had the GPU driver installed, put it into modules.tar, and then generated the initrd.

Alternatively, I could build nvidia into the kernel, changing the Makefile to include it, and then regenerate the kernel.

I am a little concerned about the nvidia driver symbols.

Am I right?
@bergwolf

bergwolf (Member) commented

@telala No, I'm afraid not. You have to use the same kernel (in the guest) that the nvidia.ko was built against. Mismatched kernel versions can either prevent you from loading the module or cause unexpected kernel oopses.

If you cannot build nvidia.ko on your own (since it's closed source), the only option is to get a kernel that works with the nvidia.ko and see if that can boot the guest with runv.
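
A quick way to verify the match before booting (modinfo on the host, uname inside the guest):

  modinfo -F vermagic nvidia.ko   # e.g. "4.12.4 SMP mod_unload"; must match...
  uname -r                        # ...the kernel version the guest boots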

telala (Author) commented Apr 16, 2018

@bergwolf But does the guest used by runv have to be created with hyperstart?
Can I use my own kernel and initrd with runv?

bergwolf (Member) commented

@telala Yes, you can use your own kernel and re-create an initrd based on it. If you look at https://github.com/hyperhq/hyperstart/blob/master/build/make-initrd.sh, you can see how the initrd is created. Replace the kernel and modules.tar there, and you can re-create the initrd image.
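
If the image is a standard gzip-compressed cpio archive (typical for initrds; the image name and module path below are illustrative), it can also be unpacked and repacked by hand:

  mkdir rootfs && cd rootfs
  zcat ../hyper-initrd.img | cpio -idmv                  # unpack
  mkdir -p lib/modules/4.12.4/extra
  cp /path/to/nvidia.ko lib/modules/4.12.4/extra/
  find . | cpio -o -H newc | gzip > ../hyper-initrd.img  # repack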

telala (Author) commented Apr 16, 2018

@bergwolf I tried to use the kernel_config in arch/x86_64 to compile 4.12.4, but the system refuses to run, with no error messages; the system just seems paused.

I tried to compare the kernel_config for runv with the config on our system, but there are too many differences :(
So I want to know: are there any kernel config options that runv requires?
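
(One way to narrow the comparison, using the diffconfig helper that ships in the kernel source tree; paths are illustrative:)

  ./scripts/diffconfig /path/to/runv/kernel_config /path/to/your/.config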

telala (Author) commented May 14, 2018

@bergwolf Following your advice, I installed the GPU driver on a kernel and re-created the initrd based on that kernel (kernel and modules).
After the container started, I used the command 'insmod nvidia.ko' to load the nvidia module. From the dmesg output I believe the nvidia driver loaded successfully:
[ 119.230131] nvidia: loading out-of-tree module taints kernel.
[ 119.230756] nvidia: module license 'NVIDIA' taints kernel.
[ 119.231303] Disabling lock debugging due to kernel taint
[ 119.245305] nvidia-nvlink: Nvlink Core is being initialized, major device number 240
[ 119.276061] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 390.46 Fri Mar 16 22:24:50 PDT 2018

But there are no nvidiactl and nvidia0 nodes under /dev (these two device nodes should be created after the nvidia driver is loaded; I can see both of them on my host machine).

Can you give me some advice on this problem? Where do you think I should insmod nvidia.ko?

bergwolf (Member) commented

@telala Is the above log from the guest kernel? If so, my guess is that the GPU device is not properly passed through to the guest. Care to send your GPU passthrough patch here so that we can review and merge it upstream?

telala (Author) commented May 14, 2018

@bergwolf Yes, the above log is from the guest kernel. To support passthrough I just added
"-device", "vfio-pci,host=0000:08:00.0,id=gpu_0,bus=pci.0,addr=0xf" in amd_64.go.
Using lspci in the guest I can see the GPU device with the BDF I specified above.
Is this because of the mount namespace?
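
(For completeness, QEMU's vfio-pci argument assumes the device is already bound to the vfio-pci driver on the host; a typical sequence, with illustrative vendor/device IDs, is:)

  modprobe vfio-pci
  lspci -n -s 0000:08:00.0                                 # read the vendor:device IDs
  echo 0000:08:00.0 > /sys/bus/pci/devices/0000:08:00.0/driver/unbind
  echo 10de 1b06 > /sys/bus/pci/drivers/vfio-pci/new_id    # IDs from lspci above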

bergwolf (Member) commented

@telala No, devtmpfs is not mount-namespace aware; containers get the same view of devtmpfs as hyperstart. Can you mknod the device? The log only prints the device major; can you find the device minor somewhere? It seems that there is still something wrong with the device setup.
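
For reference, the NVIDIA character devices conventionally use major 195 (listed in the kernel's devices.txt), so a manual test would be:

  mknod -m 666 /dev/nvidiactl c 195 255   # control device, minor 255
  mknod -m 666 /dev/nvidia0   c 195 0     # first GPU, minor 0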

telala (Author) commented May 15, 2018

@bergwolf I opened a new issue to discuss GPU passthrough support in runv:
#680
