This repository has been archived by the owner on Feb 8, 2021. It is now read-only.

passthrough support in runv #680

Open
telala opened this issue May 15, 2018 · 7 comments

Comments

@telala commented May 15, 2018

I am working on a program that needs to pass a GPU through to a runv container.
These are my steps:

  1. add "-device", "vfio-pci,host=0000:08:00.0,id=gpu_0,bus=pci.0,addr=0xf" in amd_64.go
  2. start a runv container
  3. in the container run command: insmod nvidia.ko insmod nvidia-modeset.ko insmod nvidia-uvm.ko insmod nvidia-drm.ko
  4. I can get the following dmesg from the container:
    [ 222.610227] nvidia: loading out-of-tree module taints kernel.
    [ 222.610854] nvidia: module license 'NVIDIA' taints kernel.
    [ 222.611461] Disabling lock debugging due to kernel taint
    [ 222.625106] nvidia-nvlink: Nvlink Core is being initialized, major device number 240
    [ 222.656048] chenxg: load driver:nvidia
    [ 222.656435] chenxg: gpu driver loaded
    [ 222.656839] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 390.46 Fri Mar 16 22:24:50 PDT 2018 (using threaded interrupts)
    [ 233.260423] nvidia-uvm: Loaded the UVM driver in 8 mode, major device number 239
    [ 239.616160] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 390.46 Fri Mar 16 21:46:30 PDT 2018
    [ 246.169710] [drm] [nvidia-drm] [GPU ID 0x0000000f] Loading driver
    [ 246.170349] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:00:0f.0 on minor 0

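For context on step 1: before qemu can claim the device with vfio-pci, the GPU has to be detached from any host driver and bound to vfio-pci. A minimal host-side sketch (not part of the original steps; the PCI address matches the one above, and it assumes the vfio-pci module is already loaded):

    # Find the vendor:device ID of the GPU (prints something like "10de:xxxx").
    lspci -n -s 0000:08:00.0

    # Detach the GPU from whatever host driver currently owns it.
    echo 0000:08:00.0 > /sys/bus/pci/devices/0000:08:00.0/driver/unbind

    # Let vfio-pci claim devices with that vendor:device ID
    # (replace "10de xxxx" with the real IDs printed by lspci above).
    echo 10de xxxx > /sys/bus/pci/drivers/vfio-pci/new_id
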
I compared these logs with those from a host that has an NVIDIA GPU installed; they are exactly the same.

One issue I suspect is that I ran insmod nvidia.ko inside the container.
Maybe I should insmod nvidia.ko in hyperstart instead. I tried to insmod nvidia.ko from the main function in hyperstart, but there is no insmod command there.
Then I copied the insmod binary into hyperstart and got another error:
/insmod: error while loading shared libraries: liblzma.so.5: cannot open shared object file: No such file or directory
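
For reference, a sketch of one way to enumerate the shared libraries a dynamically linked insmod needs, so they can be copied into the hyperstart initrd next to the binary (a statically linked busybox insmod would avoid the problem entirely; the initrd path below is a placeholder):

    # List the shared libraries insmod is linked against (liblzma.so.5 here).
    ldd $(which insmod)

    # Copy each one next to the binary; the source path is a typical
    # location and differs per distro, and ./initrd-rootfs stands in for
    # wherever the hyperstart initrd is unpacked.
    cp /lib/x86_64-linux-gnu/liblzma.so.5 ./initrd-rootfs/lib/
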
Can you give me some advice? Thanks very much:)

@telala (Author) commented May 15, 2018

@bergwolf From the messages above I can see minor 0 for the GPU device.
Now I can run insmod nvidia.ko in hyperstart, but there are still no /dev/nvidia0 or /dev/nvidiactl nodes.

@bergwolf (Member) commented

@telala There's no difference between calling insmod from a container and calling it from hyperstart.

I'm not sure how nvidia creates /dev/nvidia0 and /dev/nvidiactl. Does the nvidia driver package install some udev rules?

Since you can see minor 0, you should be able to call mknod to create the device. But that only represents one device (either nvidia0 or nvidiactl), and I'm not sure how to create the other one. Can you run ls -l /dev | grep nvidia on your host and paste the results here?

@telala (Author) commented May 15, 2018

localhost# ls -l /dev | grep nvidia
crw-rw-rw- 1 root root 195, 0 May 15 20:36 nvidia0
crw-rw-rw- 1 root root 195, 1 May 15 20:36 nvidia1
crw-rw-rw- 1 root root 195, 255 May 15 20:36 nvidiactl
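
Given those major/minor numbers, the nodes can also be created by hand; a sketch that mirrors the listing above:

    # Character devices on major 195: minors 0 and 1 are the two GPUs,
    # minor 255 is the control device.
    mknod -m 666 /dev/nvidia0   c 195 0
    mknod -m 666 /dev/nvidia1   c 195 1
    mknod -m 666 /dev/nvidiactl c 195 255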

I just tried to install the NVIDIA driver on my host again and found that the /dev/nvidia0 and /dev/nvidiactl nodes were not created when the driver was installed.
When I ran the nvidia-smi command to test the driver, the /dev/nvidia0 and /dev/nvidiactl nodes were created.
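
For reference, the driver package also ships a setuid helper, nvidia-modprobe, which the user-space tools invoke to create these nodes on demand; running it by hand should have the same effect (a hedged example, using minor 0 as in the listing above):

    # Ask the helper to create the device file for GPU minor number 0.
    nvidia-modprobe -c 0
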
A lot of libraries were installed along with the NVIDIA driver on the host. Maybe I should copy all the NVIDIA files from the host into hyperstart. What do you think? @bergwolf

@bergwolf (Member) commented

@telala I think you can first try copying these files into your container and see if it works from there. They likely do not need to live inside hyperstart.

@gnawux (Member) commented May 15, 2018

@bergwolf does hyperstart need to share some device files under /dev/ with the container?

@bergwolf (Member) commented

@gnawux hyperstart shares the same devtmpfs superblock with containers. Any device that hyperstart sees under /dev is visible to the containers as well.
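
A quick way to observe this, assuming shell access to both contexts:

    # From the hyperstart context:
    mknod /dev/nvidia0 c 195 0

    # From inside the container the node is already there, because devtmpfs
    # is a single shared instance rather than a per-mount copy.
    ls -l /dev/nvidia0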

@telala (Author) commented May 16, 2018

@bergwolf @gnawux I added all the user-level NVIDIA files to a container image. Now I can run nvidia-smi in the container, and /dev/nvidia0 and /dev/nvidiactl are created as well. A sketch of the files involved is below.
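
A sketch of what "user-level NVIDIA files" typically means for driver 390.46; the exact paths are distro-dependent, and rootfs/ stands in for the container image root:

    # The management tool plus the libraries it loads at runtime: NVML and
    # the CUDA user-space driver. Version-suffixed names match driver 390.46.
    cp /usr/bin/nvidia-smi               rootfs/usr/bin/
    cp /usr/lib64/libnvidia-ml.so.390.46 rootfs/usr/lib64/
    cp /usr/lib64/libcuda.so.390.46      rootfs/usr/lib64/

    # Recreate the SONAME symlinks and linker cache inside the image root.
    ldconfig -r rootfs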
