[RFE] Network mode equivalent to libvirt's hostdev. #25511
It is not clear to me how these interfaces should be managed by us; the docs you link show how a user would manage them. But how would podman know which interface to use? And how would this look in actual netlink calls? Does the VF create a new interface in the host network namespace that we then just have to move? Or how is the interface actually being created?
Design-wise this will not make sense for us.
And for that you can already write your own plugin: https://github.com/containers/netavark/blob/main/plugin-API.md If this is a common use case we can consider adding it to the main netavark, but right now I don't think this is a common use case, so I'd rather not, especially because this needs special hardware to test, which makes it likely impossible to test in CI and hard to get even for us maintainers.
Yes, the virtual function creates a new interface on the host network which can be treated the same as a physical interface. The difference between a virtual function and a physical function is that the virtual functions can be dynamically created and destroyed via the device driver. e.g. (Irrelevant network devices removed for brevity.)
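For illustration, a minimal sketch (Python, stdlib only) of how VFs are typically created and destroyed through the kernel's sysfs interface; the PF name enp12s0f3 is only a placeholder and the NIC driver must support SR-IOV:

```python
# Minimal sketch: creating SR-IOV virtual functions through sysfs.
# Assumes a physical function named "enp12s0f3" (placeholder) whose driver
# supports SR-IOV; must run as root.
from pathlib import Path

pf = "enp12s0f3"
dev = Path(f"/sys/class/net/{pf}/device")

total = int((dev / "sriov_totalvfs").read_text())  # maximum VFs the hardware offers
print(f"{pf} supports up to {total} VFs")

# Writing a count creates that many VFs; the driver then registers new host
# interfaces (e.g. enp12s0f3v0, enp12s0f3v1, ...) that behave like normal NICs.
(dev / "sriov_numvfs").write_text("0")  # the count must be reset to 0 before changing it
(dev / "sriov_numvfs").write_text("4")
```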
Thank you for linking this, that plugin looks useful for testing. Automatic allocation of virtual functions is a nice-to-have, but host device networking is the key component. I suppose netavark is being used to provide the DHCP and other services required to configure the network link?
I believe this functionality is used frequently in enterprise VM deployments, although DPUs are replacing some use cases. It's also useful for certain network types which don't have, or historically haven't had, paravirtual drivers, e.g. InfiniBand. Most enterprise NICs I have seen support this functionality. (e.g. Looking at a random Dell server, all the NICs except the Broadcom 5720 support SR-IOV: https://www.dell.com/en-us/shop/servers-storage-and-networking/poweredge-r660xs-rack-server/spd/poweredge-r660xs/pe_r660xs_tm_vi_vp_sb) It is also being used for GPU/TPU virtualization now, which could be relevant to graphical or AI-accelerated containers. Unfortunately, getting access to this functionality on a reasonable budget is a lot harder in the current GPU market. (The Intel i350-T2 supports SR-IOV and is currently $70 on Amazon.) I can still see availability for maintainers being an issue, since I believe it would require something like a Threadripper Pro or other workstation CPU/motherboard for SR-IOV networking to be standard. ASRock tends to unofficially enable SR-IOV on their AMD Ryzen boards (it's how I am able to use it at home), but this is not a reliable solution.
EDIT: I didn't mean to close this, but I do not object if it is closed as out of scope.
I would appreciate it if someone with the suitable permissions would reopen this; I closed it by mistake. I understand if it is not a priority, but I think this feature would be valuable for containerizing any software which requires high-throughput or low-latency networking but which shouldn't have full host network access. e.g. It would potentially allow migrating certain firewall or virtual switch VMs to containers. (I have a particular interest in running openvswitch inside a container without sacrificing performance or granting access to the host network.)
We can keep this open in case other users would like to voice interest in this. To be clear, I am not strictly against having this in the main netavark, but I would like to see what such code would look like first before making a final decision (a working plugin could show that). In general, if this is just about moving an interface into the namespace as I linked, then I think that is something that can be supported easily, but I am not sure how we would manage PF/VF functionality. If there is a simple design to do that, sure, I am happy to consider it. Also note that netavark's focus is obviously networking only. For GPU and other hardware modules I would think CDI specs https://github.com/cncf-tags/container-device-interface are used.
I think the best way to handle PF/VF functionality would be to allow assigning multiple functions when creating a network, then treating the resulting pool of network devices similarly to a macvlan network, but without the need for a gateway or the creation of virtual devices (since we're just reusing the virtual devices created by the driver). Containers could be assigned to a specific virtual function using a syntax similar to static IP assignment, e.g. --network net1:ip=10.89.1.5 --network net2:ip=10.89.10.10 becomes --network net1:vf=2 --network net2:vf=3. It probably should not be allowed to change the MAC address, since SR-IOV devices are usually configured to prevent guests from spoofing the MAC. (Spoofing the MAC would allow the container to impersonate the host or other containers, because all virtual functions associated with a physical function share a single port on the switch.)
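As a rough, hypothetical sketch of how such a vf= option could be resolved on the host side (the PF name and index are placeholders; the spoof-check step just shells out to iproute2):

```python
# Rough sketch: resolving a requested VF index (e.g. --network net1:vf=2) to
# the host netdev backing it, and enforcing MAC spoof protection on that VF.
# PF name and index are placeholders; must run as root.
import subprocess
from pathlib import Path

pf = "enp12s0f3"  # physical function backing the hypothetical network
vf_index = 2      # the "vf=2" requested by the user

# Each VF appears as a virtfn<N> symlink under the PF's PCI device; its net/
# subdirectory names the VF netdev while it is still in the host namespace.
netdir = Path(f"/sys/class/net/{pf}/device/virtfn{vf_index}/net")
names = [p.name for p in netdir.iterdir()] if netdir.is_dir() else []
if not names:
    raise RuntimeError(f"VF {vf_index} of {pf} has no host netdev (already assigned?)")
vf_iface = names[0]
print(f"vf={vf_index} -> {vf_iface}")

# Keep the container from impersonating other functions on the shared port.
subprocess.run(["ip", "link", "set", pf, "vf", str(vf_index), "spoofchk", "on"], check=True)
```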
Feature request description
Hardware features such as SR-IOV allow passing separate PCIe devices through to VMs for hardware-accelerated network virtualization. The same could potentially be done inside a privileged container using --network none and --device. It would be preferable for podman to support this directly in --network, both to simplify configuration and to avoid the need for elevated privileges within the container.
See #8919 and https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/9/html/configuring_and_managing_virtualization/managing-virtual-devices_configuring-and-managing-virtualization#attaching-sr-iov-networking-devices-to-virtual-machines_managing-sr-iov-devices. Macvlan is not a replacement for SR-IOV pass-through because it adds overhead and is not equivalent to a simple hostdev passthrough.
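For concreteness, a hedged sketch of what the manual workaround looks like today when driven from the host instead of from inside a privileged container; the interface and container names are placeholders:

```python
# Sketch of the manual workaround: hand an existing VF netdev to a container
# started with --network none. Interface and container names are placeholders;
# must run as root on the host.
import subprocess

vf_iface = "enp12s0f3v4"   # VF netdev created earlier by the driver
container = "mycontainer"  # hypothetical container name

# Look up the container's init PID so we can address its network namespace.
pid = subprocess.run(
    ["podman", "inspect", "--format", "{{.State.Pid}}", container],
    check=True, capture_output=True, text=True,
).stdout.strip()

# Move the VF into the container's netns, then bring it up and address it there.
subprocess.run(["ip", "link", "set", "dev", vf_iface, "netns", pid], check=True)
subprocess.run(["nsenter", "-t", pid, "-n", "ip", "link", "set", vf_iface, "up"], check=True)
subprocess.run(["nsenter", "-t", pid, "-n",
                "ip", "addr", "add", "192.0.2.10/24", "dev", vf_iface], check=True)
```

A native hostdev network mode would fold these steps into podman's normal network setup, so neither elevated privileges inside the container nor manual host-side commands would be needed.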
Suggest potential solution
Support --network hostdev:name=<adapter name, e.g. enpXXsY>,pf=<index>,vf=<index>. Potentially start by simply allowing adapter pass-through without any "intelligence" regarding physical or virtual functions, e.g. --network hostdev:name=enp12s0f3v4 to manually pass through a specific virtual function without podman needing to treat it any differently from a physical network adapter.
As a "nice to have", allow automatic allocation of physical and virtual functions if the pf and vf flags are unset. This would effectively allow dynamic allocation of virtual functions to pods. Even some decade-old NICs like the Intel X550-T2 support 126 total virtual functions (128 if none are reserved for the host), which is sufficient for many deployments to offload pod networking entirely. This also allows offloading the pod DHCP to the local router, which is desirable in some deployments.
Have you considered any alternatives?
Additional context