
Is there a way to automatically assign a VF with GPU affinity to pods? #736

Open · cyclinder opened this issue Jul 16, 2024 · 4 comments

@cyclinder
Contributor

[Figure: `nvidia-smi topo -m` topology matrix showing the GPUs (including GPU0) and mlx5 NICs (including mlx5_3) and their PCIe distances]

If the GPU and NIC are under the same PCIe bridge, or their topology distance is no farther than PHB, then communication between them can be accelerated by enabling GPUDirect RDMA.
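
For reference, those distances can be read from `nvidia-smi topo -m`; a minimal check (device names and the exact matrix vary per host), with the legend as nvidia-smi prints it:

```sh
# Print the GPU <-> NIC topology matrix
nvidia-smi topo -m
# Legend (from nvidia-smi): PIX = at most a single PCIe bridge,
# PXB = multiple PCIe bridges, PHB = a PCIe host bridge (typically the CPU),
# NODE = same NUMA node but different host bridges, SYS = crosses NUMA nodes
```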

@SchSeba
Collaborator

SchSeba commented Jul 17, 2024

That is a Kubernetes feature: you can configure the kubelet's Topology Manager and choose the policy type.

https://kubernetes.io/docs/tasks/administer-cluster/topology-manager/#policy-single-numa-node
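
A minimal sketch of turning that on via kubelet flags (per the linked docs; if you also want CPU assignments aligned, the static CPU manager policy is needed):

```sh
# Align device, CPU and memory assignments to a single NUMA node at pod admission
kubelet --topology-manager-policy=single-numa-node \
        --topology-manager-scope=pod \
        --cpu-manager-policy=static
```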

@cyclinder
Contributor Author

Thanks for your reply. Even if the GPU and NIC are in the same NUMA node, the path between them may still cross a PCIe host bridge, as shown in the figure above for GPU0 and mlx5_3, in which case we cannot get the benefit of GPUDirect RDMA. Being in the same NUMA node can still mean a large topology distance; we need a smaller one.
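
One way to see this on a host is through sysfs; a sketch with example PCI addresses (substitute the real GPU and VF addresses from `lspci`):

```sh
# Same NUMA node does not imply same PCIe host bridge.
cat /sys/bus/pci/devices/0000:3b:00.0/numa_node    # GPU (example address)
cat /sys/bus/pci/devices/0000:5e:00.0/numa_node    # NIC/VF (example address)
# The resolved sysfs path encodes the bridge hierarchy: a shared path prefix
# means a shared upstream bridge, which is the case GPUDirect RDMA benefits from.
readlink -f /sys/bus/pci/devices/0000:3b:00.0
readlink -f /sys/bus/pci/devices/0000:5e:00.0
```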

@adrianchiris
Collaborator

Currently there is no solution that I'm aware of which takes PCIe topology into account.

DRA (Dynamic Resource Allocation) aims to solve that, but there is still a way to go...

@aojea

aojea commented Nov 15, 2024

This is on the DRA roadmap, as @adrianchiris mentions; it will be beta in 1.32.
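
For illustration, a sketch of what an aligned allocation could look like with the `resource.k8s.io/v1beta1` API from 1.32; the device class names and the `pcieRoot` attribute below are hypothetical and depend on what the GPU and NIC DRA drivers actually publish:

```sh
kubectl apply -f - <<EOF
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaim
metadata:
  name: gpu-with-aligned-nic
spec:
  devices:
    requests:
    - name: gpu
      deviceClassName: gpu.example.com        # hypothetical device class
    - name: nic
      deviceClassName: rdma-nic.example.com   # hypothetical device class
    constraints:
    # both devices must report the same value for this driver-published attribute
    - requests: ["gpu", "nic"]
      matchAttribute: example.com/pcieRoot    # hypothetical attribute
EOF
```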
