
Is there a way to automatically assign a VF with GPU affinity to pods? #736

Open · cyclinder opened this issue Jul 16, 2024 · 4 comments

@cyclinder
Contributor

[Figure: `nvidia-smi topo -m` topology matrix showing the GPUs (including GPU0) and mlx5 NICs (including mlx5_3) and their PCIe distances]

If the GPU and NIC are under the same PCIe bridge, or their topology distance is no farther than PHB, then communication between them can be accelerated by enabling GPUDirect RDMA.
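
For reference, those distances can be read from `nvidia-smi topo -m`; a minimal check (device names and the exact matrix vary per host), with the legend as nvidia-smi prints it:

```sh
# Print the GPU <-> NIC topology matrix
nvidia-smi topo -m
# Legend (from nvidia-smi): PIX = at most a single PCIe bridge,
# PXB = multiple PCIe bridges, PHB = a PCIe host bridge (typically the CPU),
# NODE = same NUMA node but different host bridges, SYS = crosses NUMA nodes
```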

@SchSeba
Collaborator

SchSeba commented Jul 17, 2024

That is a Kubernetes feature: you can configure the kubelet's Topology Manager and choose the policy type.

https://kubernetes.io/docs/tasks/administer-cluster/topology-manager/#policy-single-numa-node
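
A minimal sketch of turning that on via kubelet flags (per the linked docs; if you also want CPU assignments aligned, the static CPU manager policy is needed):

```sh
# Align device, CPU and memory assignments to a single NUMA node at pod admission
kubelet --topology-manager-policy=single-numa-node \
        --topology-manager-scope=pod \
        --cpu-manager-policy=static
```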

@cyclinder
Contributor Author

Thanks for your reply. Even if the GPU and NIC are in the same NUMA node, the path between them may still cross a PCIe host bridge, as shown in the figure above for GPU0 and mlx5_3, in which case we cannot get the benefit of GPUDirect RDMA. Being in the same NUMA node can still mean a large topology distance; we need a smaller one.
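
One way to see this on a host is through sysfs; a sketch with example PCI addresses (substitute the real GPU and VF addresses from `lspci`):

```sh
# Same NUMA node does not imply same PCIe host bridge.
cat /sys/bus/pci/devices/0000:3b:00.0/numa_node    # GPU (example address)
cat /sys/bus/pci/devices/0000:5e:00.0/numa_node    # NIC/VF (example address)
# The resolved sysfs path encodes the bridge hierarchy: a shared path prefix
# means a shared upstream bridge, which is the case GPUDirect RDMA benefits from.
readlink -f /sys/bus/pci/devices/0000:3b:00.0
readlink -f /sys/bus/pci/devices/0000:5e:00.0
```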

@adrianchiris
Collaborator

Currently there is no solution that I'm aware of which takes PCIe topology into account.

DRA (Dynamic Resource Allocation) aims to solve that, but there is still a way to go...

@aojea

aojea commented Nov 15, 2024

This is on the DRA roadmap, as @adrianchiris mentions; it will be beta in 1.32.
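
For illustration, a sketch of what an aligned allocation could look like with the `resource.k8s.io/v1beta1` API from 1.32; the device class names and the `pcieRoot` attribute below are hypothetical and depend on what the GPU and NIC DRA drivers actually publish:

```sh
kubectl apply -f - <<EOF
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaim
metadata:
  name: gpu-with-aligned-nic
spec:
  devices:
    requests:
    - name: gpu
      deviceClassName: gpu.example.com        # hypothetical device class
    - name: nic
      deviceClassName: rdma-nic.example.com   # hypothetical device class
    constraints:
    # both devices must report the same value for this driver-published attribute
    - requests: ["gpu", "nic"]
      matchAttribute: example.com/pcieRoot    # hypothetical attribute
EOF
```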
