No traffic sent on Bluefield-2 #135

Open
mpodles opened this issue Apr 11, 2024 · 2 comments

mpodles commented Apr 11, 2024

Subject

Dear Team,

I'm trying to benchmark NVMe-oF over TCP/IP with and without XLIO. I can get iperf and spdk_perf working between the machines, but when XLIO is used, no traffic comes out of the initiator. It isn't visible in tcpdump, in the ethtool stats, on the switch between the machines, or on the target machine.

An example packet that spdk_perf tries to send is an ARP request:

#0 qp_mgr_eth_mlx5::fill_wqe (this=0xaaaaaada6010, pswr=) at dev/qp_mgr_eth_mlx5.cpp:485
#1 0x0000fffff7024db0 in qp_mgr_eth_mlx5::send_to_wire (this=0xaaaaaada6010, p_send_wqe=, attr=, request_comp=, tis=, credits=) at dev/qp_mgr_eth_mlx5.cpp:748
#2 0x0000fffff7020ac8 in qp_mgr::send (this=0xaaaaaada6010, p_send_wqe=p_send_wqe@entry=0xaaaaaada4510, attr=attr@entry=0, tis=tis@entry=0x0, credits=credits@entry=2) at dev/qp_mgr.cpp:611
#3 0x0000fffff704d9b0 in ring_simple::send_buffer (tis=0x0, attr=, p_send_wqe=0xaaaaaada4510, this=0xaaaaaada4920) at dev/ring_simple.cpp:746
#4 ring_simple::send_ring_buffer (this=0xaaaaaada4920, id=, p_send_wqe=0xaaaaaada4510, attr=) at dev/ring_simple.cpp:776
#5 0x0000fffff7077794 in neigh_eth::send_arp_request (this=0xaaaaaada4370, is_broadcast=) at proto/neighbour.cpp:1661
#6 0x0000fffff70724a4 in neigh_entry::send_discovery_request (this=0xaaaaaada4370) at proto/neighbour.cpp:393

After this, it successfully receives a completion in:

#0 cq_mgr_mlx5::poll_and_process_element_tx (this=0xaaaaaada63d0, p_cq_poll_sn=0xffffffffd680) at dev/cq_mgr_mlx5.cpp:542
#1 0x0000fffff7020a68 in qp_mgr::send (this=0xaaaaaada60d0, p_send_wqe=p_send_wqe@entry=0xaaaaaada4700, attr=attr@entry=0, tis=tis@entry=0x0, credits=credits@entry=2) at dev/qp_mgr.cpp:605
#2 0x0000fffff704d9b0 in ring_simple::send_buffer (tis=0x0, attr=, p_send_wqe=0xaaaaaada4700, this=0xaaaaaada4b10) at dev/ring_simple.cpp:746
#3 ring_simple::send_ring_buffer (this=0xaaaaaada4b10, id=, p_send_wqe=0xaaaaaada4700, attr=) at dev/ring_simple.cpp:776
#4 0x0000fffff7077794 in neigh_eth::send_arp_request (this=0xaaaaaada4560, is_broadcast=) at proto/neighbour.cpp:1661
#5 0x0000fffff70724a4 in neigh_entry::send_discovery_request (this=0xaaaaaada4560) at proto/neighbour.cpp:393

I've checked the device it's using for the ARP request, and it looks correct: p1 (the name of the physical function interface on the Bluefield).
Thanks in advance for any help.

Cheers

Issue type

  • Bug report
  • Feature request

Configuration:

  • Product version
    XLIO_VERSION: 3.21.2-0 Development Snapshot built on Mar 22 2024 12:01:27 -- DEBUG --
    Git: d476759

  • OS
    Distributor ID: Ubuntu
    Description: Ubuntu 20.04.6 LTS
    Release: 20.04
    Codename: focal

  • OFED
    MLNX_OFED_LINUX-5.8-3.0.5.0 (OFED-5.8-3.0.5)

  • Hardware
    Bluefield-2 MBF2M516A-CEEO_Ax_Bx (2x100Gbs)

Actual behavior:

No traffic comes out of the network interface, even though the work request is posted to the send queue and its completion is received on the CQ.
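One way to back up the "no traffic" observation at the NIC level is to snapshot `ethtool -S` before and after a run and print only the counters that changed. This is a minimal bash sketch; `counter_delta` and `tx_check` are hypothetical helper names, not part of XLIO. Because XLIO posts to the hardware queue directly, kernel-maintained software counters may not move even when packets do leave, and which `ethtool -S` counters are hardware-backed depends on the driver, so the sketch compares every counter rather than one named counter.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of XLIO): diff two `ethtool -S` snapshots.

# counter_delta OLD NEW: print every "name: value" counter whose value changed.
counter_delta() {
  awk -F: '
    NR == FNR { gsub(/[ \t]/, "", $1); old[$1] = $2 + 0; next }
    {
      gsub(/[ \t]/, "", $1); v = $2 + 0
      if (($1 in old) && v != old[$1])
        printf "%s: %.0f -> %.0f (delta %.0f)\n", $1, old[$1], v, v - old[$1]
    }
  ' "$1" "$2"
}

# tx_check IFACE CMD...: snapshot counters, run CMD, snapshot again, show deltas.
tx_check() {
  local ifname=$1 before after
  shift
  before=$(mktemp)
  after=$(mktemp)
  ethtool -S "$ifname" > "$before"
  "$@" || true   # still print the deltas if the benchmark itself fails
  ethtool -S "$ifname" > "$after"
  counter_delta "$before" "$after"
  rm -f "$before" "$after"
}
```

Example: `tx_check p1 sudo LD_PRELOAD=/opt/mellanox/libxlio/lib/libxlio.so iperf -t 30 -c 20.20.20.4 -m -P 1 -i 1 -M 1500`. If no port-level TX counter moves while XLIO reports send completions, the frames are not leaving through that port.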

Expected behavior:

SPDK perf and iperf are able to connect and send traffic.

Steps to reproduce:

sudo LD_PRELOAD=/opt/mellanox/libxlio/lib/libxlio.so iperf -t 30 -c 20.20.20.4 -m -P 1 -i 1 -M 1500
or
sudo SPDK_XLIO_PATH=/opt/mellanox/libxlio/lib/libxlio.so XLIO_TRACELEVEL=DEBUG ~/spdk-23.01/build/examples/perf -q 64 -o $((2**12)) -w randread -r 'trtype:nvda_tcp adrfam:IPv4 traddr:20.20.20.4 trsvcid:4420' -t 300 -c 0x01 --transport-stats -G --default-sock-impl xlio

iftahl (Collaborator) commented Jun 4, 2024

@mpodles, I assume you work within ARM OS, and not x86 host, is that correct?

If so, please post the output of the following commands:
cat /etc/mlnx-release
sudo flint -d 03:00.0 q
ip a
sudo ovs-vsctl show
sudo ibdev2netdev -v
sudo mlxconfig -d 03:00.0 -e q
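The requested outputs can be gathered into one file to attach to the issue. This is a minimal bash sketch: the commands and the PCI address 03:00.0 are the ones listed above, the report file name `bf2-diag.txt` is a placeholder, and `sudo -n` (non-interactive) is used so that a missing sudo permission makes the command fail rather than hang at a password prompt.

```shell
#!/usr/bin/env bash
# Gather the diagnostics requested above into a single report file.
# DEV defaults to the PCI address from the thread; REPORT is a placeholder name.
DEV="${DEV:-03:00.0}"
REPORT="${REPORT:-bf2-diag.txt}"

# run CMD...: print a header, then CMD's stdout and stderr. A failing or
# missing tool is noted in the report instead of aborting the script.
run() {
  printf '=== %s ===\n' "$*"
  "$@" 2>&1 || printf '(exit status %d)\n' "$?"
  echo
}

{
  run cat /etc/mlnx-release
  run sudo -n flint -d "$DEV" q
  run ip a
  run sudo -n ovs-vsctl show
  run sudo -n ibdev2netdev -v
  run sudo -n mlxconfig -d "$DEV" -e q
} > "$REPORT"

echo "wrote $REPORT"
```

Each section starts with a `=== command ===` header, so a tool that is absent on the Arm OS shows up as a non-zero exit status instead of silently disappearing from the report.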

mpodles (Author) commented Jun 9, 2024

Dear @iftahl

Thanks for the support, but we've managed to debug the issue: we were using the PF instead of an SF on the Bluefield ARM SoC. I had assumed that XLIO, being a userspace stack, could use p1 (the PF) the same way DPDK can, without OvS in the way, but it appears that a software netdev representor and OvS are required.

Thanks for getting back to me and I believe this can be closed.
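For readers hitting the same symptom, the change described above (terminating traffic on an SF netdev rather than on the PF p1) might look roughly like the bash sketch below. The SF netdev name `enp3s0f1s0` and the local address `20.20.20.3/24` are hypothetical placeholders, not values from this thread; check `ip a` and `sudo ovs-vsctl show` for the real names on your system.

```shell
#!/usr/bin/env bash
# Sketch: run the original iperf reproducer over an SF netdev instead of PF p1.
# HYPOTHETICAL values, adjust to your system: SF_DEV and LOCAL_IP.
# Dry run by default; set DRY_RUN=0 to actually execute the commands.
set -euo pipefail

PF_DEV="${PF_DEV:-p1}"
SF_DEV="${SF_DEV:-enp3s0f1s0}"          # hypothetical SF netdev name
LOCAL_IP="${LOCAL_IP:-20.20.20.3/24}"   # hypothetical initiator address
TARGET_IP="${TARGET_IP:-20.20.20.4}"
XLIO_LIB="${XLIO_LIB:-/opt/mellanox/libxlio/lib/libxlio.so}"

# run CMD...: echo the command in dry-run mode, execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi
}

# 1. Drop the test address from the PF so the route to the target no longer uses p1.
run sudo ip addr del "$LOCAL_IP" dev "$PF_DEV" || true
# 2. Move it to the SF netdev and bring the SF up.
run sudo ip addr add "$LOCAL_IP" dev "$SF_DEV"
run sudo ip link set "$SF_DEV" up
# 3. Check that the SF's representor is a port on an OvS bridge (per this
#    thread, OvS has to be in the path for traffic to reach the wire).
run sudo ovs-vsctl show
# 4. Repeat the original reproducer with XLIO preloaded.
run sudo LD_PRELOAD="$XLIO_LIB" iperf -t 30 -c "$TARGET_IP" -m -P 1 -i 1 -M 1500
```

Run it once as-is to review the printed commands, then again with `DRY_RUN=0` once the names match your setup. Defaulting to a dry run is deliberate, since the script changes interface addressing on the Arm cores.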
