arch/espressif: serialize Wi-Fi RX queue access.#18652
arch/espressif: serialize Wi-Fi RX queue access.#18652fdcavalcanti merged 1 commit intoapache:masterfrom
Conversation
Protect the common Espressif Wi-Fi netdev RX queue with a spinlock. The Wi-Fi RX callback enqueues packets while the netdev upper-half RX thread dequeues them, and the IOB queue helpers are not internally serialized. Under sustained receive traffic that race can corrupt the queue state, strand IOBs, and degrade throughput over time as reported in issue apache#16915. Initialize the lock during device setup and use it when clearing the queue on ifup, enqueueing received packets, and dequeuing them for the upper half. Signed-off-by: aviralgarg05 <gargaviral99@gmail.com>
|
Hi, Please do not merge it without testing it. For required toolchain please refer to nuttx dockerfile and then check related version of that toolchain support for your target. Seems it does for risc-v also for xtensa |
|
Please provide testing report with sustained IPERF usage. |
|
Why can't this fix be done on network driver side? Also, how are you able to do such fix on a deep part of the source without even knowing how to build said architecture? Sorry but it is quite an odd approach. |
My initial reasoning was that the shared mutable object involved in the suspected race is the driver-owned However, your broader concern is also valid. While I was able to trace the code path and compare it with the older Espressif implementation, that is not a substitute for having the proper toolchain setup and real target testing in place. I should have completed that validation before proposing a fix at this level, and I will correct that before taking this further. |
e176566 to
2359678
Compare
|
Thanks aviralgarg05
|
Thank you, that is a fair suggestion. I agree that I also agree that this pattern may exist in other places. For this PR, I would prefer to keep the scope limited to the Espressif Wi-Fi RX queue race reported here and handle any broader audit separately so this fix remains focused and easier to review. |
|
@aviralgarg05 please update the description with a proper build output. |
There was a problem hiding this comment.
Tested internally and seems to have improved the issue.
Quick output from C3 and C6:
...
0.00- 301.00 sec 227328000 Bytes 6.04 Mbits/sec
iperf exit
nsh> cat /proc/iobinfo
ntotal nfree nwait nthrottle
160 160 0 136
...
0.00- 301.00 sec 434552832 Bytes 11.55 Mbits/sec
iperf exit
nsh>
nsh> cat /proc/iobinfo
ntotal nfree nwait nthrottle
160 160 0 136
No issues on Xtensa as well!
Very nice contribution @aviralgarg05 thank you!
|
@aviralgarg05 please remove those two commits on board since they have a net result of zero changes, then we can merge. This way we don't pollute the git history. |
2359678 to
d275a8e
Compare
Note: Please adhere to Contributing Guidelines.
Summary
This change fixes a race in the common Espressif Wi-Fi lower-half netdev path used by both the Xtensa and RISC-V drivers.
The Wi-Fi RX callback appends packets to
netdev_rx_queue, while the netdev upper-half RX thread removes packets from that same queue. The queue is backed by the IOB queue helpers, and those helpers do not provide their own internal serialization. Under sustained receive traffic, the enqueue and dequeue sides can run concurrently and corrupt the queue state.When that happens, RX buffers can become stranded in the queue, available IOBs drop over time, and throughput degrades until the interface no longer recovers cleanly. This matches the behavior reported in issue #16915.
To fix that, this patch:
ifupThe change is intentionally limited to the shared Espressif Wi-Fi netdev implementation:
arch/risc-v/src/common/espressif/esp_wlan_netdev.carch/xtensa/src/common/espressif/esp_wlan_netdev.cIssue: #16915
Impact
This is a targeted bug fix for Espressif Wi-Fi devices using the common netdev-based lower-half driver.
Expected impact:
iperfCompatibility impact:
Build and maintenance impact:
Testing
Host machine:
CONFIG_HOST_MACOS=y,CONFIG_HOST_ARM64=y)Repository checks completed locally:
Ran style checks directly on the modified files:
Result:
Re-ran patch-level style checks on the final commit:
Result:
Verified there are no whitespace or patch formatting issues:
Result: no output
GitHub Actions build verification for this PR completed successfully.
Relevant Espressif Wi-Fi targets built in CI:
Linux (xtensa-01)builtesp32-devkitc/wifiLinux (risc-v-03)builtesp32c6-devkitc/wifiRepresentative CI build output:
The full PR CI set also passed, including
check,Lint,OOT-Build,Linux (xtensa-01),Linux (xtensa-03), andLinux (risc-v-03).Local target builds were still limited by the available toolchains on this machine, so I could not run on-device Wi-Fi traffic tests here. The code path and fix were instead verified by:
netdev_rx_queueRecommended hardware verification for the PR:
esp32-devkitc:wifiandesp32c6-devkitc:wifipingiperfsessions and compare/proc/iobinfobefore and after