Skip to content

arch/espressif: serialize Wi-Fi RX queue access.#18652

Merged
fdcavalcanti merged 1 commit intoapache:masterfrom
aviralgarg05:fix/issue-16915-esp-wlan-rx-queue-lock
Apr 8, 2026
Merged

arch/espressif: serialize Wi-Fi RX queue access.#18652
fdcavalcanti merged 1 commit intoapache:masterfrom
aviralgarg05:fix/issue-16915-esp-wlan-rx-queue-lock

Conversation

@aviralgarg05
Copy link
Copy Markdown
Contributor

@aviralgarg05 aviralgarg05 commented Mar 30, 2026

Note: Please adhere to Contributing Guidelines.

Summary

This change fixes a race in the common Espressif Wi-Fi lower-half netdev path used by both the Xtensa and RISC-V drivers.

The Wi-Fi RX callback appends packets to netdev_rx_queue, while the netdev upper-half RX thread removes packets from that same queue. The queue is backed by the IOB queue helpers, and those helpers do not provide their own internal serialization. Under sustained receive traffic, the enqueue and dequeue sides can run concurrently and corrupt the queue state.

When that happens, RX buffers can become stranded in the queue, available IOBs drop over time, and throughput degrades until the interface no longer recovers cleanly. This matches the behavior reported in issue #16915.

To fix that, this patch:

  • adds a dedicated RX spinlock to the shared Espressif Wi-Fi netdev private state
  • initializes the lock during device setup
  • takes the lock before clearing the RX queue during ifup
  • takes the lock before enqueueing packets in the Wi-Fi RX callback
  • takes the lock before dequeueing packets for the upper half

The change is intentionally limited to the shared Espressif Wi-Fi netdev implementation:

  • arch/risc-v/src/common/espressif/esp_wlan_netdev.c
  • arch/xtensa/src/common/espressif/esp_wlan_netdev.c

Issue: #16915

Impact

This is a targeted bug fix for Espressif Wi-Fi devices using the common netdev-based lower-half driver.

Expected impact:

  • prevents RX queue corruption caused by concurrent enqueue/dequeue access
  • avoids leaking or stranding IOB-backed RX packets under heavy traffic
  • improves long-run Wi-Fi stability during sustained throughput tests such as iperf
  • applies consistently to both Xtensa and RISC-V Espressif Wi-Fi targets that use the shared implementation

Compatibility impact:

  • no API changes
  • no Kconfig changes
  • no board configuration changes
  • no documentation changes required

Build and maintenance impact:

  • only two driver source files are touched
  • the fix follows the same locking model already used by older Espressif Wi-Fi code paths when protecting RX/TX queues

Testing

Host machine:

  • macOS on Apple Silicon (CONFIG_HOST_MACOS=y, CONFIG_HOST_ARM64=y)

Repository checks completed locally:

  1. Ran style checks directly on the modified files:

    ./tools/checkpatch.sh -f arch/risc-v/src/common/espressif/esp_wlan_netdev.c \
      arch/xtensa/src/common/espressif/esp_wlan_netdev.c

    Result:

    All checks pass.
    
  2. Re-ran patch-level style checks on the final commit:

    ./tools/checkpatch.sh -g HEAD~1..HEAD

    Result:

    All checks pass.
    
  3. Verified there are no whitespace or patch formatting issues:

    git diff --check

    Result: no output

GitHub Actions build verification for this PR completed successfully.

Relevant Espressif Wi-Fi targets built in CI:

  • Linux (xtensa-01) built esp32-devkitc/wifi
  • Linux (risc-v-03) built esp32c6-devkitc/wifi

Representative CI build output:

Linux (xtensa-01)  Configuration/Tool: esp32-devkitc/wifi
Linux (xtensa-01)    Cleaning...
Linux (xtensa-01)    Configuring...
Linux (xtensa-01)    Building NuttX...
Linux (xtensa-01)    [1/1] Normalize esp32-devkitc/wifi
Linux (risc-v-03)  Configuration/Tool: esp32c6-devkitc/wifi
Linux (risc-v-03)    Cleaning...
Linux (risc-v-03)    Configuring...
Linux (risc-v-03)    Building NuttX...
Linux (risc-v-03)    [1/1] Normalize esp32c6-devkitc/wifi

The full PR CI set also passed, including check, Lint, OOT-Build, Linux (xtensa-01), Linux (xtensa-03), and Linux (risc-v-03).

Local target builds were still limited by the available toolchains on this machine, so I could not run on-device Wi-Fi traffic tests here. The code path and fix were instead verified by:

  • tracing the RX ownership path in the common Espressif netdev lower half
  • comparing it against the older Espressif Wi-Fi implementation, which already serialized queue access
  • confirming that the only shared mutable structure between the RX callback and upper-half RX thread was the unprotected netdev_rx_queue
  • validating the final patch with repeated style and patch checks

Recommended hardware verification for the PR:

  1. Build and flash esp32-devkitc:wifi and esp32c6-devkitc:wifi
  2. Connect to Wi-Fi and confirm normal connectivity with ping
  3. Run repeated iperf sessions and compare /proc/iobinfo before and after
  4. Confirm IOB counts recover normally and throughput does not collapse after sustained traffic

Protect the common Espressif Wi-Fi netdev RX queue with a spinlock. The Wi-Fi RX callback enqueues packets while the netdev upper-half RX thread dequeues them, and the IOB queue helpers are not internally serialized.

Under sustained receive traffic that race can corrupt the queue state, strand IOBs, and degrade throughput over time as reported in issue apache#16915.

Initialize the lock during device setup and use it when clearing the queue on ifup, enqueueing received packets, and dequeuing them for the upper half.

Signed-off-by: aviralgarg05 <gargaviral99@gmail.com>
@github-actions github-actions bot added Arch: risc-v Issues related to the RISC-V (32-bit or 64-bit) architecture Arch: xtensa Issues related to the Xtensa architecture Size: S The size of the change in this PR is small labels Mar 30, 2026
@eren-terzioglu
Copy link
Copy Markdown
Contributor

Hi,

Please do not merge it without testing it.

For required toolchain please refer to nuttx dockerfile and then check related version of that toolchain support for your target. Seems it does for risc-v also for xtensa

@fdcavalcanti
Copy link
Copy Markdown
Contributor

Please provide testing report with sustained IPERF usage.

@fdcavalcanti
Copy link
Copy Markdown
Contributor

Why can't this fix be done on network driver side?

Also, how are you able to do such fix on a deep part of the source without even knowing how to build said architecture? Sorry but it is quite an odd approach.

@aviralgarg05
Copy link
Copy Markdown
Contributor Author

Why can't this fix be done on network driver side?

Also, how are you able to do such fix on a deep part of the source without even knowing how to build said architecture? Sorry but it is quite an odd approach.

My initial reasoning was that the shared mutable object involved in the suspected race is the driver-owned netdev_rx_queue. The Wi-Fi RX callback enqueues packets into that queue, while the netdev upper-half RX thread dequeues them, and the IOB queue helpers themselves do not provide internal serialization. Based on that, I considered the lower-half driver to be the narrowest place to address the problem.

However, your broader concern is also valid. While I was able to trace the code path and compare it with the older Espressif implementation, that is not a substitute for having the proper toolchain setup and real target testing in place. I should have completed that validation before proposing a fix at this level, and I will correct that before taking this further.

acassis
acassis previously approved these changes Apr 1, 2026
@aviralgarg05 aviralgarg05 force-pushed the fix/issue-16915-esp-wlan-rx-queue-lock branch from e176566 to 2359678 Compare April 1, 2026 15:02
@ankohuu
Copy link
Copy Markdown
Contributor

ankohuu commented Apr 5, 2026

Thanks aviralgarg05

  • Maybe naming it rx_queue_lock instead of rx_lock? rx_lock sounds like protecting rx path rather than just the queue?
  • Not for this patch, but seems there are several similar issues? I just did grep
    pkt = netpkt_remove_queue(&self->rx_queue);

@aviralgarg05
Copy link
Copy Markdown
Contributor Author

Thanks aviralgarg05

  • Maybe naming it rx_queue_lock instead of rx_lock? rx_lock sounds like protecting rx path rather than just the queue?
  • Not for this patch, but seems there are several similar issues? I just did grep
    pkt = netpkt_remove_queue(&self->rx_queue);

Thank you, that is a fair suggestion.

I agree that rx_queue_lock is a better name here, since the lock is only intended to protect the RX queue itself. I will rename it accordingly.

I also agree that this pattern may exist in other places. For this PR, I would prefer to keep the scope limited to the Espressif Wi-Fi RX queue race reported here and handle any broader audit separately so this fix remains focused and easier to review.

@fdcavalcanti
Copy link
Copy Markdown
Contributor

@aviralgarg05 please update the description with a proper build output.
I can test this on internal CI soon.

Copy link
Copy Markdown
Contributor

@fdcavalcanti fdcavalcanti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested internally and seems to have improved the issue.
Quick output from C3 and C6:

...
   0.00- 301.00 sec  227328000 Bytes    6.04 Mbits/sec
iperf exit
nsh> cat /proc/iobinfo
    ntotal     nfree     nwait nthrottle
       160       160         0       136
...
   0.00- 301.00 sec  434552832 Bytes   11.55 Mbits/sec
iperf exit
nsh> 
nsh> cat /proc/iobinfo
    ntotal     nfree     nwait nthrottle
       160       160         0       136

No issues on Xtensa as well!

Very nice contribution @aviralgarg05 thank you!

@fdcavalcanti
Copy link
Copy Markdown
Contributor

@aviralgarg05 please remove those two commits on board since they have a net result of zero changes, then we can merge. This way we don't pollute the git history.

@aviralgarg05 aviralgarg05 force-pushed the fix/issue-16915-esp-wlan-rx-queue-lock branch from 2359678 to d275a8e Compare April 8, 2026 15:06
@fdcavalcanti fdcavalcanti merged commit 54ee939 into apache:master Apr 8, 2026
76 of 81 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Arch: risc-v Issues related to the RISC-V (32-bit or 64-bit) architecture Arch: xtensa Issues related to the Xtensa architecture Size: S The size of the change in this PR is small

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants