Conversation

@fabian18
Contributor

@fabian18 fabian18 commented Oct 12, 2025

Contribution description

Fix nrf52 driver after submac software ACK.

Testing procedure

gnrc_networking with any nrf52 board and any other 802.15.4 transceiver.

Issues/PRs references

Bug introduced with #21533
fixes #21782

@github-actions github-actions bot added Platform: ARM Platform: This PR/issue effects ARM-based platforms Area: cpu Area: CPU/MCU ports labels Oct 12, 2025
@Teufelchen1
Contributor

This does appear to fix the issue on my nRF board. I did not test the 802.15.4 stack directly, but the side effect (my app not working at all because the CPU was stuck in a busy loop in the 802.15.4 stack) is gone, and I can use RIOT as expected.

Contributor

@mguetschow mguetschow left a comment

Thanks for your proposed fix! Could you elaborate a bit on what the problem was and why it can be fixed that way? I'm not familiar with the nrf52 radio driver, but the changes look a bit like black magic to me.

@fabian18
Contributor Author

Could you elaborate a bit on what the problem was and why it can be fixed that way?

I had not noticed that the driver has two timers: one that was previously used for sending an ACK and one for IFS. I removed both.
I am not sure where it was stuck. I assume because I removed the timer_init but timer_set and timer_start were still called with _set_ifs_timer.

@crasbe crasbe added Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors) CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR labels Oct 13, 2025
@riot-ci

riot-ci commented Oct 13, 2025

Murdock results

✔️ PASSED

76391f8 cpu/native/socket_zep: native overhead workaround

Success Failures Total Runtime
10560 0 10560 13m:30s

Artifacts

@mguetschow
Contributor

I am not sure where it was stuck. I assume because I removed the timer_init but timer_set and timer_start were still called with _set_ifs_timer.

Sounds plausible, although I would rather have expected an assertion failure in that case.

@fabian18
Contributor Author

I tested an nrf52840dk with a samr21. The sam has an at86rf233 which supports hardware generated ACKs and retransmissions.
The issue is that the software ACK is sent too slowly, so at least one retransmission is triggered, which causes a duplicate ping reply. If I set the retransmissions to 0 on the sam using ifconfig, I no longer get duplicates.
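
To make the timing constraint concrete: a sender that requested an ACK only waits for a bounded time before it retransmits. The following is a minimal sketch, assuming the 2.4 GHz O-QPSK PHY (the thread does not state which PHY was used) and the macAckWaitDuration formula from IEEE 802.15.4. The function and macro names are hypothetical and not part of RIOT.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed PHY constants for 2.4 GHz O-QPSK (250 kbit/s). */
#define OQPSK_SYMBOL_US        16u /* duration of one symbol in µs */
#define A_UNIT_BACKOFF_PERIOD  20u /* symbols */
#define A_TURNAROUND_TIME      12u /* symbols */
#define PHY_SHR_DURATION       10u /* symbols: preamble (8) + SFD (2) */
#define ACK_PHR_PSDU_SYMBOLS   12u /* 6 octets (PHR + 5-octet ACK), 2 symbols each */

/* macAckWaitDuration in µs: how long the sender waits for the ACK. */
static inline uint32_t ack_wait_duration_us(void)
{
    return (A_UNIT_BACKOFF_PERIOD + A_TURNAROUND_TIME + PHY_SHR_DURATION
            + ACK_PHR_PSDU_SYMBOLS) * OQPSK_SYMBOL_US;
}

/* True if an ACK that completes `elapsed_us` after the end of the data
 * frame arrives after the wait window, so the sender would retransmit. */
static inline bool ack_too_late(uint32_t elapsed_us)
{
    return elapsed_us > ack_wait_duration_us();
}
```

With these assumed constants the window is 864 µs, so a software ACK that adds extra delay on top of the turnaround can exceed it and trigger a retransmission.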

@fabian18
Contributor Author

There is also for example no duplicate when the nrf does not do CSMA for the ACK:

    case IEEE802154_FSM_EV_BH:
        if (false && !_does_handle_csma(dev)) {

@fabian18
Contributor Author

Wait, is this only a workaround for drivers that do not support CSMA? I don't see why we should wait for a random backoff when sending an ACK.

@fabian18
Contributor Author

fabian18 commented Oct 20, 2025

When transmitting an ACK, we have to wait aTurnaroundTime + SIFSPeriod, but not for a random backoff.
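
As a rough illustration of that delay, here is a minimal sketch that computes aTurnaroundTime + SIFSPeriod in microseconds from per-PHY parameters. The struct, the function name, and the 2.4 GHz O-QPSK values (16 µs per symbol, 12 symbols each) are assumptions for illustration and are not taken from the RIOT code.

```c
#include <stdint.h>

/* Hypothetical per-PHY timing description; values depend on modulation. */
typedef struct {
    uint16_t symbol_us;          /* duration of one symbol in µs */
    uint16_t turnaround_symbols; /* aTurnaroundTime in symbols */
    uint16_t sifs_symbols;       /* SIFS period in symbols */
} ack_phy_timing_t;

/* Assumed values for the 2.4 GHz O-QPSK PHY. */
static const ack_phy_timing_t ACK_TIMING_OQPSK_2G4 = {
    .symbol_us = 16,
    .turnaround_symbols = 12,
    .sifs_symbols = 12,
};

/* Fixed delay before sending an ACK: turnaround plus SIFS, no random backoff. */
static inline uint32_t ack_tx_delay_us(const ack_phy_timing_t *phy)
{
    return (uint32_t)(phy->turnaround_symbols + phy->sifs_symbols)
           * phy->symbol_us;
}
```

Keeping the symbol duration as a parameter reflects that the delay is deterministic for a given PHY but differs between modulations.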

@github-actions github-actions bot added Area: network Area: Networking Area: sys Area: System labels Oct 21, 2025
@fabian18
Contributor Author

fabian18 commented Oct 21, 2025

I propose fixing the ACK timing in a very simple way, as in the commit above, so as not to delay the release.
Strictly speaking, the timing depends on the modulation parameters, so I would like to draw attention to #21527 (https://github.com/RIOT-OS/RIOT/pull/21527/files#top), where I could implement it properly.

@mguetschow
Contributor

I would propose to fix the ACK very simple like in the commit above to not delay the release. Actually it depends on modulation parameters and I would like to raise attention for #21527: #21527 (files) where I could implement it properly.

That's probably the right way to go, fine with me. But could you please add a TODO to the respective part of the code that will need to be adapted in the follow-up PR, so we don't lose track of it?

As said before, I'd appreciate some git history cleaning separating left-over fixes from the actual fix with the timers. And this is not supposed to be a draft anymore, right?

I've tested ICMP pings between nrf52840dk and adafruit-feather-nrf52840-sense and can confirm it is working for me with these changes. I am not knowledgeable enough in the radio code to assess the changes and give a confident ACK. Maybe @jia200x could have a quick look?

@fabian18 fabian18 force-pushed the pr/fix_nrf52_after_soft_ack branch from a83212c to 2a32d82 Compare October 22, 2025 18:16
@fabian18 fabian18 marked this pull request as ready for review October 22, 2025 18:16
@benpicco
Contributor

benpicco commented Oct 26, 2025

I do now see duplicate responses to a multicast ping with socket_zep

2025-10-26 14:16:22,957 # ping ff02::1
2025-10-26 14:16:22,962 # 12 bytes from fe80::ec2e:ab90:ed92:d8b4%7: icmp_seq=0 ttl=64 rssi=-27 dBm time=4.932 ms
2025-10-26 14:16:22,964 # 12 bytes from fe80::ec2e:ab90:ed92:d8b4%7: icmp_seq=0 ttl=64 rssi=-27 dBm time=7.241 ms (DUP!)
2025-10-26 14:16:23,962 # 12 bytes from fe80::ec2e:ab90:ed92:d8b4%7: icmp_seq=1 ttl=64 rssi=-26 dBm time=4.487 ms
2025-10-26 14:16:23,964 # 12 bytes from fe80::ec2e:ab90:ed92:d8b4%7: icmp_seq=1 ttl=64 rssi=-26 dBm time=6.542 ms (DUP!)
2025-10-26 14:16:24,963 # 12 bytes from fe80::ec2e:ab90:ed92:d8b4%7: icmp_seq=2 ttl=64 rssi=-28 dBm time=5.187 ms
2025-10-26 14:16:24,964 # 
2025-10-26 14:16:24,964 # --- ff02::1 PING statistics ---
2025-10-26 14:16:24,964 # 3 packets transmitted, 3 packets received, 2 duplicates, 0% packet loss
2025-10-26 14:16:24,964 # round-trip min/avg/max = 4.487/5.677/7.241 ms

This is with two instances of examples/networking/gnrc/gnrc_networking with USE_ZEP=1, plus sudo dist/tools/zep_dispatch/bin/zep_dispatch -w wpan0 ::1 17754.

The wpan0 interface is optional. It requires sudo modprobe mac802154_hwsim and allows you to capture the traffic in Wireshark.

image

Looks fine on master

2025-10-26 14:21:24,387 # ping ff02::1
2025-10-26 14:21:24,391 # 12 bytes from fe80::a475:1fd:cfac:bed6%7: icmp_seq=0 ttl=64 rssi=-25 dBm time=3.794 ms
2025-10-26 14:21:25,392 # 12 bytes from fe80::a475:1fd:cfac:bed6%7: icmp_seq=1 ttl=64 rssi=-26 dBm time=4.512 ms
2025-10-26 14:21:26,393 # 12 bytes from fe80::a475:1fd:cfac:bed6%7: icmp_seq=2 ttl=64 rssi=-26 dBm time=4.592 ms
2025-10-26 14:21:26,393 # 
2025-10-26 14:21:26,393 # --- ff02::1 PING statistics ---
2025-10-26 14:21:26,394 # 3 packets transmitted, 3 packets received, 0% packet loss
2025-10-26 14:21:26,394 # round-trip min/avg/max = 3.794/4.299/4.592 ms

@fabian18
Contributor Author

I started the two gnrc_networking instances with USE_ZEP=1 and ran sudo modprobe mac802154_hwsim.
I now have two wpan interfaces:

7: wpan0: <BROADCAST,NOARP,UP,LOWER_UP> mtu 123 qdisc fq_codel state UNKNOWN group default qlen 300
    link/ieee802.15.4 72:cd:62:fd:89:b4:00:d4 brd ff:ff:ff:ff:ff:ff:ff:ff
8: wpan1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 123 qdisc fq_codel state UNKNOWN group default qlen 300
    link/ieee802.15.4 56:de:b5:25:34:25:36:bc brd ff:ff:ff:ff:ff:ff:ff:ff

But what is the command I have to use for zep_dispatch?

$ sudo ./dist/tools/zep_dispatch/start_network.sh -w wpan0 ::1 17754
Cannot find device "17754"
error: invalid interface "17754"
sudo: wpan0: command not found
Cleaning up...
./dist/tools/zep_dispatch/start_network.sh: line 32: kill: (25146) - No such process

@fabian18
Contributor Author

Hm what could it be ...

I tried to reproduce it with the border router and gnrc_networking, as in the original soft ACK PR.
I don't get any DUP replies.

12 bytes from fe80::a874:6b8:baff:53a1%7: icmp_seq=86 ttl=64 rssi=-27 dBm time=3.198 ms
12 bytes from fe80::a874:6b8:baff:53a1%7: icmp_seq=87 ttl=64 rssi=-26 dBm time=3.384 ms
12 bytes from fe80::a874:6b8:baff:53a1%7: icmp_seq=88 ttl=64 rssi=-27 dBm time=3.534 ms
12 bytes from fe80::a874:6b8:baff:53a1%7: icmp_seq=89 ttl=64 rssi=-26 dBm time=3.547 ms
12 bytes from fe80::a874:6b8:baff:53a1%7: icmp_seq=90 ttl=64 rssi=-28 dBm time=3.397 ms
12 bytes from fe80::a874:6b8:baff:53a1%7: icmp_seq=91 ttl=64 rssi=-26 dBm time=3.422 ms
12 bytes from fe80::a874:6b8:baff:53a1%7: icmp_seq=92 ttl=64 rssi=-27 dBm time=3.191 ms
12 bytes from fe80::a874:6b8:baff:53a1%7: icmp_seq=93 ttl=64 rssi=-27 dBm time=3.269 ms
12 bytes from fe80::a874:6b8:baff:53a1%7: icmp_seq=94 ttl=64 rssi=-28 dBm time=3.336 ms
12 bytes from fe80::a874:6b8:baff:53a1%7: icmp_seq=95 ttl=64 rssi=-25 dBm time=3.404 ms
uhcp_client(): no reply received
12 bytes from fe80::a874:6b8:baff:53a1%7: icmp_seq=96 ttl=64 rssi=-28 dBm time=3.305 ms
12 bytes from fe80::a874:6b8:baff:53a1%7: icmp_seq=97 ttl=64 rssi=-27 dBm time=3.331 ms
12 bytes from fe80::a874:6b8:baff:53a1%7: icmp_seq=98 ttl=64 rssi=-27 dBm time=3.595 ms
12 bytes from fe80::a874:6b8:baff:53a1%7: icmp_seq=99 ttl=64 rssi=-27 dBm time=3.410 ms

--- ff02::1%7 PING statistics ---
100 packets transmitted, 100 packets received, 0% packet loss
round-trip min/avg/max = 3.191/3.424/3.824 ms

@fabian18
Contributor Author

Do we have to simulate air time? Would the transmission delay else be almost 0?

    /* delay transmission to simulate airtime */
    zepdev->ack_timer.callback = _send_frame;
    ztimer_set(ZTIMER_USEC, &zepdev->ack_timer, time_tx);

@benpicco
Contributor

But what is the command I have to use for zep_dispatch?

You can just run

sudo dist/tools/zep_dispatch/bin/zep_dispatch -w wpan0 ::1 17754

Do we have to simulate air time? Would the transmission delay else be almost 0?

That might be it!

@fabian18
Contributor Author

fabian18 commented Oct 29, 2025

Without the simulated airtime there are no DUPs anymore, as expected.

> ping ff02::1
2025-10-29 14:56:11,642 # ping ff02::1
2025-10-29 14:56:11,644 # 12 bytes from fe80::e45f:343:e58a:c0ac%7: icmp_seq=0 ttl=64 rssi=-26 dBm time=1.658 ms
2025-10-29 14:56:12,645 # 12 bytes from fe80::e45f:343:e58a:c0ac%7: icmp_seq=1 ttl=64 rssi=-28 dBm time=2.047 ms
2025-10-29 14:56:13,645 # 12 bytes from fe80::e45f:343:e58a:c0ac%7: icmp_seq=2 ttl=64 rssi=-28 dBm time=2.095 ms
2025-10-29 14:56:13,645 # 
2025-10-29 14:56:13,646 # --- ff02::1 PING statistics ---
2025-10-29 14:56:13,646 # 3 packets transmitted, 3 packets received, 0% packet loss
2025-10-29 14:56:13,646 # round-trip min/avg/max = 1.658/1.933/2.095 ms

So can I just drop it, or do I have to create an opt-in pseudomodule?

@benpicco
Contributor

Do you mean remove the simulated airtime for ACKs on socket_zep?
Yea if this is causing trouble, just drop it.

@fabian18
Contributor Author

Only for ACKs? I meant in general.

@benpicco
Contributor

Hm, do you think it would be better to add this to zep_dispatch then?
Simulating the timing behavior seems kind of important to me.

@fabian18
Contributor Author

I don't understand what difference it would make to move the artificial delay into the dispatch program.
I suspect that the dispatch program adds a little more delay, which is enough to trigger an occasional retransmission, because no DUPs arrive in the border router + gnrc_networking example.

@benpicco
Contributor

benpicco commented Oct 29, 2025

The problem seems to be that socket_zep::request_op(IEEE802154_HAL_OP_SET_IDLE) gets called immediately after socket_zep::request_transmit(). This stops the send timer and aborts the transmission. I added an "abort TX" print before the ztimer_remove():

2025-10-29 18:22:52,086 # socket_zep::request_transmit(63 bytes, 496 µs)
2025-10-29 18:22:52,086 # socket_zep::request_op: switch to IDLE from 0
2025-10-29 18:22:52,086 # abort TX
2025-10-29 18:22:52,086 # socket_zep::request_op: switch to IDLE from 1
2025-10-29 18:22:52,086 # abort TX
2025-10-29 18:22:52,086 # socket_zep::request_transmit(77 bytes, 720 µs)
2025-10-29 18:22:52,086 # socket_zep::request_op: switch to IDLE from 0
2025-10-29 18:22:52,086 # abort TX
2025-10-29 18:22:54,964 # socket_zep::request_op: switch to IDLE from 2
2025-10-29 18:22:54,964 # abort TX
2025-10-29 18:22:54,964 # socket_zep::request_op: switch to IDLE from 1
2025-10-29 18:22:54,964 # abort TX
2025-10-29 18:22:54,964 # socket_zep::request_transmit(70 bytes, 608 µs)
2025-10-29 18:22:54,966 # socket_zep::request_op: switch to IDLE from 2
2025-10-29 18:22:54,966 # abort TX
2025-10-29 18:22:55,964 # socket_zep::request_op: switch to IDLE from 2
2025-10-29 18:22:55,964 # abort TX
2025-10-29 18:22:55,964 # socket_zep::request_op: switch to IDLE from 1
2025-10-29 18:22:55,964 # abort TX
2025-10-29 18:22:55,964 # socket_zep::request_transmit(70 bytes, 608 µs)
2025-10-29 18:22:55,966 # socket_zep::request_op: switch to IDLE from 2
2025-10-29 18:22:55,966 # abort TX
2025-10-29 18:22:56,964 # socket_zep::request_op: switch to IDLE from 2
2025-10-29 18:22:56,964 # abort TX
2025-10-29 18:22:56,965 # socket_zep::request_op: switch to IDLE from 1
2025-10-29 18:22:56,965 # abort TX
2025-10-29 18:22:56,965 # socket_zep::request_transmit(70 bytes, 608 µs)
2025-10-29 18:22:56,966 # socket_zep::request_op: switch to IDLE from 2
2025-10-29 18:22:56,966 # abort TX
2025-10-29 18:22:56,966 # socket_zep::request_transmit(70 bytes, 608 µs)
2025-10-29 18:22:56,967 # socket_zep::request_op: switch to IDLE from 2
2025-10-29 18:22:56,967 # abort TX
2025-10-29 18:23:01,591 # socket_zep::request_op: switch to IDLE from 1
2025-10-29 18:23:01,591 # abort TX
2025-10-29 18:23:01,591 # socket_zep::request_transmit(77 bytes, 720 µs)
2025-10-29 18:23:01,591 # socket_zep::request_op: switch to IDLE from 0
2025-10-29 18:23:01,591 # abort TX

@fabian18
Contributor Author

fabian18 commented Oct 29, 2025

The TX state would be 3, but I only see switches to IDLE from states 0, 1 and 2.
I put an assert(!ztimer_is_set(ZTIMER_USEC, &zepdev->ack_timer)); in front of the place where the ztimer is removed.

I get 2 duplicates, but the assert does not trigger on either side.

2025-10-29 19:01:19,167 # 
2025-10-29 19:01:19,168 # --- ff02::1 PING statistics ---
2025-10-29 19:01:19,168 # 100 packets transmitted, 100 packets received, 2 duplicates, 0% packet loss
2025-10-29 19:01:19,168 # round-trip min/avg/max = 3.210/3.747/6.096 ms

@benpicco
Contributor

How about

--- i/cpu/native/socket_zep/socket_zep.c
+++ w/cpu/native/socket_zep/socket_zep.c
@@ -516,9 +516,16 @@ static int _request_transmit(ieee802154_dev_t *dev)
 
     dev->cb(dev, IEEE802154_RADIO_INDICATION_TX_START);
 
-    /* delay transmission to simulate airtime */
-    zepdev->ack_timer.callback = _send_frame;
-    ztimer_set(ZTIMER_USEC, &zepdev->ack_timer, time_tx);
+    /* native overhead prevents short timers from triggering in time, send directly if delay is less than 200 µs */
+    if (time_tx <= 200) {
+        _send_frame(zepdev->ack_timer.arg);
+    } else {
+        time_tx -= 200;
+
+        /* delay transmission to simulate airtime */
+        zepdev->ack_timer.callback = _send_frame;
+        ztimer_set(ZTIMER_USEC, &zepdev->ack_timer, time_tx);
+    }
 
     return 0;
 }

It's hacky, but native is not anywhere close to real-time behavior so…
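
The decision logic in the diff above can be sketched in isolation as follows. This is a standalone illustration, not the socket_zep code: the tx_plan_t type, the plan_tx() helper and the NATIVE_OVERHEAD_US macro are hypothetical names, and only the 200 µs threshold is taken from the proposed patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Overhead assumed for native timers, taken from the patch above. */
#define NATIVE_OVERHEAD_US 200u

/* Hypothetical result type: either send right away or arm a timer. */
typedef struct {
    bool send_now;     /* true: transmit directly, no timer */
    uint32_t timer_us; /* timer delay in µs if send_now is false */
} tx_plan_t;

/* Decide how to schedule a frame with a simulated airtime of time_tx_us. */
static inline tx_plan_t plan_tx(uint32_t time_tx_us)
{
    tx_plan_t plan = { .send_now = true, .timer_us = 0 };

    if (time_tx_us > NATIVE_OVERHEAD_US) {
        /* Long enough to be worth a timer: subtract the assumed overhead. */
        plan.send_now = false;
        plan.timer_us = time_tx_us - NATIVE_OVERHEAD_US;
    }
    return plan;
}
```

For example, a frame with 608 µs of simulated airtime would arm the timer for 408 µs, while anything at or below 200 µs is sent immediately.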

@github-actions github-actions bot added the Platform: native Platform: This PR/issue effects the native platform label Oct 30, 2025
@benpicco benpicco added this pull request to the merge queue Oct 31, 2025
Merged via the queue into RIOT-OS:master with commit 0995c46 Oct 31, 2025
26 checks passed
Labels

Area: cpu Area: CPU/MCU ports Area: network Area: Networking Area: sys Area: System CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR Platform: ARM Platform: This PR/issue effects ARM-based platforms Platform: native Platform: This PR/issue effects the native platform Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Broken IEEE820.15.4 Networking on nrf52 Board

6 participants