Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RADIOLIB_LORAWAN_NO_DOWNLINK when calling activateOTAA() with Tock setup #1140

Open
alistair23 opened this issue Jun 26, 2024 · 32 comments
Open
Labels
bug Something isn't working more information needed Further investigation is needed, not necessarily by the author

Comments

@alistair23
Copy link
Contributor

alistair23 commented Jun 26, 2024

Describe the bug

When trying to activate a LoRaWAN device I see the following failure

[2024-06-26 22:25:59.867] [SX1261] Initialising Radio ... 
[2024-06-26 22:25:59.995] RLB_PRO: Setting up fixed channels (subband 2)
[2024-06-26 22:25:59.998] RLB_PRO: mask[7] = 0x0002
[2024-06-26 22:26:00.001] RLB_PRO: UL:  65 1  (0 - 1082960896) | DL:   6 6  (0 - 0)
[2024-06-26 22:26:00.007] RLB_PRO: mask[0] = 0xff00
[2024-06-26 22:26:00.009] RLB_PRO: UL:  65 1  (0 - 1082960896) | DL:   6 6  (0 - 0)
[2024-06-26 22:26:00.013] RLB_PRO: UL:   8 1  (1610612736 - 1082959462) | DL:   0 5  (0 - 0)
[2024-06-26 22:26:00.021] RLB_PRO: UL:   9 1  (0 - 1082959872) | DL:   0 5  (0 - 0)
[2024-06-26 22:26:00.025] RLB_PRO: UL:  10 1  (-1610612736 - 1082960281) | DL:   0 5  (0 - 0)
[2024-06-26 22:26:00.034] RLB_PRO: UL:  11 1  (1073741824 - 1082960691) | DL:   0 5  (0 - 0)
[2024-06-26 22:26:00.040] RLB_PRO: UL:  12 1  (-536870912 - 1082961100) | DL:   0 5  (0 - 0)
[2024-06-26 22:26:00.046] RLB_PRO: UL:  13 1  (1610612736 - 1082961510) | DL:   0 5  (0 - 0)
[2024-06-26 22:26:00.052] RLB_PRO: UL:  14 1  (0 - 1082961920) | DL:   0 5  (0 - 0)
[2024-06-26 22:26:00.058] RLB_PRO: UL:  15 1  (-1610612736 - 1082962329) | DL:   0 5  (0 - 0)
[2024-06-26 22:26:00.374] RLB_PRO: [MAC] 0x03
[2024-06-26 22:26:00.374] RLB_PRO: 0000000 00 00 00 80                                     | ....              
[2024-06-26 22:26:00.383] RLB_PRO: LinkADRReq: dataRate = 0, txSteps = 0, chMask = 0x0000, chMaskCntl = 0, nbTrans = 0
[2024-06-26 22:26:00.395] RLB_PRO: LinkADRAns: status = 0x07
[2024-06-26 22:26:00.397] RLB_PRO: [MAC] 0x04
[2024-06-26 22:26:00.397] RLB_PRO: 0000000 00                                              | .                 
[2024-06-26 22:26:00.406] RLB_PRO: DutyCycleReq: max duty cycle = 1/2^0
[2024-06-26 22:26:00.409] RLB_PRO: [MAC] 0x05
[2024-06-26 22:26:00.412] RLB_PRO: 0000000 08 68 e2 8c                                     | .h..              
[2024-06-26 22:26:00.420] RLB_PRO: RXParamSetupReq: rx1DrOffset = 0, rx2DataRate = 8, freq = 
[2024-06-26 22:26:00.433] RLB_PRO: RXParamSetupAns: status = 0x07
[2024-06-26 22:26:00.436] RLB_PRO: [MAC] 0x08
[2024-06-26 22:26:00.436] RLB_PRO: 0000000 01                                              | .                 
[2024-06-26 22:26:00.444] RLB_PRO: RXTimingSetupReq: delay = 1 sec
[2024-06-26 22:26:00.447] RLB_PRO: [MAC] 0x09
[2024-06-26 22:26:00.450] RLB_PRO: 0000000 0d                                              | .                 
[2024-06-26 22:26:00.459] RLB_PRO: TxParamSetupReq: dlDwell = 0, ulDwell = 0, maxEirp = 30 dBm
[2024-06-26 22:26:00.464] RLB_PRO: [MAC] 0x0C
[2024-06-26 22:26:00.467] RLB_PRO: 0000000 65                                              | e                 
[2024-06-26 22:26:00.473] RLB_PRO: ADRParamSetupReq: limitExp = 6, delayExp = 5
[2024-06-26 22:26:00.479] RLB_PRO: [MAC] 0x0F
[2024-06-26 22:26:00.481] RLB_PRO: 0000000 fa                                              | .                 
[2024-06-26 22:26:00.490] RLB_PRO: RejoinParamSetupReq: maxTime = 15, maxCount = 10
[2024-06-26 22:26:00.493] RLB_PRO: RejoinParamSetupAns: status = 0x03
[2024-06-26 22:26:00.809] RLB_PRO: 
[2024-06-26 22:26:00.809] RLB_PRO: PHY: Frequency UL =  MHz
[2024-06-26 22:26:00.846] RLB_PRO: PHY: SF = 12, TX = 22 dBm, BW =  kHz, CR = 4/0
[2024-06-26 22:26:02.403] RLB_PRO: JoinRequest sent (DevNonce = 0) <-- Rx Delay start
[2024-06-26 22:26:02.406] RLB_PRO: 
[2024-06-26 22:26:02.409] RLB_PRO: PHY: Frequency DL =  MHz
[2024-06-26 22:26:02.447] RLB_PRO: PHY: SF = 12, TX = 22 dBm, BW =  kHz, CR = 4/0
[2024-06-26 22:26:07.376] RLB_PRO: Opening Rx1 window (190 ms timeout)... <-- Rx Delay end 
[2024-06-26 22:26:07.570] RLB_PRO: Closing Rx1 window
[2024-06-26 22:26:07.573] RLB_PRO: PHY: Frequency DL =  MHz
[2024-06-26 22:26:13.471] RLB_PRO: Opening Rx2 window (190 ms timeout)... <-- Rx Delay end 
[2024-06-26 22:26:13.664] RLB_PRO: Closing Rx2 window
[2024-06-26 22:26:13.680] activateOTAA failed, code -1116

[OP edited this log to add the timestamps as requested below]

To Reproduce
The full code is here: https://github.com/alistair23/libtock-c/blob/alistair/lora-wan/examples/lora/sensor-lorawan/main.cc

The main snippet is below

  // create a new instance of the HAL class
  TockHal* hal = new TockHal();
  int state;

  // now we can create the radio module
  // pinout corresponds to the SparkFun LoRa Thing Plus - expLoRaBLE
  // NSS pin:   0
  // DIO1 pin:  2
  // NRST pin:  4
  // BUSY pin:  1
  SX1262 tock_module = new Module(hal, RADIO_NSS, RADIO_DIO_1, RADIO_RESET, RADIO_BUSY);
  LoRaWANNode node(&tock_module, Region, subBand);

  // Setup the radio
  // The settings here work for the SparkFun LoRa Thing Plus - expLoRaBLE
  tock_module.XTAL = true;
  state = tock_module.begin(915.0);

  if (state != RADIOLIB_ERR_NONE) {
    printf("begin failed, code %d\r\n", state);
    return 1;
  }

  node.beginOTAA(joinEUI, devEUI, nwkKey, appKey);

  state = node.activateOTAA();

  if (state != RADIOLIB_LORAWAN_NEW_SESSION) {
    printf("activateOTAA failed, code %d\r\n", state);
    return 1;
  }

Expected behavior
https://github.com/jgromes/RadioLib/wiki/Troubleshooting-Guide seems to indicate that no downlink (-1116) is ok to ignore, but the actual examples check for it when running activateOTAA().

I would expect activateOTAA() to return RADIOLIB_LORAWAN_NEW_SESSION

Screenshots

Additional info (please complete):

  • MCU: Apollo3 (Using Tock)
  • Link to Arduino core: Tock
  • Wireless module type: SX1262
  • Arduino IDE version N/A
  • Library version: 6.6.0
@StevenCellist
Copy link
Collaborator

Hi @alistair23, thanks for including logs, but in LW, timestamps on logs are almost even more crucial than the log contents itself. So please, retry with millisecond-precision timestamps enabled.
What network server are you using? Are you sure there is a gateway within reach of the device? Can you show what is happening on the server's console?
LW is (unfortunately, although it's the charm of the tech) a heck of a lot more complicated than just having a correct configuration on the device, so please be as complete as possible!

@StevenCellist
Copy link
Collaborator

By the way, I will provide no help whatsoever as long as you don't decrease your uplink interval by orders of magnitude - you are breaching laws, and let's not even start about FUP if you are using TTN.

@alistair23
Copy link
Contributor Author

alistair23 commented Jun 26, 2024

Does LoRaWAN require more precise timing? It's possible that's an issue there. I'll get some more logs and update the PR.

How do I get ms level precision logs?

I'm connecting to a TTN network. The gatway is in the same room and I see events like this in the TTN console, so it is able to communicate with the gateway

image

@alistair23
Copy link
Contributor Author

By the way, I will provide no help whatsoever as long as you don't decrease your uplink interval by orders of magnitude - you are breaching laws, and let's not even start about FUP if you are using TTN.

I don't follow. I am not changing anything related to uplink intervals

@HeadBoffin
Copy link
Collaborator

Does LoRaWAN require more precise timing?

Yes, LW requires that the receive is turned on a few milliseconds before the Rx1 window which is 5 seconds from the end of the uplink. The receiver will listen for the preamble, if it misses it, game over.

But as @StevenCellist mentions, you are hammering a shared community resource provided free of charge once every 1 second which is not only illegal in most jurisdictions (because the law limits use of the shared spectrum) but is also in breach of the TTN Fair Use Policy. The ISM band is generally very busy - you are currently pretty much jamming parts of the airwaves so other people will likely see a degradation in their devices connectivity. No where in our examples do we send more than once every few minutes and we reference the FUP to boot.

And there is the no small matter that LoRaWAN works on the basis of an uplink then opening a window 5 seconds later and then again 1 second after that for a response. So apart from the law and FUP, LW can't work at the rate you are using.

@StevenCellist
Copy link
Collaborator

^^ thank you @HeadBoffin

How do I get ms level precision logs?

That depends on what terminal you are using. ArduinoIDE has a clock button, PlatformIO has a monitor_filter = time option, other terminals usually have some hidden option too.

@alistair23
Copy link
Contributor Author

Yes, LW requires that the receive is turned on a few milliseconds before the Rx1 window which is 5 seconds from the end of the uplink. The receiver will listen for the preamble, if it misses it, game over.

Hmmm... I'm probably missing that window.

Ah! For the delay in the loop I just checked in something to open this issue. It's just copied from something else.

The loop isn't currently being run, but I will be sure to increase the delay considerably

@alistair23
Copy link
Contributor Author

I have updated the original PR description to include timing information. The timing information comes from minicom and as the printing is async it isn't going to be 100% accurate.

I have also updated the WIP code at https://github.com/alistair23/libtock-c/blob/alistair/lora-wan/examples/lora/sensor-lorawan/main.cc to use a long delay. So it won't spam any thing

And there is the no small matter that LoRaWAN works on the basis of an uplink then opening a window 5 seconds later and then again 1 second after that for a response. So apart from the law and FUP, LW can't work at the rate you are usin

From a quick look I think I'm too slow for this window

@StevenCellist
Copy link
Collaborator

A few observations.

  1. You missed the details on sendReceive(): it is likely to return NO_DOWNLINK_RECEIVED as you observed from the troubleshooting guide, which is not ERR_NONE and accordingly your sketch still spams the airwaves at illegal intervals. "it is just copied from something else" - well, but as soon as it works and we don't say anything, it will end up like this in the Tock examples and we have folks all over the globe killing the free band as @HeadBoffin said.
  2. The Rx1 window is a bit early right on time, but something is up with Rx2 window being scheduled an additional six seconds later instead of after six seconds. Don't see why that would be the case.
  3. A mistake on my side is in the released 6.6.0 but solved in upstream: the JoinRequest datarate should be 2, not 0 (here). Won't solve the Rx2 window being scheduled incorrectly, but may result in a JoinAccept being transmitted which isn't clear from your screenshot which is cropped to unusable size. Anything visible on the TTN Messages tab can be safely shared - it shows none of the keys required to decrypt your messages. The other stuff is transmitted unencrypted over the air and visible to everyone anyway.

@StevenCellist
Copy link
Collaborator

Oh and by the way, please fix the printf not being able to handle floating values - it might not be the problem but definitely won't hurt.

@HeadBoffin
Copy link
Collaborator

Ah! For the delay in the loop I just checked in something to open this issue. It's just copied from something else.

Before TTI lose the will to live with this sort of stuff, can you tell us where you copied this from so we can try & purge it from the internet - it won't work with any LNS but it definitely hurts TTN. There's also trying to track down the how & why of having the Rx2 window extended by 5 seconds.

What is the RSSI & SNR of the Join Request? One of the standard gotchas is the gateway hearing the JR but the device not hearing the JA due to being too far away, or more often, too close!

Side note, you are mixing an issue with a PR - this is an issue, a PR would be you providing code to make a change - different tab on the main page!

@alistair23
Copy link
Contributor Author

alistair23 commented Jun 27, 2024

I think there has been some confusion, so I'll try to clarify.

This is an issue. I am getting a -1116 error from activateOTAA(), which I shouldn't be getting, hence the issue. I suspect the problem is on my side, not a problem in RadioLib, but I was hoping to get some help.

The code at https://github.com/alistair23/libtock-c/blob/alistair/lora-wan/examples/lora/sensor-lorawan/main.cc is just my local testing changes. I only checked it in to create this issue. I appreciate you pointing out the excess communication. I have fixed the delay to ensure it is run on every iteration of the loop. That code has never been run though

The code is based on the LoRa example (not LoRaWAN) from here: https://github.com/jgromes/RadioLib/blob/master/examples/NonArduino/Tock/main.cpp#L61

As for the Rx1 window, the Tock syscall interface usually results in delays, as the process of acquiring the current time and delays can take a few extra ms to occur. So I suspect we miss a window somehow. The printing is async so the times might not be correct. I'll try and dig into what is going on.

I'll try the latest from upstream to see if that helps, this is the full console output from TTN

image

@StevenCellist
Copy link
Collaborator

I think there has been some confusion, so I'll try to clarify.

We understand perfectly fine what is going on, but ...

That code has never been run though

Yeah, sure, I believe that for now, but that says nothing about what you will be doing with it once it does work. And since this is all about a shared resource being provided free of service, we care about these details. We don't solve problems here, we educate on apparently lots of missing knowledge. TTN provides a great Learn section which might prove very useful...

The code is based on the LoRa example

Very interesting that this provided code also breaks laws. How's your knowledge on dutycycles?

So I suspect we miss a window somehow.

An easy way to check this is by widening the scanGuard in downlinkCommon to e.g. 50ms - effectively meaning the device will start listening 50ms earlier as well as 100ms longer in total.

this is the full console output from TTN

There is no way for us to confirm whether this is the full output - what are the next lines? Does TTN even schedule a JoinAccept? We can't see that... what is it that you are so actively trying to hide from us? A naughty DeviceID?
And it also wouldn't hurt sending updated logs from the device after switching to upstream, just to make sure we're all on board.

Come on man - a little less coffee, a little more effort to work together and I'm sure we can get it working!

@alistair23
Copy link
Contributor Author

alistair23 commented Jun 27, 2024

I tried a scanGuard of 50 and I get this (plus I think I fixed the delay). Note that I have added a few extra prints.

[2024-06-27 19:31:23.624] RLB_PRO: mask[7] = 0x0002
[2024-06-27 19:31:23.627] RLB_PRO: UL:  65 1  (0 - 1082960896) | DL:   6 6  (0 - 0)
[2024-06-27 19:31:23.633] RLB_PRO: mask[0] = 0xff00
[2024-06-27 19:31:23.635] RLB_PRO: UL:  65 1  (0 - 1082960896) | DL:   6 6  (0 - 0)
[2024-06-27 19:31:23.639] RLB_PRO: UL:   8 1  (1610612736 - 1082959462) | DL:   0 5  (0 - 0)
[2024-06-27 19:31:23.647] RLB_PRO: UL:   9 1  (0 - 1082959872) | DL:   0 5  (0 - 0)
[2024-06-27 19:31:23.651] RLB_PRO: UL:  10 1  (-1610612736 - 1082960281) | DL:   0 5  (0 - 0)
[2024-06-27 19:31:23.660] RLB_PRO: UL:  11 1  (1073741824 - 1082960691) | DL:   0 5  (0 - 0)
[2024-06-27 19:31:23.665] RLB_PRO: UL:  12 1  (-536870912 - 1082961100) | DL:   0 5  (0 - 0)
[2024-06-27 19:31:23.671] RLB_PRO: UL:  13 1  (1610612736 - 1082961510) | DL:   0 5  (0 - 0)
[2024-06-27 19:31:23.677] RLB_PRO: UL:  14 1  (0 - 1082961920) | DL:   0 5  (0 - 0)
[2024-06-27 19:31:23.683] RLB_PRO: UL:  15 1  (-1610612736 - 1082962329) | DL:   0 5  (0 - 0)
[2024-06-27 19:31:23.982] RLB_PRO: [MAC] 0x03
[2024-06-27 19:31:23.982] RLB_PRO: 0000000 00 00 00 80                                     | ....              
[2024-06-27 19:31:23.991] RLB_PRO: LinkADRReq: dataRate = 0, txSteps = 0, chMask = 0x0000, chMaskCntl = 0, nbTrans = 0
[2024-06-27 19:31:24.002] RLB_PRO: LinkADRAns: status = 0x07
[2024-06-27 19:31:24.005] RLB_PRO: [MAC] 0x04
[2024-06-27 19:31:24.005] RLB_PRO: 0000000 00                                              | .                 
[2024-06-27 19:31:24.013] RLB_PRO: DutyCycleReq: max duty cycle = 1/2^0
[2024-06-27 19:31:24.016] RLB_PRO: [MAC] 0x05
[2024-06-27 19:31:24.019] RLB_PRO: 0000000 08 68 e2 8c                                     | .h..              
[2024-06-27 19:31:24.027] RLB_PRO: RXParamSetupReq: rx1DrOffset = 0, rx2DataRate = 8, freq = 
[2024-06-27 19:31:24.039] RLB_PRO: RXParamSetupAns: status = 0x07
[2024-06-27 19:31:24.042] RLB_PRO: [MAC] 0x08
[2024-06-27 19:31:24.042] RLB_PRO: 0000000 01                                              | .                 
[2024-06-27 19:31:24.051] RLB_PRO: RXTimingSetupReq: delay = 1 sec
[2024-06-27 19:31:24.054] RLB_PRO: [MAC] 0x09
[2024-06-27 19:31:24.056] RLB_PRO: 0000000 0d                                              | .                 
[2024-06-27 19:31:24.065] RLB_PRO: TxParamSetupReq: dlDwell = 0, ulDwell = 0, maxEirp = 30 dBm
[2024-06-27 19:31:24.071] RLB_PRO: [MAC] 0x0C
[2024-06-27 19:31:24.073] RLB_PRO: 0000000 65                                              | e                 
[2024-06-27 19:31:24.080] RLB_PRO: ADRParamSetupReq: limitExp = 6, delayExp = 5
[2024-06-27 19:31:24.085] RLB_PRO: [MAC] 0x0F
[2024-06-27 19:31:24.088] RLB_PRO: 0000000 fa                                              | .                 
[2024-06-27 19:31:24.096] RLB_PRO: RejoinParamSetupReq: maxTime = 15, maxCount = 10
[2024-06-27 19:31:24.100] RLB_PRO: RejoinParamSetupAns: status = 0x03
[2024-06-27 19:31:24.397] RLB_PRO: 
[2024-06-27 19:31:24.397] RLB_PRO: PHY: Frequency UL =  MHz
[2024-06-27 19:31:24.432] RLB_PRO: PHY: SF = 12, TX = 22 dBm, BW =  kHz, CR = 4/0
[2024-06-27 19:31:25.985] RLB_PRO: JoinRequest sent (DevNonce = 0) <-- Rx Delay start
[2024-06-27 19:31:25.988] RLB_PRO: this->rxDelayStart: 18999544 ms
[2024-06-27 19:31:25.994] RLB_PRO: now: 18999553 ms
[2024-06-27 19:31:25.997] RLB_PRO: this->rxDelays[0]: 5000 ms
[2024-06-27 19:31:26.000] RLB_PRO: 
[2024-06-27 19:31:26.000] RLB_PRO: PHY: Frequency DL =  MHz
[2024-06-27 19:31:26.036] RLB_PRO: PHY: SF = 12, TX = 22 dBm, BW =  kHz, CR = 4/0
[2024-06-27 19:31:26.070] RLB_PRO: this->rxDelays[i]: 5000 ms
[2024-06-27 19:31:26.073] RLB_PRO: mod->hal->millis(): 18999634 ms
[2024-06-27 19:31:26.076] RLB_PRO: waitLen 4856 ms
[2024-06-27 19:31:30.839] RLB_PRO: Opening Rx1 window (290 ms timeout)... <-- Rx Delay end 
[2024-06-27 19:31:31.130] RLB_PRO: Closing Rx1 window
[2024-06-27 19:31:31.133] RLB_PRO: PHY: Frequency DL =  MHz
[2024-06-27 19:31:31.159] RLB_PRO: this->rxDelays[i]: 6000 ms
[2024-06-27 19:31:31.162] RLB_PRO: mod->hal->millis(): 19004845 ms
[2024-06-27 19:31:31.165] RLB_PRO: waitLen 645 ms
[2024-06-27 19:31:31.817] RLB_PRO: Opening Rx2 window (290 ms timeout)... <-- Rx Delay end 
[2024-06-27 19:31:32.108] RLB_PRO: Closing Rx2 window
[2024-06-27 19:31:32.123] activateOTAA failed, code -1116

I tried the latest mainline and that doesn't even get this far. Something has changed recently that causes me to get RADIOLIB_ERR_DWELL_TIME_EXCEEDED instead. I haven't had a chance to debug that yet.

TTN forwards the join-accept, as you can see in the log. That's all of the lines on the TTN side. The next line is the next attempt at a different time. I'm not hiding anything

image

@StevenCellist
Copy link
Collaborator

I haven't had a chance to debug that yet.

Please do so, as this is something crucial. The point is that you are not allowed to join at SF12 in AU915 region due to the dwell time limitations of 400ms that I did not see earlier. I added the dwell time checks but also corrected the join-request to be sent at SF10 instead, which should work. I might have made a mistake there in some way, but we do need to fix that. Sending the JR at an appropriate SF may just as well be a reason why this is not working correctly. I am not sure how TTN handles this as Australia is pretty much other side of the world!

That's all of the lines on the TTN side

Sorry, I was mistaken on that one, I thought there was an extra line to show that a downlink occurred with a downward arrow. I stand corrected.

I think I fixed the delay

Definitely looks much better now, but the Rx1 and Rx2 windows are shifting forwards; how sure are you that your clock is stable and not drifting? CubeCells for example drift by 16 ms every 1000 ms, and the logs indicate something similar here. There are some instructions in BuildOpt here to figure this out. Please let me know what you find! And I like Big Logs, I cannot lie...

@HeadBoffin
Copy link
Collaborator

I am not sure how TTN handles this as Australia is pretty much other side of the world!

But you have access to the location warper 915 TTIG - so let the testing commence.

From the logs, and yes, precise timing issues etc etc, the Rx1 window was opened 4.854s - with a scan guard of 50ms that means it closed at 4.954s (50ms listening for preamble, 50ms early as per new setting) - you need to hit 5s. It may be time for an oscilloscope or digital signal logger to be used.

It would be prudent to get something working on a known good device / platform so you can see RadioLib in action with the logs showing what is good. Along the way it may fill in some of the details you need to wrestle with the whole LoRaWAN thing - it's like Shrek an onion, soooo many layers. The TTN Learn section is pretty much mandatory reading to make good progress - that's why they are mentioned so early in the most amazing starter notes ever.

@StevenCellist
Copy link
Collaborator

50ms listening for preamble

That's not exactly the case ;) it listens for as long as a preamble can take given the SF and BW.

[2024-06-27 19:31:30.839] RLB_PRO: Opening Rx1 window (290 ms timeout)... <-- Rx Delay end 
[2024-06-27 19:31:31.130] RLB_PRO: Closing Rx1 window

^ that's a 291 ms window of which 100 ms is due to scanGuard but 191 ms due to preamble.

@alistair23
Copy link
Contributor Author

For context this is using the latest (commit ce8e6fd)

[2024-06-27 20:06:58.216] RLB_PRO: mask[7] = 0x0002
[2024-06-27 20:06:58.219] RLB_PRO: UL:  65 1  (0 - 1082960896) | DL:   6 6  (0 - 0)
[2024-06-27 20:06:58.224] RLB_PRO: mask[0] = 0xff00
[2024-06-27 20:06:58.227] RLB_PRO: UL:  65 1  (0 - 1082960896) | DL:   6 6  (0 - 0)
[2024-06-27 20:06:58.230] RLB_PRO: UL:   8 1  (1610612736 - 1082959462) | DL:   0 5  (0 - 0)
[2024-06-27 20:06:58.239] RLB_PRO: UL:   9 1  (0 - 1082959872) | DL:   0 5  (0 - 0)
[2024-06-27 20:06:58.243] RLB_PRO: UL:  10 1  (-1610612736 - 1082960281) | DL:   0 5  (0 - 0)
[2024-06-27 20:06:58.251] RLB_PRO: UL:  11 1  (1073741824 - 1082960691) | DL:   0 5  (0 - 0)
[2024-06-27 20:06:58.258] RLB_PRO: UL:  12 1  (-536870912 - 1082961100) | DL:   0 5  (0 - 0)
[2024-06-27 20:06:58.264] RLB_PRO: UL:  13 1  (1610612736 - 1082961510) | DL:   0 5  (0 - 0)
[2024-06-27 20:06:58.270] RLB_PRO: UL:  14 1  (0 - 1082961920) | DL:   0 5  (0 - 0)
[2024-06-27 20:06:58.276] RLB_PRO: UL:  15 1  (-1610612736 - 1082962329) | DL:   0 5  (0 - 0)
[2024-06-27 20:06:58.574] RLB_PRO: [MAC] 0x03
[2024-06-27 20:06:58.574] RLB_PRO: 0000000 20 00 00 80                                     |  ...              
[2024-06-27 20:06:58.582] RLB_PRO: LinkADRReq: dataRate = 2, txSteps = 0, chMask = 0x0000, chMaskCntl = 0, nbTrans = 0
[2024-06-27 20:06:58.593] RLB_PRO: LinkADRAns: status = 0x07
[2024-06-27 20:06:58.596] RLB_PRO: [MAC] 0x04
[2024-06-27 20:06:58.596] RLB_PRO: 0000000 00                                              | .                 
[2024-06-27 20:06:58.605] RLB_PRO: DutyCycleReq: max duty cycle = 1/2^0
[2024-06-27 20:06:58.608] RLB_PRO: [MAC] 0x05
[2024-06-27 20:06:58.611] RLB_PRO: 0000000 08 68 e2 8c                                     | .h..              
[2024-06-27 20:06:58.619] RLB_PRO: RXParamSetupReq: rx1DrOffset = 0, rx2DataRate = 8, freq = 
[2024-06-27 20:06:58.631] RLB_PRO: RXParamSetupAns: status = 0x07
[2024-06-27 20:06:58.635] RLB_PRO: [MAC] 0x08
[2024-06-27 20:06:58.635] RLB_PRO: 0000000 01                                              | .                 
[2024-06-27 20:06:58.643] RLB_PRO: RXTimingSetupReq: delay = 1 sec
[2024-06-27 20:06:58.646] RLB_PRO: [MAC] 0x09
[2024-06-27 20:06:58.649] RLB_PRO: 0000000 1d                                              | .                 
[2024-06-27 20:06:58.658] RLB_PRO: TxParamSetupReq: dlDwell = 0, ulDwell = 1, maxEirp = 30 dBm
[2024-06-27 20:06:58.663] RLB_PRO: [MAC] 0x0C
[2024-06-27 20:06:58.666] RLB_PRO: 0000000 65                                              | e                 
[2024-06-27 20:06:58.671] RLB_PRO: ADRParamSetupReq: limitExp = 6, delayExp = 5
[2024-06-27 20:06:58.677] RLB_PRO: [MAC] 0x0F
[2024-06-27 20:06:58.680] RLB_PRO: 0000000 fa                                              | .                 
[2024-06-27 20:06:58.689] RLB_PRO: RejoinParamSetupReq: maxTime = 15, maxCount = 10
[2024-06-27 20:06:58.691] RLB_PRO: RejoinParamSetupAns: status = 0x03
[2024-06-27 20:06:58.988] RLB_PRO: 
[2024-06-27 20:06:58.988] RLB_PRO: PHY: Frequency UL =  MHz
[2024-06-27 20:06:59.024] RLB_PRO: PHY: SF = 10, TX = 22 dBm, BW =  kHz, CR = 4/0
[2024-06-27 20:06:59.047] activateOTAA failed, code -1114

@StevenCellist
Copy link
Collaborator

StevenCellist commented Jun 27, 2024

Thanks for the log, will have to figure out why that is the case... because the ToA is calculated for 23 bytes, which as per the air-time calculator (set at 13+10=23 bytes) should not exceed 400ms.. but that will require to dig deeper and see what values are being calculated.

If you can figure out what the stability of your clock is, I will check what is up with the dwell time calculation tomorrow.

@HeadBoffin
Copy link
Collaborator

What is the RSSI and SNR of the Join Request?

@StevenCellist
Copy link
Collaborator

Please pull from upstream - latest commit is tested on AU915/2 and joins perfectly fine on SF10 under dwell time limitations (forgot to convert microseconds to milliseconds).

@alistair23
Copy link
Contributor Author

Please pull from upstream - latest commit is tested on AU915/2 and joins perfectly fine on SF10 under dwell time limitations (forgot to convert microseconds to milliseconds).

Great! Thanks. That fixes the dwell time issue. Now back to the activateOTAA failed, code -1116 problem

I think the next step is to try and hookup a logic analyser and see if that provides any hints. I'll let you know when I have my breakout board setup

@StevenCellist
Copy link
Collaborator

The experts think there are two open questions awaiting your response, namely:

What is the RSSI and SNR of the Join Request?

... and ...

If you can figure out what the stability of your clock is

... which are both questions that easily supersede connecting a logic analyzer.

@alistair23
Copy link
Contributor Author

alistair23 commented Jun 28, 2024

What is the RSSI and SNR of the Join Request?

I see this in the TTN logs

    "settings": {
      "data_rate": {
        "lora": {
          "bandwidth": 125000,
          "spreading_factor": 10,
          "coding_rate": "4/5"
        }
      },
      "frequency": "918200000",
      "timestamp": 605731571,
      "time": "2024-06-28T12:13:26.750392913Z"
...
    "rx_metadata": [
      {
...
        "time": "2024-06-28T12:13:26.750392913Z",
        "timestamp": 605731571,
        "rssi": -49,
        "channel_rssi": -49,
        "snr": 13.5,
   ...
        "received_at": "2024-06-28T12:13:26.768107401Z"
      }
    ],
...
    "device_channel_index": 15,
    "consumed_airtime": "0.370688s"

and this in the gateway logs

Fri Jun 28 12:13:26 2024 daemon.info station[24842]: [S2E:VERB] RX 918.2MHz DR2 SF10/BW125 snr=13.5 rssi=-49 xtime=0xDB0006241ABAF3 - jreq MHdr=00 JoinEui=::0 DevEui=70b3:d57e:d006:8a59 DevNonce=0 MIC=860356858
Fri Jun 28 12:13:28 2024 daemon.info station[24842]: [S2E:DEBU] ::1 diid=35076 [ant#0] - next TX start ahead by 3s155ms (12:13:31.742494)
Fri Jun 28 12:13:32 2024 daemon.info station[24842]: [S2E:VERB] ::1 diid=35076 [ant#0] - starting TX in 19ms835us: 927.5MHz 30.0dBm ant#0(0) DR10 SF10/BW500 frame=20ABFD19E6BFC963DBEC895D..0842BEE4 (33 bytes)
Fri Jun 28 12:13:32 2024 daemon.info station[24842]: [S2E:INFO] TX ::1 diid=35076 [ant#0] - dntxed: 927.5MHz 30.0dBm ant#0(0) DR10 SF10/BW500 frame=20ABFD19E6BFC963DBEC895D..0842BEE4 (33 bytes)
Fri Jun 28 12:13:32 2024 daemon.info station[24842]: [S2E:DEBU] Tx done diid=35076

If you can figure out what the stability of your clock is

The hardware clock is stable, at least as accurately that I can measure. I suspect there are timing errors though as the print statements are async. I'm hoping I can get a better idea of the timing via a LA to see if I'm missing the window

@StevenCellist
Copy link
Collaborator

 "rssi": -49,
 "snr": 13.5,

A wall in between the device & gateway wouldn't hurt (and actually be better), but not completely catastrophical.

Regarding the async: yes I got that. But what I see in your logs, is that your clock is running too fast, because the Rx1 and Rx2 window are opening and closing increasingly earlier. And given the timeout that is printed being 290 ms and the prints occurring with 291 ms between them, that async stuff doesn't appear to impact timing that much. Really, I recommend investing whether your clock is running at the same speed as the rest of the world. Ideally, as instructed in BuildOpt.h, let your device print a character at a 1000ms interval for like a minute long, and report back what you see with timestamped logs.

@alistair23
Copy link
Contributor Author

A wall in between the device & gateway wouldn't hurt (and actually be better), but not completely catastrophical.

Yeah they are in the same room.

Regarding the async: yes I got that. But what I see in your logs, is that your clock is running too fast, because the Rx1 and Rx2 window are opening and closing increasingly earlier. And given the timeout that is printed being 290 ms and the prints occurring with 291 ms between them, that async stuff doesn't appear to impact timing that much. Really, I recommend investing whether your clock is running at the same speed as the rest of the world. Ideally, as instructed in BuildOpt.h, let your device print a character at a 1000ms interval for like a minute long, and report back what you see with timestamped logs.

Good point. I'll test and let you know

@StevenCellist StevenCellist added bug Something isn't working more information needed Further investigation is needed, not necessarily by the author labels Jun 28, 2024
@alistair23
Copy link
Contributor Author

This is what I am seeing

[2024-07-01 09:42:21.547] Starting
[2024-07-01 09:42:22.525] One second
[2024-07-01 09:42:23.502] One second
[2024-07-01 09:42:24.480] One second
[2024-07-01 09:42:25.457] One second
[2024-07-01 09:42:26.435] One second
[2024-07-01 09:42:27.412] One second
[2024-07-01 09:42:28.390] One second
[2024-07-01 09:42:29.367] One second
[2024-07-01 09:42:30.346] One second
[2024-07-01 09:42:31.323] One second
[2024-07-01 09:42:32.301] One second
[2024-07-01 09:42:33.278] One second
[2024-07-01 09:42:34.256] One second
[2024-07-01 09:42:35.234] One second
[2024-07-01 09:42:36.211] One second
[2024-07-01 09:42:37.189] One second
[2024-07-01 09:42:38.166] One second
[2024-07-01 09:42:39.144] One second
[2024-07-01 09:42:40.121] One second
[2024-07-01 09:42:41.099] One second
[2024-07-01 09:42:42.077] One second
[2024-07-01 09:42:43.054] One second
[2024-07-01 09:42:44.032] One second
[2024-07-01 09:42:45.009] One second
[2024-07-01 09:42:45.987] One second
[2024-07-01 09:42:46.964] One second
[2024-07-01 09:42:47.942] One second
[2024-07-01 09:42:48.919] One second
[2024-07-01 09:42:49.897] One second
[2024-07-01 09:42:50.874] One second
[2024-07-01 09:42:51.852] One second
[2024-07-01 09:42:52.830] One second
[2024-07-01 09:42:53.808] One second
[2024-07-01 09:42:54.785] One second
[2024-07-01 09:42:55.762] One second
[2024-07-01 09:42:56.740] One second
[2024-07-01 09:42:57.717] One second
[2024-07-01 09:42:58.695] One second
[2024-07-01 09:42:59.673] One second
[2024-07-01 09:43:00.650] One second
[2024-07-01 09:43:01.628] One second
[2024-07-01 09:43:02.605] One second
[2024-07-01 09:43:03.583] One second
[2024-07-01 09:43:04.560] One second
[2024-07-01 09:43:05.538] One second
[2024-07-01 09:43:06.516] One second
[2024-07-01 09:43:07.493] One second
[2024-07-01 09:43:08.471] One second
[2024-07-01 09:43:09.449] One second
[2024-07-01 09:43:10.427] One second
[2024-07-01 09:43:11.404] One second
[2024-07-01 09:43:12.382] One second
[2024-07-01 09:43:13.359] One second
[2024-07-01 09:43:14.337] One second
[2024-07-01 09:43:15.314] One second
[2024-07-01 09:43:16.292] One second
[2024-07-01 09:43:17.270] One second
[2024-07-01 09:43:18.247] One second
[2024-07-01 09:43:19.224] One second
[2024-07-01 09:43:20.202] One second
[2024-07-01 09:43:21.179] One second
[2024-07-01 09:43:22.157] One second

So that's a clock drift of about -23. I surprised that it's negative!

Setting RADIOLIB_CLOCK_DRIFT_MS to (-23) still gives me this

[2024-07-01 09:52:15.857] RLB_PRO: PHY: Frequency DL =  MHz
[2024-07-01 09:52:15.893] RLB_PRO: PHY: SF = 10, TX = 22 dBm, BW =  kHz, CR = 4/0
[2024-07-01 09:52:15.927] RLB_PRO: this->rxDelays[i]: 5000 ms
[2024-07-01 09:52:15.930] RLB_PRO: mod->hal->millis(): 164857 ms
[2024-07-01 09:52:15.933] RLB_PRO: waitLen 4857 ms
[2024-07-01 09:52:20.697] RLB_PRO: Opening Rx1 window (176 ms timeout)... <-- Rx Delay end 
[2024-07-01 09:52:20.877] RLB_PRO: Closing Rx1 window
[2024-07-01 09:52:20.880] RLB_PRO: PHY: Frequency DL =  MHz
[2024-07-01 09:52:20.906] RLB_PRO: this->rxDelays[i]: 6000 ms
[2024-07-01 09:52:20.909] RLB_PRO: mod->hal->millis(): 169956 ms
[2024-07-01 09:52:20.912] RLB_PRO: waitLen 758 ms
[2024-07-01 09:52:21.674] RLB_PRO: Opening Rx2 window (290 ms timeout)... <-- Rx Delay end 
[2024-07-01 09:52:21.965] RLB_PRO: Closing Rx2 window
[2024-07-01 09:52:21.979] activateOTAA failed, code -1116

@StevenCellist
Copy link
Collaborator

Ahw, you're missing one line in the final log: the Rx Delay Start! Meanwhile, I'll have to check whether it should be positive or negative.. given that Rx2 now starts 25ms earlier than Rx1 so might have meant for the value to be positive this way round...

@alistair23
Copy link
Contributor Author

Sorry, full log is

[2024-07-01 21:17:43.386] RLB_PRO: JoinRequest sent (DevNonce = 0) <-- Rx Delay start
[2024-07-01 21:17:43.390] RLB_PRO: this->rxDelayStart: 120917 ms
[2024-07-01 21:17:43.395] RLB_PRO: now: 120927 ms
[2024-07-01 21:17:43.395] RLB_PRO: this->rxDelays[0]: 5000 ms
[2024-07-01 21:17:43.401] RLB_PRO: 
[2024-07-01 21:17:43.401] RLB_PRO: PHY: Frequency DL =  MHz
[2024-07-01 21:17:43.436] RLB_PRO: PHY: SF = 7, TX = 22 dBm, BW =  kHz, CR = 4/0
[2024-07-01 21:17:43.470] RLB_PRO: this->rxDelays[i]: 5000 ms
[2024-07-01 21:17:43.473] RLB_PRO: mod->hal->millis(): 121007 ms
[2024-07-01 21:17:43.476] RLB_PRO: waitLen 4856 ms
[2024-07-01 21:17:48.239] RLB_PRO: Opening Rx1 window (131 ms timeout)... <-- Rx Delay end 
[2024-07-01 21:17:48.376] RLB_PRO: Closing Rx1 window
[2024-07-01 21:17:48.378] RLB_PRO: PHY: Frequency DL =  MHz
[2024-07-01 21:17:48.404] RLB_PRO: this->rxDelays[i]: 6000 ms
[2024-07-01 21:17:48.407] RLB_PRO: mod->hal->millis(): 126059 ms
[2024-07-01 21:17:48.410] RLB_PRO: waitLen 804 ms
[2024-07-01 21:17:49.216] RLB_PRO: Opening Rx2 window (290 ms timeout)... <-- Rx Delay end 
[2024-07-01 21:17:49.508] RLB_PRO: Closing Rx2 window
[2024-07-01 21:17:49.523] activateOTAA failed, code -1116

@alistair23
Copy link
Contributor Author

This is the capture from a Saleae

LoRaWAN-Capture.zip

You should be able to open it with https://www.saleae.com/pages/downloads if you are interested

@StevenCellist
Copy link
Collaborator

I see that the clock drift compensation is not used, as you are using a different HAL. Please refer to the ArduinoHAL and implement these calculations in your own HAL (all four of them). Hopefully then the clock drift compensation works nicely.

I downloaded Saleae but it says I need a .sal file which is not in your zip.

@alistair23
Copy link
Contributor Author

Aw, sorry. Just rename the .zip to .sal. GitHub limits the extensions of uploads

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working more information needed Further investigation is needed, not necessarily by the author
Projects
None yet
Development

No branches or pull requests

3 participants