Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ims/volte stops working after expiry of timers (4g setup) #322

Open
Hoernchen opened this issue May 21, 2024 · 12 comments
Open

ims/volte stops working after expiry of timers (4g setup) #322

Hoernchen opened this issue May 21, 2024 · 12 comments

Comments

@Hoernchen
Copy link

Volte calls between phones work fine after starting the containers, but after 1-2 minutes one of the configured expiry timers in kamailio_pcscf.cfg appears to trigger something and at that point volte calls stop working.

my current, very hacky "fix":

 modparam("ims_registrar_pcscf", "is_registered_fallback2ip", 1)
 modparam("ims_registrar_pcscf", "ignore_reg_state", 1)
 modparam("ims_registrar_pcscf", "ignore_contact_rxport_check", 1)
-modparam("ims_registrar_pcscf", "pending_reg_expires", 30)
-modparam("ims_registrar_pcscf", "subscription_expires", 36000)
-modparam("ims_registrar_pcscf", "delete_delay", CONTACT_DELETE_DELAY)
-modparam("ims_usrloc_pcscf", "expires_grace", 120)
+modparam("ims_registrar_pcscf", "pending_reg_expires", 99999)
+modparam("ims_registrar_pcscf", "subscription_expires", 99999)
+modparam("ims_registrar_pcscf", "delete_delay", 5)
+modparam("ims_usrloc_pcscf", "expires_grace", 9999999)
 
 #!ifdef WITH_REGINFO
 modparam("ims_registrar_pcscf", "subscribe_to_reginfo", 1)
@@ -428,7 +428,7 @@ modparam("ims_qos", "confirmed_qosrelease_headers", "X-Reason: QoS failed\r\n")
 modparam("ims_qos", "authorize_video_flow", 1)
 modparam("ims_qos", "af_signaling_ip", RX_AF_SIGNALING_IP)
 modparam("ims_qos", "include_rtcp_fd", 1)
-modparam("ims_qos", "rx_auth_expiry", 36000)
+modparam("ims_qos", "rx_auth_expiry", 999999999)
 modparam("ims_qos", "recv_mode", 1)
 modparam("ims_qos", "dialog_direction", RX_IMS_REG_DIALOG_DIRECTION)
 #!endif
@herlesupreeth
Copy link
Owner

Those diff does not make sense. Please post a pcap of the issue you are facing and I can take a look.

@Hoernchen
Copy link
Author

Which parts do you need? It appears to be a pcscf issue, only half of the deregistration happens, the other half is perpertually stuck in a pending state, while the "valid" contacts on the scscf side keep piling up with the expiry supplied by the ue, which is 600000 seconds.

Maybe related: kamailio/kamailio#3570

@herlesupreeth
Copy link
Owner

herlesupreeth commented May 26, 2024

pending_reg_expires

The above modparam is related to UEs whose IMS registration remain in pending state and not for UEs who register with the IMS successfully i.e. 200 OK for SIP REGISTER is received.

subscription_expires

This is when the subscription of the UE expires at P-CSCF. 36000 equates to 10hrs

delete_delay

This is the time after which IPSec connection is removed once the UE de-registers from the network

so I am not sure its a P-CSCF issue

Which parts do you need?

I suggest taking the pcap on the 'any' interface without any filters. The scenario should be as follows

  1. Start the trace
  2. Register two UEs
  3. Wait for 1-2 mins
  4. Attempt a call
  5. Stop the trace and post it here

@Hoernchen
Copy link
Author

You need to remove the ".zip" file suffix, zip was way too large to attach, so i had to use zstd.
broken_after_3min.pcapng.zst.zip

@herlesupreeth
Copy link
Owner

The packets in the pcap file didnt feel to be in right order I saw diameter UAR request before even receiving SIP REGISTER. Can you please ensure that time stamps on the machine running EPC + IMS and UE are the same? And, UE is set to receive time from the network?

Also, can you tell me which eNB you are using?

Sorry, cant be of much help on this. But all I can confirm is that its not due to those values in modparam you are facing this issue.

@Hoernchen
Copy link
Author

Just increasing pending_reg_expires is enough to "fix" the issue, for some reason waiting longer than (pending_reg_expires + expires_grace) appears to be time stuff starts failing. It can also be "fixed" by putting the phones in airplane mode, and turning them on again (unitl it fails again). No idea what is going on with the timestamps tbh.. I usually don't look at all interfaces because I don't want to see everything five times, so I never noticed that ;)

@herlesupreeth
Copy link
Owner

Can you post a pcap of the working call scenario with increased pending reg expire timer?

@Hoernchen
Copy link
Author

enb is srsenb, btw.

One "fixed" attempt:
pending_9999999_callfix.pcapng.zst.zip

And another try, I can basically watch it fail really fast even before making a call by doing this:

--- a/pcscf/kamailio_pcscf.cfg
+++ b/pcscf/kamailio_pcscf.cfg
@@ -309,7 +309,7 @@ modparam("uac","restore_mode","none")
 
 # ----------------- Settings for RTimer ---------------
 # time interval set to 60 seconds
-modparam("rtimer", "timer", "name=NATPING;interval=60;mode=1;")
+modparam("rtimer", "timer", "name=NATPING;interval=2;mode=1;")
 modparam("rtimer", "exec", "timer=NATPING;route=NATPING")
 #!endif
 
@@ -398,7 +398,7 @@ modparam("ims_registrar_pcscf", "ignore_contact_rxport_check", 1)
 modparam("ims_registrar_pcscf", "pending_reg_expires", 30)
 modparam("ims_registrar_pcscf", "subscription_expires", 36000)
 modparam("ims_registrar_pcscf", "delete_delay", CONTACT_DELETE_DELAY)
-modparam("ims_usrloc_pcscf", "expires_grace", 120)
+modparam("ims_usrloc_pcscf", "expires_grace", 12)
 modparam("ims_usrloc_pcscf", "timer_interval", 2)

The point at which a call will fail is basically as soon as the OPTIONS "nat ping" breaks, which is pretty much after 30+12s after the succesful registration according to the second pcap (below), pcscf log for that point in time:

pcscf      | 98(135) ERROR: <script>: OPTIONS to sip:[email protected]:41786 via sip:192.168.101.3:41786;transport=tcp...
pcscf      | 98(135) ERROR: <script>: OPTIONS to sip:[email protected]:40643 via sip:192.168.101.2:40643;transport=tcp...
pcscf      | 107(144) ERROR: <script>: request sent to sip:[email protected]:41786 completed with code: 200, Type 1
pcscf      | 104(141) ERROR: <script>: request sent to sip:[email protected]:40643 completed with code: 200, Type 1
pcscf      | 85(122) INFO: cdp [authstatemachine.c:292]: auth_client_statefull_sm_process(): after callback of event 1
pcscf      | 94(131) INFO: cdp [authstatemachine.c:292]: auth_client_statefull_sm_process(): after callback of event 17
pcscf      | 94(131) INFO: cdp [authstatemachine.c:463]: auth_client_statefull_sm_process(): state machine: AUTH_EV_RECV_STA about to clean up
pcscf      | 98(135) ERROR: <script>: OPTIONS to sip:[email protected]:41786 via sip:192.168.101.3:41786;transport=tcp...
pcscf      | 98(135) ERROR: <script>: OPTIONS to sip:[email protected]:40643 via sip:192.168.101.2:40643;transport=tcp...
pcscf      | 107(144) ERROR: <script>: request sent to sip:[email protected]:41786 completed with code: 200, Type 1
pcscf      | 98(135) ERROR: <script>: OPTIONS to sip:[email protected]:41786 via sip:192.168.101.3:41786;transport=tcp...
pcscf      | 98(135) ERROR: <script>: OPTIONS to sip:[email protected]:40643 via sip:192.168.101.2:40643;transport=tcp...
pcscf      | 107(144) ERROR: <script>: request sent to sip:[email protected]:41786 completed with code: 200, Type 1
pcscf      | 85(122) ERROR: <script>: request sent to sip:[email protected]:40643 completed with code: 408, Type 2
pcscf      | 85(122) ERROR: <script>:   request sent to sip:[email protected]:40643: Fail Counter is 1
pcscf      | 85(122) INFO: cdp [authstatemachine.c:292]: auth_client_statefull_sm_process(): after callback of event 1
pcscf      | 94(131) INFO: cdp [authstatemachine.c:292]: auth_client_statefull_sm_process(): after callback of event 17
pcscf      | 94(131) INFO: cdp [authstatemachine.c:463]: auth_client_statefull_sm_process(): state machine: AUTH_EV_RECV_STA about to clean up
pcscf      | 98(135) ERROR: <script>: OPTIONS to sip:[email protected]:41786 via sip:192.168.101.3:41786;transport=tcp...
pcscf      | 98(135) ERROR: <script>: OPTIONS to sip:[email protected]:40643 via sip:192.168.101.2:40643;transport=tcp...
pcscf      | 85(122) ERROR: <script>: request sent to sip:[email protected]:40643 completed with code: 408, Type 2
pcscf      | 85(122) ERROR: <script>:   request sent to sip:[email protected]:40643: Fail Counter is 2
pcscf      | 98(135) ERROR: <script>: OPTIONS to sip:[email protected]:41786 via sip:192.168.101.3:41786;transport=tcp...
pcscf      | 98(135) ERROR: <script>: OPTIONS to sip:[email protected]:40643 via sip:192.168.101.2:40643;transport=tcp...
pcscf      | 85(122) ERROR: <script>: request sent to sip:[email protected]:41786 completed with code: 408, Type 2
pcscf      | 85(122) ERROR: <script>:   request sent to sip:[email protected]:41786: Fail Counter is 1

fail_natping2_grace12.pcapng.zst.zip

@herlesupreeth
Copy link
Owner

Hey @Hoernchen , thank you very much for details pcaps and logs. I think its a bug in kamailio P-CSCF code. I will see whether I can fix it. In my opinion, wrong Contact of the UE is deleted after pending expires + expire grace.

@herlesupreeth
Copy link
Owner

@Hoernchen Please pull the latest changes from this repo and re-build kamailio images and give it a try (no modifications to expires values is required).

@Hoernchen
Copy link
Author

I have been watching and "secretly" using your branch since tuesday, and can confirm that the current repo appears to have fixed the issue and calls just keep working, thanks for your very fast fix!

Just one minor complaint: the new much lower nat ping leads to a lot of "spam" in the docker compose output, it would be great to suppress that in some way unless it fails, because even with just two active phones I kind of mostly just see option messages and have to scroll forever to find anything...

@herlesupreeth
Copy link
Owner

Thanks for confirming the fix.

Just one minor complaint: the new much lower nat ping leads to a lot of "spam" in the docker compose output

Point noted. I will fix the interval.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants