graceful-restart: Prefix not advertised to non-gr peer #2596

rkojedzinszky · 2022-10-31T20:55:45Z

History: cloudnativelabs/kube-router#1389

I have a gobgp instance configured for graceful restart, and an external peer without graceful restart capability. The symptom is that when gobgp is started, the bgp session gets established, howewer, no prefixes get advertised to the neighbor. If then I restart the peer, prefixes get advertised.

I wrote a sample source which demonstrates this.

gobgp.zip

Note that graceful restart is configured for ipv4 only. Then, when bgp is established, I get the following log entries:

DEBU[0000] Now syncing, suppress sending updates. start deferral timer  Duration=10 Key=192.168.111.21 Topic=Server
DEBU[0001] received update                               Key=192.168.111.21 Topic=Peer attributes="[]" nlri="[]" withdrawals="[]"
DEBU[0001] EOR received                                  AddressFamily=ipv4-unicast Key=192.168.111.21 Topic=Peer
INFO[0001] sync finished                                 Topic=Server

Then, all address families receive EOR (https://github.com/osrg/gobgp/blob/master/pkg/server/server.go#L1699). But when proceeding, it turns out that p.isGracefulRestartEnabled() returns false, thus not sending out updates.

Kube-router until v1.5.1 configured graceful restart for both ipv4 and ipv6. That made the deferral-timeout trigger, and then a different code path had sent out advertisements, howewer only after the timeout.

Once the peer is restarted, then the whole FSM knows about the peer is not configured for graceful restart, and it gets advertisements as soon the session is up.

For the peer a simple frr configuration is enough:

router bgp 10
 no bgp ebgp-requires-policy
 bgp graceful-restart-disable
 neighbor 192.168.111.1 remote-as 20

The text was updated successfully, but these errors were encountered:

rkojedzinszky · 2022-12-21T08:32:48Z

ping

aauren · 2023-06-11T22:54:51Z

@fujita - Any chance you could take a look at this one?

@rkojedzinszky has done quite a bit of work tracking down this seeming inconsistency with GoBGP's handling of GracefulRestart when GoBGP is configured for GracefulRestart, but the peer is not (see cloudnativelabs/kube-router#1389 for the full history and context). This is causing them quite a bit of a headache as anytime kube-router is restarted they lose all routes until they also restart their peer.

fujita · 2023-06-12T12:32:48Z

This is a regression(old version gobgp worked)?

rkojedzinszky · 2023-06-12T13:15:51Z

This is a regression(old version gobgp worked)?

I dont know, as I was using kube-router only. I suspect a change in kube-router triggered this bug. Earlier, a bug in kube-router masked or changed behavior in a way that it seemed that everything is ok, but it turns out that the applied change in kube-router is correct (cloudnativelabs/kube-router#1327).

aauren · 2023-06-12T16:06:57Z

Elaborating just a bit in this thread (obviously all work can be found in the PR that @rkojedzinszky linked)...

Previously kube-router used to announce graceful restart for the gobgpapi.Family_AFI_IP and gobgpapi.Family_AFI_IP6 AFI families whether we used them or not. As a matter of fact, at the point in time that @rkojedzinszky linked to, the BGP implementation in kube-router was only capable of working with IPv4 or IPv6 and could not work with both at the same time.

Some BGP peer implementations were fine ignoring an AFI setup for a family that wasn't used, but other ones it caused breakages unless users configured an AFI group that they weren't using and usually weren't capable of using.

However, this also seems to have worked around a bug (?) in GoBGP that @rkojedzinszky documented in the description of this issue up above.

fujita · 2023-06-14T06:24:01Z

Hmm, seems that I can't reproduce this.

btw, in the sample code, why LocalRestarting is set to true? It should not.

rkojedzinszky · 2023-06-14T11:45:26Z

@fujita I have created my sample code based on kube-router behavior. I will need some time to pick this up again, I'll report as soon as I can.

rkojedzinszky · 2023-06-25T09:30:52Z

Hmm, seems that I can't reproduce this.

btw, in the sample code, why LocalRestarting is set to true? It should not.

@fujita

I could reproduce it again. So use the attached code, and make sure to use the following frr configartion:

# sh running-config 
Building configuration...

Current configuration:
!
frr version 8.4.2
frr defaults traditional
hostname frr
log syslog informational
no ip forwarding
no ipv6 forwarding
service integrated-vtysh-config
!
router bgp 10
 bgp graceful-restart-disable
 neighbor 192.168.111.1 remote-as 20
 !
 address-family ipv4 unicast
  neighbor 192.168.111.1 route-map bgp-default in
  neighbor 192.168.111.1 route-map bgp-default out
 exit-address-family
exit
!
route-map bgp-default permit 10
exit
!
end

Pay attention to the bgp graceful-restart-disable statement. Ensure that the sample code is started after frr. Then I observed that after 3 mins there are no advertisements:

frr# sh ip bgp summary 

IPv4 Unicast Summary (VRF default):
BGP router identifier 192.168.111.55, local AS number 10 vrf-id 0
BGP table version 8
RIB entries 0, using 0 bytes of memory
Peers 1, using 724 KiB of memory

Neighbor        V         AS   MsgRcvd   MsgSent   TblVer  InQ OutQ  Up/Down State/PfxRcd   PfxSnt Desc
192.168.111.1   4         20        42        41        0    0    0 00:03:31            0        0 N/A

Total number of neighbors 1

Howewer, if I make a clear ip bgp on frr, then the sessions gets established again, and then prefixes gets advertised immediately:

frr# clear ip bgp *
frr# sh ip bgp summary 

IPv4 Unicast Summary (VRF default):
BGP router identifier 192.168.111.55, local AS number 10 vrf-id 0
BGP table version 10
RIB entries 0, using 0 bytes of memory
Peers 1, using 724 KiB of memory

Neighbor        V         AS   MsgRcvd   MsgSent   TblVer  InQ OutQ  Up/Down State/PfxRcd   PfxSnt Desc
192.168.111.1   4         20        46        49        0    0    0 00:00:02         Idle        0 N/A

Total number of neighbors 1
frr# sh ip bgp summary 

IPv4 Unicast Summary (VRF default):
BGP router identifier 192.168.111.55, local AS number 10 vrf-id 0
BGP table version 10
RIB entries 0, using 0 bytes of memory
Peers 1, using 724 KiB of memory

Neighbor        V         AS   MsgRcvd   MsgSent   TblVer  InQ OutQ  Up/Down State/PfxRcd   PfxSnt Desc
192.168.111.1   4         20        46        50        0    0    0 00:00:05       Active        0 N/A

Total number of neighbors 1
frr# sh ip bgp summary 

IPv4 Unicast Summary (VRF default):
BGP router identifier 192.168.111.55, local AS number 10 vrf-id 0
BGP table version 10
RIB entries 0, using 0 bytes of memory
Peers 1, using 724 KiB of memory

Neighbor        V         AS   MsgRcvd   MsgSent   TblVer  InQ OutQ  Up/Down State/PfxRcd   PfxSnt Desc
192.168.111.1   4         20        46        50        0    0    0 00:00:12       Active        0 N/A

Total number of neighbors 1
frr# sh ip bgp summary 

IPv4 Unicast Summary (VRF default):
BGP router identifier 192.168.111.55, local AS number 10 vrf-id 0
BGP table version 11
RIB entries 1, using 192 bytes of memory
Peers 1, using 724 KiB of memory

Neighbor        V         AS   MsgRcvd   MsgSent   TblVer  InQ OutQ  Up/Down State/PfxRcd   PfxSnt Desc
192.168.111.1   4         20        49        52        0    0    0 00:00:01            1        0 N/A

Total number of neighbors 1

This behavior is consistent with cisco ios implemetation too, we had never had issues with frr-cisco setups with gr enabled on one end and disabled on the other.

fujita · 2023-06-28T05:54:11Z

I can reproduce the issue. The following change fixed the issue.

diff --git a/main.go b/main.go
index 87bb682..eea63c7 100644
--- a/main.go
+++ b/main.go
@@ -91,7 +91,7 @@ func main() {
                                Enabled:         true,
                                RestartTime:     120,
                                DeferralTime:    10,
-                               LocalRestarting: true,
+                               // LocalRestarting: true,
                        },
                        AfiSafis: []*gobgpapi.AfiSafi{

rkojedzinszky · 2023-06-28T11:10:15Z

@aauren

I created my own example code based on kube-router, where LocalRestarting is also set to true. I think there is a good reason why it is done that way, howewer, I am not really familiar with such depths of bgp/gobgp fsm. Can you please comment on it?

aauren · 2023-06-28T19:44:16Z

@rkojedzinszky This was unfortunately way before my time with kube-router. It was introduced with the graceful-restart feature here: https://github.com/cloudnativelabs/kube-router/pull/220/files#diff-d0dae64b5424393b01d606bfcabfa3e8fd82d2c466eaeb28187a4da833105273R504 and no description was given as to why it was added.

@fujita Thanks for tracking this down! We really appreciate your time and effort! I found a bit of code documentation about this feature here: https://github.com/osrg/gobgp/blob/master/internal/pkg/config/bgp_configs.go#L4246-L4252

However, I'm still not certain that I understand the implications of setting or not setting it and without understanding it better I feel hesitant to change a setting that has been in the graceful restart implementation since day 1 with kube-router.

Is there any chance that you would be willing to clarify what the option does and what the impact to it being set to true or false is?

rkojedzinszky · 2023-06-28T20:52:35Z

@fujita Also please note that in the sample code if you enable IPv6 address family too, and follow the same sequence of starting the speakers, then as commented in the code, a timer will fire soon, and advertisements will begin to be sent. This is with LocalRestarting: true. So, it seems to be an inconsistent behavior.

fujita · 2023-06-28T23:16:25Z

LocalRestarting is configuration for a restarting speaker:
https://github.com/osrg/gobgp/blob/master/docs/sources/graceful-restart.md#restarting-speaker

should not be set in your case.

I can't explain the behavior with IPv6 enabled. Needs to investigate.

aauren · 2023-06-29T03:43:59Z

@fujita Thanks again for continuing to pursue this with us.

You mentioned that you didn't think that LocalRestarting should be set in our case. And in the case of @rkojedzinszky's example code, I think I understand what you mean because it is starting for the first time, and yet it is still setting the LocalRestarting value. However, I'm not sure if it would actually be good to disable it for the kube-router case.

First off, to ensure that I understand what this flag does correctly from your documentation, is it correct to say that it lets it's peers know that it is recovering from a restart by setting the restart bit. And that this is the same procedure that is outlined by RFC4724 here:

To re-establish the session with its peer, the Restarting Speaker
MUST set the "Restart State" bit in the Graceful Restart Capability
of the OPEN message. Unless allowed via configuration, the
"Forwarding State" bit for an address family in the capability can be
set only if the forwarding state has indeed been preserved for that
address family during the restart.

If so, then I'm not sure how to proceed with this setting when it comes to kube-router as a whole. You see, kube-router is most frequently run as a pod that can start and stop at any time. Forwarding state is preserved by the fact that kube-router writes out the routing table to the host's network. However, other than that, kube-router does not preserve state across reboots, so it isn't possible for it to reliably know whether it is starting for the first time, or if it is recovering from a previous run where it had previously established graceful-restart enabled BGP sessions. However, in a typical system, the latter is more common than the former.

According to the documentation that you linked it says (-r being the same as setting LocalRestarting via the Go API):

Without -r option, the peers which are under helper procedure will immediately withdraw all routes which were advertised from gobgpd.

So, my understanding of all of this put together then, is:

Since we have no way of knowing whether we are starting for the first time or restarting, kube-router can only unilaterally set this value to true or false
If we set it to false, then after restarting, and upon re-establishing our BGP session with our peers, they will immediately withdraw all of our previous routes that they were previously holding in their FIB

I would assume that sometime later after route selection is complete, these routes would then come back, but there would be a period of service outage from the routes being withdrawn correct?

rkojedzinszky · 2023-06-29T06:42:14Z

@aauren Thanks, great explanation. We are using this feature of kube-router for inter-node disruption-free upgrades, howewer, for our external connections, we dont want to use gr. This way we expect that if a node goes down for any reason (even just for an upgrade), traffic instantly gets rerouted to another node.

rkojedzinszky · 2023-07-17T14:11:03Z

@fujita can we move forward somehow?

rkojedzinszky · 2023-08-21T08:34:36Z

@fujita ping

rkusler-intermedia · 2023-09-28T21:49:05Z

I'm interested in this one too. Currently we lose our BGP routes every time kubespray restarts the kube-router containers

rkusler-intermedia · 2023-09-28T21:56:11Z

@rkojedzinszky I understand that you want graceful restarts for BGP sessions between nodes and want it disabled for BGP sessions to your upstream router. In theory, IF you did not need graceful restarts for inter-node BGP sessions, would removing the --bgp-graceful-restart resolve this issue that you're seeing on sessions to the upstream routers?

rkojedzinszky · 2023-09-29T11:12:55Z

@rkojedzinszky I understand that you want graceful restarts for BGP sessions between nodes and want it disabled for BGP sessions to your upstream router. In theory, IF you did not need graceful restarts for inter-node BGP sessions, would removing the --bgp-graceful-restart resolve this issue that you're seeing on sessions to the upstream routers?

Yes, indeed. If GR is disabled in gobgp, everything works fine, prefixes get advertised as expected to upstream routers, in any scenario.

YutaroHayakawa · 2024-06-10T05:45:08Z

~~I think this PR fixed the issue which is released in v3.26.0.~~

~~In Cilium, we could reproduce with the version using GoBGP v3.23.0, but couldn't with the latest main using v3.26.0. So, once someone else could confirm, I think we can close this issue.~~

Seems like the way I tried was bad. Even with v3.26.0, the issue was still there.

rkojedzinszky · 2024-06-10T06:45:47Z

@YutaroHayakawa I will take a look in the next days, thanks!

YutaroHayakawa · 2024-06-10T12:31:52Z

Scratching my comment above. Seems like the way I tried was bad. Even with v3.26.0, the issue was still there.

To pull-in the recent fix (osrg/gobgp#2803) for the issue (osrg/gobgp#2596). Fixes: cilium#32886 Signed-off-by: Yutaro Hayakawa <[email protected]>

To pull-in the recent fix (osrg/gobgp#2803) for the issue (osrg/gobgp#2596) Fixes: cilium#32886 Signed-off-by: Yutaro Hayakawa <[email protected]>

To pull-in the recent fix (osrg/gobgp#2803) for the issue (osrg/gobgp#2596) Fixes: #32886 Signed-off-by: Yutaro Hayakawa <[email protected]>

aauren mentioned this issue Nov 4, 2022

v1.5.1 does not advertise externalIP addresses to external bgp peers cloudnativelabs/kube-router#1389

Closed

rkojedzinszky mentioned this issue May 22, 2023

Kube-router with graceful restart does not advertise external addresses cloudnativelabs/kube-router#1486

Closed

rkusler-intermedia mentioned this issue Sep 28, 2023

Add option to disable kube-router BGP graceful restarts kubernetes-sigs/kubespray#10488

Closed

YakhontovYaroslav mentioned this issue Jun 10, 2024

BGP control plane missing PodCIDR advertisements on some nodes. cilium/cilium#32886

Closed

3 tasks

YutaroHayakawa mentioned this issue Jun 10, 2024

[v1.15] Bump GoBGP to v3.27.0 cilium/cilium#32993

Merged

YutaroHayakawa added a commit to YutaroHayakawa/cilium that referenced this issue Jun 24, 2024

Upgrade GoBGP to v3.27.0

c859a71

To pull-in the recent fix (osrg/gobgp#2803) for the issue (osrg/gobgp#2596). Fixes: cilium#32886 Signed-off-by: Yutaro Hayakawa <[email protected]>

YutaroHayakawa added a commit to YutaroHayakawa/cilium that referenced this issue Jun 24, 2024

Upgrade GoBGP to v3.27.0

f780c3f

To pull-in the recent fix (osrg/gobgp#2803) for the issue (osrg/gobgp#2596). Fixes: cilium#32886 Signed-off-by: Yutaro Hayakawa <[email protected]>

YutaroHayakawa added a commit to YutaroHayakawa/cilium that referenced this issue Jun 24, 2024

Upgrade GoBGP to v3.27.0

8d929a2

To pull-in the recent fix (osrg/gobgp#2803) for the issue (osrg/gobgp#2596). Fixes: cilium#32886 Signed-off-by: Yutaro Hayakawa <[email protected]>

YutaroHayakawa added a commit to YutaroHayakawa/cilium that referenced this issue Jun 24, 2024

Upgrade GoBGP to v3.27.0

b554426

To pull-in the recent fix (osrg/gobgp#2803) for the issue (osrg/gobgp#2596) Fixes: cilium#32886 Signed-off-by: Yutaro Hayakawa <[email protected]>

YutaroHayakawa added a commit to YutaroHayakawa/cilium that referenced this issue Jun 27, 2024

Upgrade GoBGP to v3.27.0

09785d4

To pull-in the recent fix (osrg/gobgp#2803) for the issue (osrg/gobgp#2596) Fixes: cilium#32886 Signed-off-by: Yutaro Hayakawa <[email protected]>

julianwiedmann pushed a commit to cilium/cilium that referenced this issue Jul 2, 2024

Upgrade GoBGP to v3.27.0

335fabb

To pull-in the recent fix (osrg/gobgp#2803) for the issue (osrg/gobgp#2596) Fixes: #32886 Signed-off-by: Yutaro Hayakawa <[email protected]>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

graceful-restart: Prefix not advertised to non-gr peer #2596

graceful-restart: Prefix not advertised to non-gr peer #2596

rkojedzinszky commented Oct 31, 2022

rkojedzinszky commented Dec 21, 2022

aauren commented Jun 11, 2023

fujita commented Jun 12, 2023

rkojedzinszky commented Jun 12, 2023

aauren commented Jun 12, 2023

fujita commented Jun 14, 2023 •

edited

Loading

rkojedzinszky commented Jun 14, 2023

rkojedzinszky commented Jun 25, 2023

fujita commented Jun 28, 2023

rkojedzinszky commented Jun 28, 2023

aauren commented Jun 28, 2023

rkojedzinszky commented Jun 28, 2023

fujita commented Jun 28, 2023

aauren commented Jun 29, 2023

rkojedzinszky commented Jun 29, 2023

rkojedzinszky commented Jul 17, 2023

rkojedzinszky commented Aug 21, 2023

rkusler-intermedia commented Sep 28, 2023

rkusler-intermedia commented Sep 28, 2023

rkojedzinszky commented Sep 29, 2023

YutaroHayakawa commented Jun 10, 2024 •

edited

Loading

rkojedzinszky commented Jun 10, 2024

YutaroHayakawa commented Jun 10, 2024

graceful-restart: Prefix not advertised to non-gr peer #2596

graceful-restart: Prefix not advertised to non-gr peer #2596

Comments

rkojedzinszky commented Oct 31, 2022

rkojedzinszky commented Dec 21, 2022

aauren commented Jun 11, 2023

fujita commented Jun 12, 2023

rkojedzinszky commented Jun 12, 2023

aauren commented Jun 12, 2023

fujita commented Jun 14, 2023 • edited Loading

rkojedzinszky commented Jun 14, 2023

rkojedzinszky commented Jun 25, 2023

fujita commented Jun 28, 2023

rkojedzinszky commented Jun 28, 2023

aauren commented Jun 28, 2023

rkojedzinszky commented Jun 28, 2023

fujita commented Jun 28, 2023

aauren commented Jun 29, 2023

rkojedzinszky commented Jun 29, 2023

rkojedzinszky commented Jul 17, 2023

rkojedzinszky commented Aug 21, 2023

rkusler-intermedia commented Sep 28, 2023

rkusler-intermedia commented Sep 28, 2023

rkojedzinszky commented Sep 29, 2023

YutaroHayakawa commented Jun 10, 2024 • edited Loading

rkojedzinszky commented Jun 10, 2024

YutaroHayakawa commented Jun 10, 2024

fujita commented Jun 14, 2023 •

edited

Loading

YutaroHayakawa commented Jun 10, 2024 •

edited

Loading