Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

graceful-restart: Prefix not advertised to non-gr peer #2596

Open
rkojedzinszky opened this issue Oct 31, 2022 · 23 comments
Open

graceful-restart: Prefix not advertised to non-gr peer #2596

rkojedzinszky opened this issue Oct 31, 2022 · 23 comments

Comments

@rkojedzinszky
Copy link

History: cloudnativelabs/kube-router#1389

I have a gobgp instance configured for graceful restart, and an external peer without graceful restart capability. The symptom is that when gobgp is started, the bgp session gets established, howewer, no prefixes get advertised to the neighbor. If then I restart the peer, prefixes get advertised.

I wrote a sample source which demonstrates this.

gobgp.zip

Note that graceful restart is configured for ipv4 only. Then, when bgp is established, I get the following log entries:

DEBU[0000] Now syncing, suppress sending updates. start deferral timer  Duration=10 Key=192.168.111.21 Topic=Server
DEBU[0001] received update                               Key=192.168.111.21 Topic=Peer attributes="[]" nlri="[]" withdrawals="[]"
DEBU[0001] EOR received                                  AddressFamily=ipv4-unicast Key=192.168.111.21 Topic=Peer
INFO[0001] sync finished                                 Topic=Server

Then, all address families receive EOR (https://github.com/osrg/gobgp/blob/master/pkg/server/server.go#L1699). But when proceeding, it turns out that p.isGracefulRestartEnabled() returns false, thus not sending out updates.

Kube-router until v1.5.1 configured graceful restart for both ipv4 and ipv6. That made the deferral-timeout trigger, and then a different code path had sent out advertisements, howewer only after the timeout.

Once the peer is restarted, then the whole FSM knows about the peer is not configured for graceful restart, and it gets advertisements as soon the session is up.

For the peer a simple frr configuration is enough:

router bgp 10
 no bgp ebgp-requires-policy
 bgp graceful-restart-disable
 neighbor 192.168.111.1 remote-as 20
@rkojedzinszky
Copy link
Author

ping

@aauren
Copy link

aauren commented Jun 11, 2023

@fujita - Any chance you could take a look at this one?

@rkojedzinszky has done quite a bit of work tracking down this seeming inconsistency with GoBGP's handling of GracefulRestart when GoBGP is configured for GracefulRestart, but the peer is not (see cloudnativelabs/kube-router#1389 for the full history and context). This is causing them quite a bit of a headache as anytime kube-router is restarted they lose all routes until they also restart their peer.

@fujita
Copy link
Member

fujita commented Jun 12, 2023

This is a regression(old version gobgp worked)?

@rkojedzinszky
Copy link
Author

This is a regression(old version gobgp worked)?

I dont know, as I was using kube-router only. I suspect a change in kube-router triggered this bug. Earlier, a bug in kube-router masked or changed behavior in a way that it seemed that everything is ok, but it turns out that the applied change in kube-router is correct (cloudnativelabs/kube-router#1327).

@aauren
Copy link

aauren commented Jun 12, 2023

Elaborating just a bit in this thread (obviously all work can be found in the PR that @rkojedzinszky linked)...

Previously kube-router used to announce graceful restart for the gobgpapi.Family_AFI_IP and gobgpapi.Family_AFI_IP6 AFI families whether we used them or not. As a matter of fact, at the point in time that @rkojedzinszky linked to, the BGP implementation in kube-router was only capable of working with IPv4 or IPv6 and could not work with both at the same time.

Some BGP peer implementations were fine ignoring an AFI setup for a family that wasn't used, but other ones it caused breakages unless users configured an AFI group that they weren't using and usually weren't capable of using.

However, this also seems to have worked around a bug (?) in GoBGP that @rkojedzinszky documented in the description of this issue up above.

@fujita
Copy link
Member

fujita commented Jun 14, 2023

Hmm, seems that I can't reproduce this.

btw, in the sample code, why LocalRestarting is set to true? It should not.

@rkojedzinszky
Copy link
Author

@fujita I have created my sample code based on kube-router behavior. I will need some time to pick this up again, I'll report as soon as I can.

@rkojedzinszky
Copy link
Author

Hmm, seems that I can't reproduce this.

btw, in the sample code, why LocalRestarting is set to true? It should not.

@fujita

I could reproduce it again. So use the attached code, and make sure to use the following frr configartion:

# sh running-config 
Building configuration...

Current configuration:
!
frr version 8.4.2
frr defaults traditional
hostname frr
log syslog informational
no ip forwarding
no ipv6 forwarding
service integrated-vtysh-config
!
router bgp 10
 bgp graceful-restart-disable
 neighbor 192.168.111.1 remote-as 20
 !
 address-family ipv4 unicast
  neighbor 192.168.111.1 route-map bgp-default in
  neighbor 192.168.111.1 route-map bgp-default out
 exit-address-family
exit
!
route-map bgp-default permit 10
exit
!
end

Pay attention to the bgp graceful-restart-disable statement. Ensure that the sample code is started after frr. Then I observed that after 3 mins there are no advertisements:

frr# sh ip bgp summary 

IPv4 Unicast Summary (VRF default):
BGP router identifier 192.168.111.55, local AS number 10 vrf-id 0
BGP table version 8
RIB entries 0, using 0 bytes of memory
Peers 1, using 724 KiB of memory

Neighbor        V         AS   MsgRcvd   MsgSent   TblVer  InQ OutQ  Up/Down State/PfxRcd   PfxSnt Desc
192.168.111.1   4         20        42        41        0    0    0 00:03:31            0        0 N/A

Total number of neighbors 1

Howewer, if I make a clear ip bgp on frr, then the sessions gets established again, and then prefixes gets advertised immediately:

frr# clear ip bgp *
frr# sh ip bgp summary 

IPv4 Unicast Summary (VRF default):
BGP router identifier 192.168.111.55, local AS number 10 vrf-id 0
BGP table version 10
RIB entries 0, using 0 bytes of memory
Peers 1, using 724 KiB of memory

Neighbor        V         AS   MsgRcvd   MsgSent   TblVer  InQ OutQ  Up/Down State/PfxRcd   PfxSnt Desc
192.168.111.1   4         20        46        49        0    0    0 00:00:02         Idle        0 N/A

Total number of neighbors 1
frr# sh ip bgp summary 

IPv4 Unicast Summary (VRF default):
BGP router identifier 192.168.111.55, local AS number 10 vrf-id 0
BGP table version 10
RIB entries 0, using 0 bytes of memory
Peers 1, using 724 KiB of memory

Neighbor        V         AS   MsgRcvd   MsgSent   TblVer  InQ OutQ  Up/Down State/PfxRcd   PfxSnt Desc
192.168.111.1   4         20        46        50        0    0    0 00:00:05       Active        0 N/A

Total number of neighbors 1
frr# sh ip bgp summary 

IPv4 Unicast Summary (VRF default):
BGP router identifier 192.168.111.55, local AS number 10 vrf-id 0
BGP table version 10
RIB entries 0, using 0 bytes of memory
Peers 1, using 724 KiB of memory

Neighbor        V         AS   MsgRcvd   MsgSent   TblVer  InQ OutQ  Up/Down State/PfxRcd   PfxSnt Desc
192.168.111.1   4         20        46        50        0    0    0 00:00:12       Active        0 N/A

Total number of neighbors 1
frr# sh ip bgp summary 

IPv4 Unicast Summary (VRF default):
BGP router identifier 192.168.111.55, local AS number 10 vrf-id 0
BGP table version 11
RIB entries 1, using 192 bytes of memory
Peers 1, using 724 KiB of memory

Neighbor        V         AS   MsgRcvd   MsgSent   TblVer  InQ OutQ  Up/Down State/PfxRcd   PfxSnt Desc
192.168.111.1   4         20        49        52        0    0    0 00:00:01            1        0 N/A

Total number of neighbors 1

This behavior is consistent with cisco ios implemetation too, we had never had issues with frr-cisco setups with gr enabled on one end and disabled on the other.

@fujita
Copy link
Member

fujita commented Jun 28, 2023

I can reproduce the issue. The following change fixed the issue.

diff --git a/main.go b/main.go
index 87bb682..eea63c7 100644
--- a/main.go
+++ b/main.go
@@ -91,7 +91,7 @@ func main() {
                                Enabled:         true,
                                RestartTime:     120,
                                DeferralTime:    10,
-                               LocalRestarting: true,
+                               // LocalRestarting: true,
                        },
                        AfiSafis: []*gobgpapi.AfiSafi{

@rkojedzinszky
Copy link
Author

@aauren

I created my own example code based on kube-router, where LocalRestarting is also set to true. I think there is a good reason why it is done that way, howewer, I am not really familiar with such depths of bgp/gobgp fsm. Can you please comment on it?

@aauren
Copy link

aauren commented Jun 28, 2023

@rkojedzinszky This was unfortunately way before my time with kube-router. It was introduced with the graceful-restart feature here: https://github.com/cloudnativelabs/kube-router/pull/220/files#diff-d0dae64b5424393b01d606bfcabfa3e8fd82d2c466eaeb28187a4da833105273R504 and no description was given as to why it was added.

@fujita Thanks for tracking this down! We really appreciate your time and effort! I found a bit of code documentation about this feature here: https://github.com/osrg/gobgp/blob/master/internal/pkg/config/bgp_configs.go#L4246-L4252

However, I'm still not certain that I understand the implications of setting or not setting it and without understanding it better I feel hesitant to change a setting that has been in the graceful restart implementation since day 1 with kube-router.

Is there any chance that you would be willing to clarify what the option does and what the impact to it being set to true or false is?

@rkojedzinszky
Copy link
Author

@fujita Also please note that in the sample code if you enable IPv6 address family too, and follow the same sequence of starting the speakers, then as commented in the code, a timer will fire soon, and advertisements will begin to be sent. This is with LocalRestarting: true. So, it seems to be an inconsistent behavior.

@fujita
Copy link
Member

fujita commented Jun 28, 2023

LocalRestarting is configuration for a restarting speaker:
https://github.com/osrg/gobgp/blob/master/docs/sources/graceful-restart.md#restarting-speaker

should not be set in your case.

I can't explain the behavior with IPv6 enabled. Needs to investigate.

@aauren
Copy link

aauren commented Jun 29, 2023

@fujita Thanks again for continuing to pursue this with us.

You mentioned that you didn't think that LocalRestarting should be set in our case. And in the case of @rkojedzinszky's example code, I think I understand what you mean because it is starting for the first time, and yet it is still setting the LocalRestarting value. However, I'm not sure if it would actually be good to disable it for the kube-router case.

First off, to ensure that I understand what this flag does correctly from your documentation, is it correct to say that it lets it's peers know that it is recovering from a restart by setting the restart bit. And that this is the same procedure that is outlined by RFC4724 here:

To re-establish the session with its peer, the Restarting Speaker
MUST set the "Restart State" bit in the Graceful Restart Capability
of the OPEN message. Unless allowed via configuration, the
"Forwarding State" bit for an address family in the capability can be
set only if the forwarding state has indeed been preserved for that
address family during the restart.

If so, then I'm not sure how to proceed with this setting when it comes to kube-router as a whole. You see, kube-router is most frequently run as a pod that can start and stop at any time. Forwarding state is preserved by the fact that kube-router writes out the routing table to the host's network. However, other than that, kube-router does not preserve state across reboots, so it isn't possible for it to reliably know whether it is starting for the first time, or if it is recovering from a previous run where it had previously established graceful-restart enabled BGP sessions. However, in a typical system, the latter is more common than the former.

According to the documentation that you linked it says (-r being the same as setting LocalRestarting via the Go API):

Without -r option, the peers which are under helper procedure will immediately withdraw all routes which were advertised from gobgpd.

So, my understanding of all of this put together then, is:

  • Since we have no way of knowing whether we are starting for the first time or restarting, kube-router can only unilaterally set this value to true or false
  • If we set it to false, then after restarting, and upon re-establishing our BGP session with our peers, they will immediately withdraw all of our previous routes that they were previously holding in their FIB

I would assume that sometime later after route selection is complete, these routes would then come back, but there would be a period of service outage from the routes being withdrawn correct?

@rkojedzinszky
Copy link
Author

@aauren Thanks, great explanation. We are using this feature of kube-router for inter-node disruption-free upgrades, howewer, for our external connections, we dont want to use gr. This way we expect that if a node goes down for any reason (even just for an upgrade), traffic instantly gets rerouted to another node.

@rkojedzinszky
Copy link
Author

@fujita can we move forward somehow?

@rkojedzinszky
Copy link
Author

@fujita ping

@rkusler-intermedia
Copy link

I'm interested in this one too. Currently we lose our BGP routes every time kubespray restarts the kube-router containers

@rkusler-intermedia
Copy link

@rkojedzinszky I understand that you want graceful restarts for BGP sessions between nodes and want it disabled for BGP sessions to your upstream router. In theory, IF you did not need graceful restarts for inter-node BGP sessions, would removing the --bgp-graceful-restart resolve this issue that you're seeing on sessions to the upstream routers?

@rkojedzinszky
Copy link
Author

@rkojedzinszky I understand that you want graceful restarts for BGP sessions between nodes and want it disabled for BGP sessions to your upstream router. In theory, IF you did not need graceful restarts for inter-node BGP sessions, would removing the --bgp-graceful-restart resolve this issue that you're seeing on sessions to the upstream routers?

Yes, indeed. If GR is disabled in gobgp, everything works fine, prefixes get advertised as expected to upstream routers, in any scenario.

@YutaroHayakawa
Copy link
Contributor

YutaroHayakawa commented Jun 10, 2024

I think this PR fixed the issue which is released in v3.26.0.

In Cilium, we could reproduce with the version using GoBGP v3.23.0, but couldn't with the latest main using v3.26.0. So, once someone else could confirm, I think we can close this issue.

Seems like the way I tried was bad. Even with v3.26.0, the issue was still there.

@rkojedzinszky
Copy link
Author

@YutaroHayakawa I will take a look in the next days, thanks!

@YutaroHayakawa
Copy link
Contributor

Scratching my comment above. Seems like the way I tried was bad. Even with v3.26.0, the issue was still there.

YutaroHayakawa added a commit to YutaroHayakawa/cilium that referenced this issue Jun 24, 2024
To pull-in the recent fix (osrg/gobgp#2803) for
the issue (osrg/gobgp#2596).

Fixes: cilium#32886

Signed-off-by: Yutaro Hayakawa <[email protected]>
YutaroHayakawa added a commit to YutaroHayakawa/cilium that referenced this issue Jun 24, 2024
To pull-in the recent fix (osrg/gobgp#2803) for
the issue (osrg/gobgp#2596).

Fixes: cilium#32886

Signed-off-by: Yutaro Hayakawa <[email protected]>
YutaroHayakawa added a commit to YutaroHayakawa/cilium that referenced this issue Jun 24, 2024
To pull-in the recent fix (osrg/gobgp#2803) for
the issue (osrg/gobgp#2596).

Fixes: cilium#32886

Signed-off-by: Yutaro Hayakawa <[email protected]>
YutaroHayakawa added a commit to YutaroHayakawa/cilium that referenced this issue Jun 24, 2024
To pull-in the recent fix (osrg/gobgp#2803) for
the issue (osrg/gobgp#2596)

Fixes: cilium#32886

Signed-off-by: Yutaro Hayakawa <[email protected]>
YutaroHayakawa added a commit to YutaroHayakawa/cilium that referenced this issue Jun 27, 2024
To pull-in the recent fix (osrg/gobgp#2803) for
the issue (osrg/gobgp#2596)

Fixes: cilium#32886

Signed-off-by: Yutaro Hayakawa <[email protected]>
julianwiedmann pushed a commit to cilium/cilium that referenced this issue Jul 2, 2024
To pull-in the recent fix (osrg/gobgp#2803) for
the issue (osrg/gobgp#2596)

Fixes: #32886

Signed-off-by: Yutaro Hayakawa <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants