-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement ECMP for unsafe_routes #1332
base: master
Are you sure you want to change the base?
Conversation
Thanks for the contribution! Before we can merge this, we need @dioss-Machiel to sign the Salesforce Inc. Contributor License Agreement. |
|
Hi @dioss-Machiel, this looks very interesting. One note:
It does look like this PR may offer effective load balancing between gateways, but I'm not sure if it provides redundancy. If a given gateway is down, will this code detect it, and route on the other gateway instead? |
This implementation does indeed provide redundancy in the following network setup: How this technically works: For example: if you have three gateways with weight "1" then each of them handles 1/3 of the traffic. If you only have two gateways then each of them handles 1/2 of the traffic, if one goes down the other gateway will handle all the traffic. This is the link to the relevant code in this PR: (inside.go, line 187) A more advanced implementation could recalculate the buckets when nodes go up or down to keep balancing the traffic on the remaining gateways but that comes with more complexity and maybe a slight penalty to routing speed. |
@dioss-Machiel we merged a pretty large change to support ipv6 on the overlay, mind rebasing to clear up the conflicts? |
Still always use the first route found, this should not change any routing behaviour in nebula.
Prefer first route found, if gatway unavailable then keep trying untill all options are exhausted.
WIP Multipath is working but routing table updates are still broken
8e46695
to
d987d75
Compare
@nbrownus Conflicts have been resolved |
d987d75
to
0cc2f98
Compare
I fixed the testify errors |
…teways are reachable
Did a quick benchmark,
This benchmark includes 2 other xxhash implementations but they have slightly worse distribution while being much faster than xxh3. I included As a reference point, the firewall takes ~40ns/op on my machine to pass a packet on a group or name match. Worst case performance in the firewall is ~95ns/op to evaluate and fail a full rule evaluation. If we went with
Which doesn't seem too terrible to me. We can take this a bit further as well by removing the local and remote addr, If we can drop down to just evaluating the local and remote ports then we can use a much faster hash described here (taken from #768) which has really nice distribution (does not require modifying your tests) and incredible speed, 0.3117 ns/op on my machine.
|
overlay/route.go
Outdated
} | ||
|
||
if gatewayWeight < 1 || gatewayWeight > math.MaxInt32 { | ||
return nil, fmt.Errorf("entry .weight in tun.unsafe_routes[%v].via[%v] is not in range (1-%d) : %v", i+1, ig+1, math.MaxInt32, metric) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
metric
should be gatewayWeight
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed!, Fixed now
t.l.WithField("route", r).Debug("Ignoring route update, not in our network") | ||
} else { | ||
// p.Hops+1 = weight of the route | ||
gateways = append(gateways, routing.NewGateway(gwAddr, p.Hops+1)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When doing sudo ip route add 3.3.3.4/32 nexthop via 192.168.5.13 weight 5 nexthop via 192.168.5.14 weight 10
I do see p.Weight
set correctly on my machine, is there a reason to use r.Hops+1
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might be overlooking something but I don't see a Weight field on NexthopInfo in netlink v1.3.0, there is only a "Weight" in the String() method for NexthopInfo
https://github.com/vishvananda/netlink/blob/6f5713947556a0288c5cb71f036f9e91924ebcaa/route.go#L169
type NexthopInfo struct {
LinkIndex int
Hops int
Gw net.IP
Flags int
NewDst Destination
Encap Encap
Via Destination
}
func (n *NexthopInfo) String() string {
elems := []string{}
elems = append(elems, fmt.Sprintf("Ifindex: %d", n.LinkIndex))
if n.NewDst != nil {
elems = append(elems, fmt.Sprintf("NewDst: %s", n.NewDst))
}
if n.Encap != nil {
elems = append(elems, fmt.Sprintf("Encap: %s", n.Encap))
}
if n.Via != nil {
elems = append(elems, fmt.Sprintf("Via: %s", n.Via))
}
elems = append(elems, fmt.Sprintf("Weight: %d", n.Hops+1))
elems = append(elems, fmt.Sprintf("Gw: %s", n.Gw))
elems = append(elems, fmt.Sprintf("Flags: %s", n.ListFlags()))
return fmt.Sprintf("{%s}", strings.Join(elems, " "))
}
@@ -589,12 +636,12 @@ func (t *tun) updateRoutes(r netlink.RouteUpdate) { | |||
newTree := t.routeTree.Load().Clone() | |||
|
|||
if r.Type == unix.RTM_NEWROUTE { | |||
t.l.WithField("destination", r.Dst).WithField("via", r.Gw).Info("Adding route") | |||
newTree.Insert(dst, gwAddr) | |||
t.l.WithField("destination", dst).WithField("via", gateways).Info("Adding route") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to transform gateways
to a human readable string here and below.
Currently this is logging:
via="[{{{0 281473913980173} {0xc00014a150}} 5 715827882} {{{0 281473913980174} {0xc00014a150}} 10 2147483647}]"
I haven't audited all log sites for []routing.Gateway
yet but it would probably be best to deal with a concrete type Gateways
than the slice to avoid future foot guns when trying to log this structure.
Doing the easy thing and casting for logging is a bit easier but the end implementation looks mostly the same
type viaGateways []routing.Gateway
func (g viaGateways) String() string {
str := ""
for i, gw := range g {
str += gw.String()
if i < len(g)-1 {
str += ", "
}
}
return str
}
...
t.l.WithField("destination", dst).WithField("via", viaGateways(gateways)).Info("Adding route")
This gives a slightly better log line:
via="192.168.5.13:5/715827882, 192.168.5.14:10/2147483647"
To which I think we should adjust the Gateway.String()
to look something along the lines of
func (g *Gateway) String() string {
return fmt.Sprintf("%s weight %d", g.addr, g.weight)
}
I would like to avoid changing via
from a string to an []
when logging in json to avoid the pitfalls of changing the field type for systems like elasticsearch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Love the idea, does this look like a reasonable log line to you, or does it look to much like json?
level=info msg="Adding route" destination=10.152.5.22/32 via="{addr: 10.159.0.1, weight: 1}, {addr: 10.159.0.2, weight: 1}"
I'm not married to the hashing algorithm, I did some quick research and other L4 balancing implementations also only hash by source and destination port. Using the hash-prospector implementation looks good to me. |
This algorithm has better performance and we can remove some dependencies
This PR implements ECMP support in Nebula. The implementation uses hash-threshold mapping (like in the Linux kernel) which allows you to define weights per gateway.
This can be used for example to aggregate multiple links and provide redundancy to an external location where nebula cannot be installed.
The change is backwards compatible and should not impact normal operation.
ECMP routes can be defined in the config file (example below) and also via the use_system_route_table feature, which means that on Linux you can use a BGP daemon to add / remove multipath routes.
Example config:
Implementation note: if the gateway where the packet should be routed to is not reachable the first available gateway is chosen, so when one of the gateways is down the traffic is no longer properly being balanced, but there is still connectivity.