MTU auto-detection does not work properly #62

Open
RalfJung opened this issue Nov 1, 2017 · 22 comments

@RalfJung
Member

RalfJung commented Nov 1, 2017

Ever since we upgraded two of our machines from the old (pre-rewrite) tunneldigger to current master, we have MTU issues on those servers. With the old setup, we used to disable automatic MTU discovery and manually set the MTU to a conservatively low value of 1406.

I have no experience debugging MTU issues, so for now, I am going to re-implement the option to disable MTU discovery. But that's not solving this issue.


In case that matters, here are some more details of our setup: We are adding all the tunnels to a single bridge. We set that bridge's MTU to 1406 after it is created; however, as far as I understand, that value does not have any effect once a device is added to the bridge.

I am aware that to use the MTU properly, we would need one bridge per MTU. However, putting them all into the same bridge will use the minimum MTU, so things should still work, right?

@kostko
Member

kostko commented Nov 1, 2017

Are you sure that the PMTU is discovered incorrectly? Can you test if it works, e.g. by using ICMP ECHO packets of the given sizes? Can you give an example of a correct PMTU and the PMTU detected by tunneldigger?
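
For reference, one way to run the test kostko suggests is a DF-bit ping sized to a target MTU. A small sketch (the host address is a placeholder, and the arithmetic assumes IPv4 with no IP options):

```python
# Build a `ping` command that tests whether an IP packet of a given total
# size fits through the path unfragmented (-M do sets the DF bit on Linux).
# The 28-byte overhead assumes an IPv4 header (20 bytes, no options) plus
# the 8-byte ICMP echo header.

IP_ICMP_OVERHEAD = 20 + 8

def ping_command(host: str, ip_packet_size: int) -> list[str]:
    """ping invocation probing `ip_packet_size` without fragmentation."""
    payload = ip_packet_size - IP_ICMP_OVERHEAD
    return ["ping", "-c", "3", "-M", "do", "-s", str(payload), host]

# Probing the 1406-byte tunnel MTU mentioned above (host is a placeholder):
print(" ".join(ping_command("10.0.0.1", 1406)))
```

If the ping succeeds at size N but fails with "message too long" (or silently, in the black-hole case) at N+1, the path MTU in that direction is N.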

I am aware that to use the MTU properly, we would need one bridge per MTU. However, putting them all into the same bridge will use the minimum MTU, so things should still work, right?

Yes, we have always used such a configuration (one bridge per MTU).

Putting all of them into the same bridge should use the minimum MTU on the transmit side. But the problem is that you actually have two MTUs (on the transmit and receive sides) and they can be different. In case you use a bridge, any traffic that gets sent into the bridge interface will indeed honor the MTU of the bridge (the minimum of all bridge ports). But a bridge may still receive a packet with a higher MTU from a port which supports such a higher MTU, which will (AFAIK) cause the packet to be silently dropped. So for best results, you must ensure that the MTU on both ends of the tunnel is consistent; otherwise you could have issues.

@RalfJung
Member Author

RalfJung commented Nov 1, 2017

Can you give an example of a correct PMTU and the PMTU detected by tunneldigger?

Unfortunately not. I have no idea what the correct PMTUs are. But I know that applying #63 fixed our server from "unusably slow / no connection" to "working properly".

Can you test if it works, e.g. by using ICMP ECHO packets of the given sizes?

I would appreciate some more detailed instructions. I have pretty much given up on MTU issues, to be honest; so far, when we tried to figure things out using ping, there was no correlation between our configuration and the behavior of the system. The fact that ping tests both directions at once, so you can't tell which one is broken, doesn't help either. So, I just don't know how to debug MTU issues, nor how to even figure out what the correct MTU should be.

To make things worse, we run Batman inside L2TP, which can do fragmentation (but I don't know if it does), so packets may get through even if they are too large.

And then there are Linux bridges: you can set an MTU on them with ip, but that seems to be ignored because the bridge uses the minimum MTU of its connected devices -- but what if there is no device connected? We have a bridge on our servers (the "inside" of the batman network) with no device attached to it; ip addr shows an MTU of 1500 for it, and AFAIK it is not possible to change that value. I don't know if it is even meaningful, though.

If you know of some documentation or similar that would help me actually understand (P)MTUs, that would be much appreciated.

a bridge port may still receive a packet with a higher MTU from a port which supports such a higher MTU, which will (AFAIK) cause the packet to be silently dropped.

Ah, I see. That could make sense. However, there are clients that seem to auto-detect an MTU of even less than 1406 (at least, I just saw some of the tunnels having even smaller MTUs), and these clients do seem to work properly even now that I enforce 1406 to be the MTU no matter what.

EDIT: Actually, maybe not. We use DHCP and RAs to tell the client that the MTU is 1280, and Linux respects that AFAIK. Still, my Linux machine suffered from an unusably slow connection with MTU discovery. So, I don't think this is about client -> server packets being too big. It is the other direction that we have much less control over, and hence much more trouble with.

Maybe it is time to once again try the bridge-free setup, and connect all the tunnels directly to batman. That would at least remove one of the many layers here.

@mitar
Member

mitar commented Nov 2, 2017

But a bridge may still receive a packet with a higher MTU from a port which supports such a higher MTU, which will (AFAIK) cause the packet to be silently dropped. So for best results, you must ensure that the MTU on both ends of the tunnel is consistent; otherwise you could have issues.

That was in fact an issue with using Batman, because then you propagate this lowest-MTU requirement across the whole network, so the whole network has to use the lowest MTU everywhere.

See also discussion here:

@mitar
Member

mitar commented Nov 2, 2017

Maybe it is time to once again try the bridge-free setup, and connect all the tunnels directly to batman. That would at least remove one of the many layers here.

I would love to see how a proper layer 2 mesh could be built that integrates VPN links with different MTUs and then WiFi links. My worry is that, to my knowledge, layer 2 assumes that everything operates on one MTU. So you then have to limit the whole network to the lowest MTU, or introduce fragmentation. One thing to consider: maybe we could somehow get fragmentation to happen in L2TP tunnels in Linux? If so, maybe this could be a solution?

I know that Mikrotik has fragmentation in their L2TP tunnels which makes this much easier.

Of course fragmentation has its own performance hit, but it is definitely better than running the whole network on 576-byte packets. :-)
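
For a rough sense of where conservative values like 1406 come from, here is the overhead arithmetic for an L2TP-style tunnel. The header sizes below are typical assumptions for L2TPv3 over UDP over IPv4 carrying Ethernet frames, not tunneldigger's exact figures:

```python
# Largest inner IP packet that fits into one encapsulated packet, given the
# outer path MTU. Header sizes are typical for L2TPv3/UDP/IPv4 carrying
# Ethernet frames; the real overhead depends on the session configuration.

OUTER_IP = 20        # encapsulating IPv4 header
UDP = 8              # UDP header
L2TPV3 = 12          # L2TPv3 session header (8-16 bytes in practice)
INNER_ETHERNET = 14  # Ethernet header of the tunneled frame

def inner_mtu(outer_mtu: int) -> int:
    """Inner MTU left over after subtracting the encapsulation overhead."""
    return outer_mtu - OUTER_IP - UDP - L2TPV3 - INNER_ETHERNET

print(inner_mtu(1500))  # -> 1446 with these header sizes
```

With a clean 1500-byte uplink this yields 1446; a value like 1406 additionally leaves headroom for uplinks that are themselves tunneled (e.g. PPPoE) and thus have an outer MTU below 1500.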

@mitar
Member

mitar commented Nov 2, 2017

BTW, for IPv6, the minimum required MTU is 1280 anyway. So maybe we should just fix the whole network to this MTU and be done with it. And if you have a lower MTU, you are not allowed to connect to the network.

@RalfJung
Member Author

RalfJung commented Nov 2, 2017

BTW, for IPv6, the minimum required MTU is 1280 anyway. So maybe we should just fix the whole network to this MTU and be done with it. And if you have a lower MTU, you are not allowed to connect to the network.

That is what we are trying to do, essentially. It is what I mean by "giving up on the MTU issue". We have DHCP and RAs set up to tell clients that this is the MTU, so I think for client -> internet traffic, we are good. However, for internet -> client traffic, we are still having problems.

Thanks for your links concerning Batman and MTUs; I will have a closer look over the weekend.

@mitar
Member

mitar commented Nov 2, 2017

We have DHCP and RAs set up to tell clients that this is the MTU

In one of the discussions in those links, they report that many clients/systems do not respect the DHCP MTU option.

@mitar
Member

mitar commented Nov 2, 2017

cc @max-b, @Juul, in case they have anything to add.

@Juul

Juul commented Nov 2, 2017

We have DHCP and RAs set up to tell clients that this is the MTU

Do all common operating systems obey either DHCP or RA MTU announcements? We never actually tested RA, but I believe most or all versions of Windows completely ignore DHCP MTU announcements.

It seems that all of the issues stem from computers that expect routers to fragment packets for them. This was actually fixed in IPv6: routers never fragment, and hosts must fragment their own packets (in response to ICMPv6 Packet Too Big messages). Not that this is especially helpful for any real-world network, unless you want to run an IPv6-only OpenVPN server and convince all your users to set up their devices to connect to that (not a bad idea security-wise, and if there is free bandwidth as an incentive then it may actually be feasible).

@max-b
Contributor

max-b commented Nov 3, 2017

I can confirm that we ran into systems that don't respect DHCP MTU announcements. For more of our thoughts, and ultimately the process that led us to abandon the idea of pairing tunneldigger with batman-adv, I'd point you to https://sudoroom.org/pipermail/mesh-dev/2014-October/000019.html

Also, @papazoga might have something worth adding....

@RalfJung
Member Author

RalfJung commented Nov 4, 2017

Long-term we'd like to get away from batman for other reasons as well (our network is just getting too large). However, at least from what I saw in other Freifunk communities, no proper alternative actually works well enough yet. Some work started at https://github.com/tcatm/l3roamd to get a dynamic babel setup, but that seems to have stalled.

Do all common operating systems obey either DHCP or RA MTU announcements? We never actually tested RA, but I believe most or all versions of Windows completely ignore DHCP MTU announcements.

I don't know. It's just the best we can do currently.

@papazoga
Contributor

papazoga commented Nov 4, 2017

DHCP option 26 is not the answer for setting MTU. All major operating systems ignore it (unless something has changed radically in the last couple of years).

Here is how PMTU discovery works, roughly. It is described in RFC 1191 and is part of the IP layer (layer 3). When a host first attempts to send an IP packet to another, it sets the PMTU for that destination to a default maximum. If the packet doesn't make it through a router because the next link's MTU is too small, there are two options: fragment (almost never enabled, for good reason), or respond with ICMP Destination Unreachable (Fragmentation Needed). When the originating host receives this response, it lowers the PMTU; the price is a dropped packet. All major operating systems do this by default (yes, including Windows).
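
The process described above can be sketched as a toy model (the probe loop and the per-hop MTU list are illustrative, not how any real stack is structured):

```python
# Toy model of RFC 1191 path-MTU discovery: the sender starts at its link
# MTU and lowers its per-destination estimate each time a router answers
# with ICMP "Fragmentation Needed" (which carries the next-hop MTU).

def discover_pmtu(first_hop_mtu: int, hop_mtus: list[int]) -> tuple[int, int]:
    """Return (discovered PMTU, packets dropped along the way)."""
    pmtu = first_hop_mtu
    dropped = 0
    while True:
        # Send a DF packet of the current estimate; find the first hop
        # whose MTU is too small for it.
        bottleneck = next((m for m in hop_mtus if m < pmtu), None)
        if bottleneck is None:
            return pmtu, dropped  # packet made it end to end
        dropped += 1        # the oversized packet was discarded...
        pmtu = bottleneck   # ...and the ICMP reply tells us the hop's MTU

print(discover_pmtu(1500, [1500, 1406, 1280]))  # -> (1280, 2)
```

The black-hole failure mode discussed in this thread is exactly the case where the "ICMP reply" step never happens: a bridge drops the oversized frame silently, so the sender's estimate never drops below its initial value.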

If you are bridging the access point interface with the L2TP interface in order to achieve batman-style roaming, normal PMTU discovery is broken, and the tunnel MTU you went through all that trouble to set properly is irrelevant. When your client sends an oversized packet, it will not get an ICMP Destination Unreachable response; the bridge will just drop the frame, and the client will not lower its PMTU.

Now you have a black hole, unless batman is fragmenting your frames for you (which I consider to be a very undesirable situation).

You can patch this up if you start forwarding between the access point and the L2TP interface, as @max-b and I started to do, but now your client is no longer managed by batman, so you've lost roaming, and you might as well be using a real routing protocol.

In fact, that is my recommendation: use a real (layer 3) routing protocol. Roaming is nice, but it's better to be able to deliver packets reliably.

@RalfJung
Member Author

RalfJung commented Nov 5, 2017

DHCP option 26 is not the answer for setting MTU. All major operating systems ignore it (unless something has changed radically in the last couple of years).

In my very limited testing, Linux indeed ignores the DHCP-supplied MTU, but it does honor the MTU set via RAs. I did not test other OSes, though.

@papazoga thanks for this summary; it has been helpful. I also did some more testing in our network, simulating clients that ignore the MTU we announce by sending pings with an effective IP packet size of 1500. Those packets currently work fine, probably because batman on the nodes does fragmentation. That's not great, but at least it works.
One funny thing I also noticed is that Linux, when fragmenting, seems to make all but the last fragment 4 bytes smaller than they would have to be -- so, with an MTU of 1280, the fragments of 1264 bytes worth of IP data end up having sizes of 1256 and 8 bytes. Is that expected?
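
A likely explanation for the 4 missing bytes: IPv4 fragment offsets are counted in 8-byte units, so every non-final fragment must carry a payload that is a multiple of 8 bytes, and 1280 - 20 = 1260 rounds down to 1256. A sketch of that arithmetic (assuming a 20-byte IPv4 header with no options):

```python
# Why 1264 bytes of IP payload over a 1280-byte MTU fragments as 1256 + 8:
# fragment offsets are expressed in 8-byte units, so the payload of every
# non-final fragment is rounded down to a multiple of 8 bytes.

IPV4_HEADER = 20  # assumes no IP options

def fragment_payloads(payload_len: int, mtu: int) -> list[int]:
    """Payload sizes of the IPv4 fragments a sender would emit."""
    max_frag = (mtu - IPV4_HEADER) // 8 * 8  # 1280 -> 1260 -> 1256
    frags = []
    while payload_len > max_frag:
        frags.append(max_frag)
        payload_len -= max_frag
    frags.append(payload_len)
    return frags

print(fragment_payloads(1264, 1280))  # -> [1256, 8]
```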

On the servers, we have batman fragmentation disabled. Incoming packets get routed here, not switched, so going down from the uplink MTU to the one of our network should work with normal PMTU discovery -- right?

In fact, that is my recommendation: use a real (layer 3) routing protocol. Roaming is nice, but it's better to be able to deliver packets reliably.

Even ignoring roaming, I am just not aware of anything that would work here. If I understand https://nilsschneider.net/2016/04/10/babel-in-gluon.html correctly, babel itself is not good enough.


Speaking of all this MTU stuff, I still have one open question: If a Linux server receives a packet on a device, and that packet is bigger than the MTU set for that device on the Linux side -- what happens? This is clearly an indication that the link behind the device supports larger packets than the MTU says, but well, that happens. I would assume that Linux just shrugs and processes the packet, because why drop a perfectly valid packet just because it is larger than we thought it could be?

@papazoga
Contributor

papazoga commented Nov 5, 2017

On the servers, we have batman fragmentation disabled. Incoming packets get routed here, not switched, so going down from the uplink MTU to the one of our network should work with normal PMTU discovery -- right?

PMTU discovery is a method for arriving at an MTU for a path. The configuration you've described will not allow PMTU discovery for a path beginning at a client of a wireless access point A, going through the tunnel, and ending at some host B on the internet, assuming A is configured to bridge (or batman-manage) the wireless interface with the L2TP interface. The reasons for this are described above.

Keep in mind that MTU problems on the uplink server are relatively easy to solve because you have complete control of that end. The problem is that you have almost no control of your wireless client configurations.

Even ignoring roaming, I am just not aware of anything that would work here. If I understand https://nilsschneider.net/2016/04/10/babel-in-gluon.html correctly, babel itself is not good enough.

Not sure what "good enough" means here.

@RalfJung
Member Author

RalfJung commented Nov 5, 2017

The configuration you've described will not allow PMTU discovery for a path beginning at a client of a wireless access point A, going through the tunnel, and ending at some host B on the internet, assuming A is configured to bridge (or batman-manage) the wireless interface with the L2TP interface. The reasons for this are described above.

That would be outgoing packets. Batman will fragment them, so while the PMTU is not discovered properly, at least things will work -- right?
Furthermore, looking at the path more closely: at the end of the tunnel sits our server G, which then routes packets through another tunnel, from which they go to the internet and on to B. On G, we set the MTU of that 2nd tunnel to 1280. So if a client sends a too-large packet, batman will fragment it; it arrives at G, where it is detected as too large during routing, and hence the client gets back an ICMP message telling it that the packet is too big. I confirmed this to work.
(However, we still have some funny problems -- pings between 1280 and roughly 1400 bytes in size don't work. Larger sizes work, and so do smaller ones.)

Not sure what "good enough" means here.

When a packet arrives at our server G from the internet, the recipient of that packet will be a normal client not speaking babel. So, from what I understand, without further effort, babel can't know how to route that packet. There needs to be something running on the APs, detecting which clients are connected to this AP, and publishing the appropriate routes through babel.

@papazoga
Contributor

papazoga commented Nov 5, 2017

That would be outgoing packets. Batman will fragment them, so while the PMTU is not discovered properly, at least things will work -- right?

I see. Yes; if batman is fragmenting that will probably work. I can't immediately see why you have problems between 1280 and 1400.

There needs to be something running on the APs, detecting which clients are connected to this AP, and publishing the appropriate routes through babel.

This is also correct. On a relatively static network this is not a big problem, since you can use subnetting (babel can redistribute a /24 per access point, say). A non-proprietary solution for non-static networks is not currently available AFAIK, and is a difficult problem at large scale.

@RalfJung
Member Author

RalfJung commented Nov 5, 2017

On a relatively static network this is not a big problem, since you can use subnetting (babel can redistribute a /24 per access point, say). A non-proprietary solution for non-static networks is not currently available AFAIK, and is a difficult problem at large scale.

That's the catch. We have >500 access points coming and going dynamically, and anybody can set up a new one. Most have <5 clients almost all the time, but some sometimes see a lot more.

@papazoga
Contributor

papazoga commented Nov 5, 2017

We have >500 access points coming and going dynamically, and anybody can set up a new one.

Wow. That sounds like an excellent test-bed for developing a free solution to this problem. :-)

This project, though not a solution, might be of interest. Its scope (Cisco Homenet; see RFC7368) is quite different.

@RalfJung
Member Author

RalfJung commented Nov 5, 2017

Wow. That sounds like an excellent test-bed for developing a free solution to this problem. :-)

We are by far not the largest Freifunk network in Germany. ;) That's why l3roamd was started, but it seems to have stalled.

@mitar
Member

mitar commented Nov 6, 2017

Just to add a bit to this great discussion. Remember, there are two MTUs, one in each direction. Tunneldigger implements custom MTU discovery for its packet size by trying different sizes from a predefined set of candidates. So, ideally, it finds the best MTU for each connection in each direction. But this process is separate from the PMTU discovery done by the OSes. So there are multiple things going on here.
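
The probing scheme described here can be sketched as follows (the candidate list and the probe callback are illustrative assumptions, not tunneldigger's actual code):

```python
# Sketch of probing from a predefined set of sizes: send a padded probe at
# each candidate size and keep the largest one that survives the round
# trip. Falls back to the smallest candidate if nothing gets through.

CANDIDATE_MTUS = [1280, 1334, 1388, 1406, 1438, 1446, 1500]  # hypothetical set

def detect_mtu(probe, candidates=CANDIDATE_MTUS) -> int:
    """`probe(size)` returns True if a probe of `size` bytes made it
    through the tunnel and back; pick the largest size that did."""
    working = [size for size in candidates if probe(size)]
    return max(working) if working else min(candidates)

# A path that silently drops anything over 1406 bytes:
print(detect_mtu(lambda size: size <= 1406))  # -> 1406
```

Because each direction of the tunnel is probed with its own round trips, this kind of discovery can settle on different values per peer, which is exactly what makes a shared bridge (with one MTU for all ports) awkward.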

@RalfJung
Member Author

RalfJung commented Nov 8, 2017

I was just told that batman does not support dynamically changing interface MTUs. That would also explain why tunneldigger-style dynamic MTU discovery is causing problems.

TBH, when using Batman within the L2TP tunnel, I cannot even construct a scenario where detecting the right MTU (as opposed to using a fixed, conservative lower bound) is advantageous.
