fix Ipv6 code #428

hannesm · 2020-08-23T18:26:24Z

I was curious what the current state of MirageOS and IPv6 is (a question coming up every now and then). Thus I tried with link-local addresses the mirage-skeleton example device-usage/ping6, and discovered:

bad IPv6 packets (the length field was never set due to some changes by myself when migrating the API to mirage-net 2.0)
the ctx (contaning the neighbour cache etc.) was not stored/assigned when Ndpv6.handle was called
the pong packets had bad checksum (again, issue introduced by myself when migrating API)

with these changes, the ping6 example boots and replies to icmpv6 echo requests. This is great progress IMHO (but there's likely some more stuff to fix/enhance before MirageOS is IPv6 ready).

Any reviews are appreciated.

hannesm · 2020-08-23T19:05:20Z

CI reports tests are failing. Locally I get exceptions in the connect tests, does this happen to anyone else as well? I'm wondering since earlier CI did not report issues in the test suite (but I seem to remember that alcotest changed its API recently, which may affect this suite?).

talex5 · 2020-08-23T19:30:21Z

I've enabled ocaml-ci too, and that sees lots of failures on this PR (but master seems mostly OK, except for some 32-bit problems in the tests).

It's a shame Alcotest doesn't provide better output in this case (looks like it just exits without a message).

hannesm · 2020-08-23T19:51:59Z

@talex5 thanks for enabling OCaml-CI. I don't understand the failure(s) -- neither do I understand what the purpose of e.g. the connect tests is (and their connection to the changes in this PR, which are only IPv6 related).

MagnusS · 2020-08-24T09:48:25Z

@hannesm I added those tests a long time ago - I can take a look to see if I can reproduce the errors.

The connect test is one of the first tests we had and it only verifies that the stack can connect and talk to itself (I think it also records the pcap data so it could be extended to verify that the handshake is correct). The iperf test was intended to be a tool to test for throughput regressions, but it has limited use in travis CI where performance will vary anyway -- iirc it has caught some bugs related to longer transfers hanging.

MagnusS · 2020-08-24T10:35:44Z

Running the tests with ALCOTEST_VERBOSE=1 ALCOTEST_SHOW_ERRORS=1 dune runtest the connect test fails and terminates with:

test.exe: [INFO] NUD: fc00::23 --> PROBE
Fatal error: exception Not_found

This is a bit strange, as the stack used by connect shouldn't enable IPv6. I tried running the connect test alone and it succeeds.

To check if it could be related to the ipv6-test I tried just running ipv6+connect and that still succeeds, however enabling slow tests and running ipv6+keepalive (which is a slower test) throws the Not_found exception:

$ ALCOTEST_VERBOSE=1 ALCOTEST_SHOW_ERRORS=1 _build/default/test/test.exe test 'ipv6|keepalive'
[...]
test.exe: [INFO] NUD: fc00::23 --> PROBE
test.exe: [DEBUG] ND6: Sending NS src=fc00::45 dst=fc00::23 tgt=fc00::23
test.exe: [DEBUG] IP6: Sending packet: dst=fc00::23 mac=02:50:00:00:00:01
Fatal error: exception Not_found

keepalive succeeds when executed alone:

$ _build/default/test/test.exe test 'keepalive`
[...]
  [OK]          keepalive            0   correct number of keepalives.
  [OK]          keepalive            1   we don't try to send old keepalives.
  [OK]          keepalive            2   check we close if we miss all probes.
  [OK]          keepalive            3   check that TCP keepalives detect a network failure.

Could the ipv6 test have enabled something that causes exceptions after the test has ended, e.g. after a timeout?

hannesm · 2020-08-24T11:08:27Z

thanks @MagnusS for explaining and investigating.

This is a bit strange, as the stack used by connect shouldn't enable IPv6.

that is my intuition as well.

Could the ipv6 test have enabled something that causes exceptions after the test has ended, e.g. after a timeout?

I don't know, but why should this PR modify the behaviour / lead to exceptions?

The ipv6 test disconnected from the vnetif backend while the stack was still running, resulting in a `Not_found` exception being returned from mirage-vnetif when Ethernet.write was called. This only became a problem in a recent commit that fixed an issue that prevented some of the background timers from running in the IPv6 stack. The V.disconnect call seems to have been added to work around another IPv6 issue where the UDP sender will queue packets with an empty source address until an IPv6 address is configured. This results in multiple packets with empty source address being received after the first correct packet, unless the receiver disconnected from the backend before this happens. This updates the test to not send UDP packets until an IP address is acquired and removes the call V.disconnect in the receiver. It does not change the queuing behaviour of the IPv6 stack, which may also be incorrect.

MagnusS · 2020-08-24T18:16:14Z

@hannesm I pushed an update to your branch to fix the ipv6 test as discussed on slack.

To summarise the problem:

The error seems to be caused by an unchecked mac address lookup that happens if you try to write ethernet frames after disconnecting the netif provided by mirage-vnetif. The ipv6 test disconnects the netif from the backend without shutting down the network stack and then a background timer attempts to write to it a few seconds later, raising a Not_found exception. I assume that the problem now occurs because ctx is stored correctly and the timers will continue to run after the call to V.disconnect.

It looks like the V.disconnect may have been there to prevent the test from failing due to some of the packets being queued and received without a valid address before the IPv6 address is negotiated. Checking that the sender has acquired an IP before sending and then removing the call to V.disconnect resolved the issue.

During testing I also added a counter to the payload and this is what the receiver sees if the sender starts sending without an IP. The queue is released by the sender stack when an IP is available:

test.exe: [DEBUG] IP6: Releasing queued packets: dst=fc00::45 mac=02:50:00:00:00:02
Receiver got UDP from src=fc00::23 dst=fc00::45 payload='hello on UDP over IPv6 39'
Receiver got UDP from src=:: dst=fc00::45 payload='hello on UDP over IPv6 1'
Receiver got UDP from src=:: dst=fc00::45 payload='hello on UDP over IPv6 2'

So there still seems to be some potential problems here (ordering is wrong, src IP empty)- but this at least makes the test pass after an IP has been acquired.

(I also added mirage/mirage-vnetif#32 to track the unhelpful exception in mirage-vnetif)

The MTU restricted iperf test is still failing, I'll try to take a look at that tomorrow.

MagnusS · 2020-08-25T12:20:46Z

The MTU-limited iperf test was only succeeding if executed after the mtu+tcp test, as the mtu+tcp test would set the MTU to 9000. The iperf test would fail if executed directly as the expected Ethernet frame size was set equal to the MTU, leaving no room for the Ethernet header. I've added a commit to fix both issues (more details in the commit message)

The previous backend contained a ref to the MTU in the module, so the MTU was remembered between tests. The iperf test would not succeed unless the mtu+tcp test had run previously, as it set the MTU to 9000 instead of the expected 1500. This renames the MTU enforced backend to Frame_size_enforced to make it clear that the size limit is on the Ethernet layer. The frame size is stored in the returned `t` and is now reset between tests. A helper function is also added to set the enforced frame size based on the IP layer MTU (MTU + size of ethernet header). This also re-enables the iperf test as it now succeeds. The previous failure was due to the enforced frame size being equal to the MTU, leaving no room for the Ethernet header.

MagnusS · 2020-08-25T13:59:09Z

Your changes LGTM @hannesm - with the test updates CI is also green again

test/test_ipv6.ml

Co-authored-by: Hannes Mehnert <[email protected]>

@MagnusS

CHANGES: * Assorted IPv6 improvements (mirage/mirage-tcpip#428 mirage/mirage-tcpip#431 mirage/mirage-tcpip#432 @MagnusS @hannesm) - set length in packets to be sent - preserve updated ctx from Ndv6.handle - fix ICMP checksum computation - implement Mirage_stack.V6 signature - add connect, mtu, iperf tests - fix DAD protocol implementation (and test it) - avoid out of bounds accesses of IPv6 packets (check length before accessing) * Fix 32 bit issues (@MagnusS) * Implement stack-direct and tcp disconnect: tear down existing connections (mirage/mirage-tcpip#429 @hannesm) * Treat broadcast address of network as broadcast as well (mirage/mirage-tcpip#430 @hannesm, reported in mirage/mirage-tcpip#427)

hannesm added 4 commits August 23, 2020 20:08

Ndpv6: set IPv6 length in packets

da5e6a6

Ipv6: update/preserve ctx returned from ndpv6.handle

92f262d

Ndpv6: set length early

23c8685

Ndpv6: fix ICMP pong checksum computation

9278c44

hannesm added 2 commits August 23, 2020 21:40

test: disable connect and iperf test to check whether this fixes CI

c73d079

Ndpv6: remove assertion regarding sizes

c2d16b2

hannesm changed the title ~~Ipv6~~ ifx Ipv6 code Aug 24, 2020

hannesm changed the title ~~ifx Ipv6 code~~ fix Ipv6 code Aug 24, 2020

MagnusS added 2 commits August 24, 2020 19:43

Reenable the ipv6 test, turn on exception backtraces

de10113

MagnusS mentioned this pull request Aug 24, 2020

Mac address lookups may throw Not_found mirage/mirage-vnetif#32

Open

MagnusS force-pushed the ipv6 branch from 6e267e5 to 3b91e2e Compare August 25, 2020 13:44

hannesm commented Aug 26, 2020

View reviewed changes

test/test_ipv6.ml Outdated Show resolved Hide resolved

hannesm commented Aug 26, 2020

View reviewed changes

test/test_ipv6.ml Outdated Show resolved Hide resolved

Remove unnecessary parentheses

c9e1a04

Co-authored-by: Hannes Mehnert <[email protected]>

MagnusS merged commit e3659ab into mirage:master Aug 26, 2020

hannesm deleted the ipv6 branch August 26, 2020 14:51

hannesm mentioned this pull request Sep 22, 2020

[new release] tcpip (5.0.1) ocaml/opam-repository#17249

Merged

MagnusS mentioned this pull request Nov 21, 2020

Transient failures in the tests #435

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix Ipv6 code #428

fix Ipv6 code #428

hannesm commented Aug 23, 2020

hannesm commented Aug 23, 2020

talex5 commented Aug 23, 2020

hannesm commented Aug 23, 2020

MagnusS commented Aug 24, 2020

MagnusS commented Aug 24, 2020

hannesm commented Aug 24, 2020

MagnusS commented Aug 24, 2020

MagnusS commented Aug 25, 2020

MagnusS commented Aug 25, 2020

fix Ipv6 code #428

fix Ipv6 code #428

Conversation

hannesm commented Aug 23, 2020

hannesm commented Aug 23, 2020

talex5 commented Aug 23, 2020

hannesm commented Aug 23, 2020

MagnusS commented Aug 24, 2020

MagnusS commented Aug 24, 2020

hannesm commented Aug 24, 2020

MagnusS commented Aug 24, 2020

MagnusS commented Aug 25, 2020

MagnusS commented Aug 25, 2020