Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix Ipv6 code #428

Merged
merged 10 commits into from
Aug 26, 2020
Merged

fix Ipv6 code #428

merged 10 commits into from
Aug 26, 2020

Conversation

hannesm
Copy link
Member

@hannesm hannesm commented Aug 23, 2020

I was curious what the current state of MirageOS and IPv6 is (a question coming up every now and then). Thus I tried with link-local addresses the mirage-skeleton example device-usage/ping6, and discovered:

  • bad IPv6 packets (the length field was never set due to some changes by myself when migrating the API to mirage-net 2.0)
  • the ctx (contaning the neighbour cache etc.) was not stored/assigned when Ndpv6.handle was called
  • the pong packets had bad checksum (again, issue introduced by myself when migrating API)

with these changes, the ping6 example boots and replies to icmpv6 echo requests. This is great progress IMHO (but there's likely some more stuff to fix/enhance before MirageOS is IPv6 ready).

Any reviews are appreciated.

@hannesm
Copy link
Member Author

hannesm commented Aug 23, 2020

CI reports tests are failing. Locally I get exceptions in the connect tests, does this happen to anyone else as well? I'm wondering since earlier CI did not report issues in the test suite (but I seem to remember that alcotest changed its API recently, which may affect this suite?).

@talex5
Copy link
Contributor

talex5 commented Aug 23, 2020

I've enabled ocaml-ci too, and that sees lots of failures on this PR (but master seems mostly OK, except for some 32-bit problems in the tests).

It's a shame Alcotest doesn't provide better output in this case (looks like it just exits without a message).

@hannesm
Copy link
Member Author

hannesm commented Aug 23, 2020

@talex5 thanks for enabling OCaml-CI. I don't understand the failure(s) -- neither do I understand what the purpose of e.g. the connect tests is (and their connection to the changes in this PR, which are only IPv6 related).

@hannesm hannesm changed the title Ipv6 ifx Ipv6 code Aug 24, 2020
@hannesm hannesm changed the title ifx Ipv6 code fix Ipv6 code Aug 24, 2020
@MagnusS
Copy link
Member

MagnusS commented Aug 24, 2020

@hannesm I added those tests a long time ago - I can take a look to see if I can reproduce the errors.

The connect test is one of the first tests we had and it only verifies that the stack can connect and talk to itself (I think it also records the pcap data so it could be extended to verify that the handshake is correct). The iperf test was intended to be a tool to test for throughput regressions, but it has limited use in travis CI where performance will vary anyway -- iirc it has caught some bugs related to longer transfers hanging.

@MagnusS
Copy link
Member

MagnusS commented Aug 24, 2020

Running the tests with ALCOTEST_VERBOSE=1 ALCOTEST_SHOW_ERRORS=1 dune runtest the connect test fails and terminates with:

test.exe: [INFO] NUD: fc00::23 --> PROBE
Fatal error: exception Not_found

This is a bit strange, as the stack used by connect shouldn't enable IPv6. I tried running the connect test alone and it succeeds.

To check if it could be related to the ipv6-test I tried just running ipv6+connect and that still succeeds, however enabling slow tests and running ipv6+keepalive (which is a slower test) throws the Not_found exception:

$ ALCOTEST_VERBOSE=1 ALCOTEST_SHOW_ERRORS=1 _build/default/test/test.exe test 'ipv6|keepalive'
[...]
test.exe: [INFO] NUD: fc00::23 --> PROBE
test.exe: [DEBUG] ND6: Sending NS src=fc00::45 dst=fc00::23 tgt=fc00::23
test.exe: [DEBUG] IP6: Sending packet: dst=fc00::23 mac=02:50:00:00:00:01
Fatal error: exception Not_found

keepalive succeeds when executed alone:

$ _build/default/test/test.exe test 'keepalive`
[...]
  [OK]          keepalive            0   correct number of keepalives.
  [OK]          keepalive            1   we don't try to send old keepalives.
  [OK]          keepalive            2   check we close if we miss all probes.
  [OK]          keepalive            3   check that TCP keepalives detect a network failure.

Could the ipv6 test have enabled something that causes exceptions after the test has ended, e.g. after a timeout?

@hannesm
Copy link
Member Author

hannesm commented Aug 24, 2020

thanks @MagnusS for explaining and investigating.

This is a bit strange, as the stack used by connect shouldn't enable IPv6.

that is my intuition as well.

Could the ipv6 test have enabled something that causes exceptions after the test has ended, e.g. after a timeout?

I don't know, but why should this PR modify the behaviour / lead to exceptions?

The ipv6 test disconnected from the vnetif backend while the stack was
still running, resulting in a `Not_found` exception being returned from
mirage-vnetif when Ethernet.write was called. This only became a problem
in a recent commit that fixed an issue that prevented some of the
background timers from running in the IPv6 stack.

The V.disconnect call seems to have been added to work around another
IPv6 issue where the UDP sender will queue packets with an empty source
address until an IPv6 address is configured. This results in multiple
packets with empty source address being received after the first correct
packet, unless the receiver disconnected from the backend before this
happens.

This updates the test to not send UDP packets until an IP address is
acquired and removes the call V.disconnect in the receiver. It does not
change the queuing behaviour of the IPv6 stack, which may also be
incorrect.
@MagnusS
Copy link
Member

MagnusS commented Aug 24, 2020

@hannesm I pushed an update to your branch to fix the ipv6 test as discussed on slack.

To summarise the problem:

The error seems to be caused by an unchecked mac address lookup that happens if you try to write ethernet frames after disconnecting the netif provided by mirage-vnetif. The ipv6 test disconnects the netif from the backend without shutting down the network stack and then a background timer attempts to write to it a few seconds later, raising a Not_found exception. I assume that the problem now occurs because ctx is stored correctly and the timers will continue to run after the call to V.disconnect.

It looks like the V.disconnect may have been there to prevent the test from failing due to some of the packets being queued and received without a valid address before the IPv6 address is negotiated. Checking that the sender has acquired an IP before sending and then removing the call to V.disconnect resolved the issue.

During testing I also added a counter to the payload and this is what the receiver sees if the sender starts sending without an IP. The queue is released by the sender stack when an IP is available:

test.exe: [DEBUG] IP6: Releasing queued packets: dst=fc00::45 mac=02:50:00:00:00:02
Receiver got UDP from src=fc00::23 dst=fc00::45 payload='hello on UDP over IPv6 39'
Receiver got UDP from src=:: dst=fc00::45 payload='hello on UDP over IPv6 1'
Receiver got UDP from src=:: dst=fc00::45 payload='hello on UDP over IPv6 2'

So there still seems to be some potential problems here (ordering is wrong, src IP empty)- but this at least makes the test pass after an IP has been acquired.

(I also added mirage/mirage-vnetif#32 to track the unhelpful exception in mirage-vnetif)

The MTU restricted iperf test is still failing, I'll try to take a look at that tomorrow.

@MagnusS
Copy link
Member

MagnusS commented Aug 25, 2020

The MTU-limited iperf test was only succeeding if executed after the mtu+tcp test, as the mtu+tcp test would set the MTU to 9000. The iperf test would fail if executed directly as the expected Ethernet frame size was set equal to the MTU, leaving no room for the Ethernet header. I've added a commit to fix both issues (more details in the commit message)

The previous backend contained a ref to the MTU in the module, so the
MTU was remembered between tests. The iperf test would not succeed
unless the mtu+tcp test had run previously, as it set the MTU to 9000
instead of the expected 1500.

This renames the MTU enforced backend to Frame_size_enforced to make it
clear that the size limit is on the Ethernet layer. The frame size is
stored in the returned `t` and is now reset between tests. A helper
function is also added to set the enforced frame size based on the IP
layer MTU (MTU + size of ethernet header).

This also re-enables the iperf test as it now succeeds. The previous
failure was due to the enforced frame size being equal to the MTU,
leaving no room for the Ethernet header.
@MagnusS
Copy link
Member

MagnusS commented Aug 25, 2020

Your changes LGTM @hannesm - with the test updates CI is also green again

test/test_ipv6.ml Outdated Show resolved Hide resolved
test/test_ipv6.ml Outdated Show resolved Hide resolved
Co-authored-by: Hannes Mehnert <[email protected]>
@MagnusS MagnusS merged commit e3659ab into mirage:master Aug 26, 2020
@hannesm hannesm deleted the ipv6 branch August 26, 2020 14:51
hannesm added a commit to hannesm/opam-repository that referenced this pull request Sep 22, 2020
CHANGES:

* Assorted IPv6 improvements (mirage/mirage-tcpip#428 mirage/mirage-tcpip#431 mirage/mirage-tcpip#432 @MagnusS @hannesm)
  - set length in packets to be sent
  - preserve updated ctx from Ndv6.handle
  - fix ICMP checksum computation
  - implement Mirage_stack.V6 signature
  - add connect, mtu, iperf tests
  - fix DAD protocol implementation (and test it)
  - avoid out of bounds accesses of IPv6 packets (check length before accessing)
* Fix 32 bit issues (@MagnusS)
* Implement stack-direct and tcp disconnect: tear down existing connections (mirage/mirage-tcpip#429 @hannesm)
* Treat broadcast address of network as broadcast as well (mirage/mirage-tcpip#430 @hannesm, reported in mirage/mirage-tcpip#427)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants