Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cannot open SCTP connections from FreeBSD to Linux #432

Open
fredhau opened this issue Jan 31, 2018 · 8 comments
Open

Cannot open SCTP connections from FreeBSD to Linux #432

fredhau opened this issue Jan 31, 2018 · 8 comments

Comments

@fredhau
Copy link
Collaborator

fredhau commented Jan 31, 2018

When running tneat as a server on Linux as follows:

$ ./tneat -P prop_sctp.json

and connecting to this server with tneat on FreeBSD as follows:

$ ./tneat -P prop_sctp.json <IP address>

on_connected is never called on the client side on FreeBSD. However, this works when both machines are running Linux. This also works when e.g. TCP is used, so the problem seems to be related to how NEAT handles SCTP and how it is handled differently on Linux and FreeBSD.

The Linux machine ran Ubuntu 16.04.1 LTS 4.13.0-32-generic. The FreeBSD machine ran FreeBSD 11.1-RELEASE r321309.

@fredhau
Copy link
Collaborator Author

fredhau commented Jan 31, 2018

The problem also occurs when trying to connect from FreeBSD to FreeBSD.

@fredhau
Copy link
Collaborator Author

fredhau commented Feb 1, 2018

I ran socat at both sides to check whether SCTP works, but it doesn't. It seems like the problem is with the SCTP implementation and not with NEAT.

socat client running in FreeBSD:

2018/02/01 00:30:45 socat[901] I socat by Gerhard Rieger and contributors - see www.dest-unreach.org
2018/02/01 00:30:45 socat[901] I This product includes software developed by the OpenSSL Project for use in the OpenSSL Toolkit. (http://www.openssl.org/)
2018/02/01 00:30:45 socat[901] I This product includes software written by Tim Hudson ([email protected])
2018/02/01 00:30:45 socat[901] N reading from and writing to stdio
2018/02/01 00:30:45 socat[901] N opening connection to LEN=16 AF=2 192.168.10.10:12327
2018/02/01 00:30:45 socat[901] I starting connect loop
2018/02/01 00:30:45 socat[901] I socket(2, 1, 132) -> 5
2018/02/01 00:30:45 socat[901] E connect(5, LEN=16 AF=2 192.168.10.10:12327, 16): Connection reset by peer
2018/02/01 00:30:45 socat[901] N exit(1)
2018/02/01 00:30:45 socat[901] I shutdown(5, 2)
2018/02/01 00:30:45 socat[901] I shutdown(5, 2): Socket is not connected

socat server running in Linux:

2018/02/01 01:30:40 socat[2676] I socat by Gerhard Rieger - see www.dest-unreach.org
2018/02/01 01:30:40 socat[2676] I This product includes software developed by the OpenSSL Project for use in the OpenSSL Toolkit. (http://www.openssl.org/)
2018/02/01 01:30:40 socat[2676] I This product includes software written by Tim Hudson ([email protected])
2018/02/01 01:30:40 socat[2676] N reading from and writing to stdio
2018/02/01 01:30:40 socat[2676] I socket(2, 1, 132) -> 5
2018/02/01 01:30:40 socat[2676] I starting accept loop
2018/02/01 01:30:40 socat[2676] N listening on AF=2 0.0.0.0:12327
2018/02/01 01:30:45 socat[2676] I accept(5, {2, AF=2 192.168.20.40:28199}, 16) -> 6
2018/02/01 01:30:45 socat[2676] N accepting connection from AF=2 192.168.20.40:28199 on AF=2 192.168.10.10:12327
2018/02/01 01:30:45 socat[2676] I permitting connection from AF=2 192.168.20.40:28199
2018/02/01 01:30:45 socat[2676] I close(5)
2018/02/01 01:30:45 socat[2676] I resolved and opened all sock addresses
2018/02/01 01:30:45 socat[2676] N starting data transfer loop with FDs [0,1] and [6,6]
2018/02/01 01:30:49 socat[2676] W read(6, 0x23cc710, 8192): Connection reset by peer
2018/02/01 01:30:49 socat[2676] N socket 2 to socket 1 is in error
2018/02/01 01:30:49 socat[2676] N socket 2 (fd 6) is at EOF
2018/02/01 01:30:50 socat[2676] I poll timed out (no data within 0.500000 seconds)
2018/02/01 01:30:50 socat[2676] I shutdown(6, 2)
2018/02/01 01:30:50 socat[2676] N exiting with status 0

@tuexen
Copy link
Contributor

tuexen commented Feb 1, 2018

Can you capture the traffic on the FreeBSD box on all interfaces and provide the capture file? The connect() call returns with an error indicating "Connection reset by peer". So I guess we'll see an ABORT chunk. Not sure which box is sending it and why. Most likely suspect: a middlebox...

@weinrank
Copy link
Member

weinrank commented Feb 1, 2018

Are those real machines or VMs? If VMs: which adapter type do you use?

@fredhau
Copy link
Collaborator Author

fredhau commented Feb 1, 2018

The machines are VMs. Both machines has a NAT adapter and an Internal Network adapter.

Data captured on the FreeBSD internal network interface (em1):

23:12:42.430110 IP6 fe80::a00:27ff:fe3b:b9c9 > ip6-allrouters: ICMP6, router solicitation, length 16
23:12:57.275057 IP 192.168.20.40.38101 > 192.168.10.10.12327: sctp (1) [INIT] [init tag: 2225606087] [rwnd: 1864135] [OS: 10] [MIS: 2048] [init TSN: 3768621358]
23:12:57.376455 IP 192.168.10.10.12327 > 192.168.20.40.38101: sctp (1) [INIT ACK] [init tag: 41739531] [rwnd: 106496] [OS: 10] [MIS: 10] [init TSN: 4268204033]
23:12:57.376516 IP 192.168.20.40.38101 > 192.168.10.10.12327: sctp (1) [COOKIE ECHO]
23:12:57.479004 IP 192.168.10.10.12327 > 192.168.20.40.38101: sctp (1) [COOKIE ACK]
23:13:02.398287 ARP, Request who-has 192.168.20.40 tell 192.168.20.20, length 46
23:13:02.398327 ARP, Reply 192.168.20.40 is-at 08:00:27:53:f7:fb (oui Unknown), length 28

Data captured on the FreeBSD loopback interface (lo0):

23:12:57.479029 IP 192.168.20.40.38101 > 10.0.2.15.12327: sctp (1) [HB REQ]
23:12:57.479071 IP 10.0.2.15.12327 > 192.168.20.40.38101: sctp (1) [ABORT]

The ABORT is captured on the loopback interface sent from the NAT service. When disabling the NAT adapter, SCTP works properly. I don't understand why the connection should fail even though the 4-way handshake has completed between the endpoints though... Or how to get SCTP to work when the NAT interface is enabled. I tried to specify only the internal network IP with the local_ips property, but still the same problem.

@tuexen
Copy link
Contributor

tuexen commented Feb 2, 2018

If you specify the local IP addresses on both sides, it should work. There should be not address parameters in the INIT or INIT-ACK chunk. We can chat in Oslo about why it doesn't work with NAT boxes being involved...

@fredhau
Copy link
Collaborator Author

fredhau commented Feb 4, 2018

Ah, that makes sense. I noticed that there are address parameters in the INIT and INIT-ACK chunks, so I attempted to connect the socat client to a NEAT server with the NEAT_TAG_LOCAL_NAME optional argument set to the local IP address and it works! However, when I attempt to connect a NEAT client to the same server, the SCTP association is established, but on_connected is not called for the client.

Packet capture: em1.txt

21:18:01.335896 IP 192.168.20.40.39895 > 192.168.10.10.12327: sctp (1) [INIT] [init tag: 3472062275] [rwnd: 1864135] [OS: 123] [MIS: 123] [init TSN: 3735537264]
21:18:01.437294 IP 192.168.10.10.12327 > 192.168.20.40.39895: sctp (1) [INIT ACK] [init tag: 2814136528] [rwnd: 106496] [OS: 123] [MIS: 123] [init TSN: 2458143342]
21:18:01.437326 IP 192.168.20.40.39895 > 192.168.10.10.12327: sctp (1) [COOKIE ECHO]
21:18:01.538415 IP 192.168.10.10.12327 > 192.168.20.40.39895: sctp (1) [COOKIE ACK]
21:18:06.561159 ARP, Request who-has 192.168.20.40 tell 192.168.20.20, length 46
21:18:06.561181 ARP, Reply 192.168.20.40 is-at 08:00:27:53:f7:fb (oui Unknown), length 28
21:18:32.539597 IP 192.168.20.40.39895 > 192.168.10.10.12327: sctp (1) [HB REQ]
21:18:32.640349 IP 192.168.10.10.12327 > 192.168.20.40.39895: sctp (1) [HB ACK]
21:18:36.671618 IP 192.168.10.10.12327 > 192.168.20.40.39895: sctp (1) [HB REQ]
21:18:36.671649 IP 192.168.20.40.39895 > 192.168.10.10.12327: sctp (1) [HB ACK]
21:18:41.631740 ARP, Request who-has 192.168.20.40 tell 192.168.20.20, length 46
21:18:41.631756 ARP, Reply 192.168.20.40 is-at 08:00:27:53:f7:fb (oui Unknown), length 28
21:19:03.205763 IP 192.168.20.40.39895 > 192.168.10.10.12327: sctp (1) [HB REQ]
21:19:03.307677 IP 192.168.10.10.12327 > 192.168.20.40.39895: sctp (1) [HB ACK]
21:19:08.947087 IP 192.168.20.40.39895 > 192.168.10.10.12327: sctp (1) [SHUTDOWN]
21:19:09.049638 IP 192.168.10.10.12327 > 192.168.20.40.39895: sctp (1) [SHUTDOWN ACK]
21:19:09.049667 IP 192.168.20.40.39895 > 192.168.10.10.12327: sctp (1) [SHUTDOWN COMPLETE]

NEAT client output:

Connecting to: 192.168.10.10:12327
Receive buffer size: 10240 bytes
Rate of sending requests: 0
Number of requests to send: 1
NEAT log level: 4
User log level: 2
Statistics log rate: 0
Multihoming: disabled
Send HTTP requests for data objects of size: 10 bytes
Use the following properties:
{ "transport": { "value": ["SCTP"], "precedence": 1 }, "local_ips": [ { "value": "192.168.20.40", "precedence": 1 } ] }

Initializing NEAT context...
[ 0.000000][INF] nt_log_init - opening logfile ...
[ 0.000000][DBG] neat_init_ctx
[ 0.000103][INF] Available src-addresses:
[ 0.000111][INF] IPv6: fe80::a00:27ff:fe63:5b51/0 pref 4294967295 valid 4294967295
[ 0.000116][DBG] nt_run_event_cb
[ 0.000122][INF] Available src-addresses:
[ 0.000127][INF] IPv4: 10.0.2.15/0
[ 0.000132][INF] IPv6: fe80::a00:27ff:fe63:5b51/0 pref 4294967295 valid 4294967295
[ 0.000137][DBG] nt_run_event_cb
[ 0.000142][INF] Available src-addresses:
[ 0.000146][INF] IPv4: 192.168.20.40/0
[ 0.000151][INF] IPv4: 10.0.2.15/0
[ 0.000156][INF] IPv6: fe80::a00:27ff:fe63:5b51/0 pref 4294967295 valid 4294967295
[ 0.000160][DBG] nt_run_event_cb
[ 0.000166][INF] Available src-addresses:
[ 0.000170][INF] IPv6: ::1/0 pref 4294967295 valid 4294967295
[ 0.000175][INF] IPv4: 192.168.20.40/0
[ 0.000180][INF] IPv4: 10.0.2.15/0
[ 0.000185][INF] IPv6: fe80::a00:27ff:fe63:5b51/0 pref 4294967295 valid 4294967295
[ 0.000189][DBG] nt_run_event_cb
[ 0.000195][INF] Available src-addresses:
[ 0.000200][INF] IPv6: fe80::1/0 pref 4294967295 valid 4294967295
[ 0.000204][INF] IPv6: ::1/0 pref 4294967295 valid 4294967295
[ 0.000209][INF] IPv4: 192.168.20.40/0
[ 0.000214][INF] IPv4: 10.0.2.15/0
[ 0.000219][INF] IPv6: fe80::a00:27ff:fe63:5b51/0 pref 4294967295 valid 4294967295
[ 0.000223][DBG] nt_run_event_cb
[ 0.000228][INF] Available src-addresses:
[ 0.000232][INF] IPv4: 127.0.0.1/0
[ 0.000237][INF] IPv6: fe80::1/0 pref 4294967295 valid 4294967295
[ 0.000242][INF] IPv6: ::1/0 pref 4294967295 valid 4294967295
[ 0.000246][INF] IPv4: 192.168.20.40/0
[ 0.000251][INF] IPv4: 10.0.2.15/0
[ 0.000256][INF] IPv6: fe80::a00:27ff:fe63:5b51/0 pref 4294967295 valid 4294967295
[ 0.000260][DBG] nt_run_event_cb
Creating new flow (1)...
[ 0.000268][DBG] neat_new_flow
[ 0.000323][INF] neat_new_flow - new flow created: 0x802659000
[ 0.000331][DBG] neat_set_property
[ 0.000335][DBG] neat_set_property - { "transport": { "value": ["SCTP"], "precedence": 1 }, "local_ips": [ { "value": "192.168.20.40", "precedence": 1 } ] }
[ 0.000359][DBG] neat_set_operations
[ 0.000364][DBG] nt_update_poll_handle
[ 0.000369][DBG] nt_update_poll_handle - loop is NULL or handle is closing - skipping
Opening connection to 192.168.10.10 (1)...
[ 0.000375][DBG] neat_open
[ 0.000382][DBG] nt_add_event_cb
[ 0.000386][INF] nt_add_event_cb - Added new callback for event type 0
[ 0.000391][DBG] nt_add_event_cb
[ 0.000395][INF] nt_add_event_cb - Added new callback for event type 2
[ 0.000430][DBG] nt_add_event_cb
[ 0.000435][INF] nt_add_event_cb - Added new callback for event type 0
[ 0.000439][DBG] send_properties_to_pm
[ 0.000470][DBG] fe80::a00:27ff:fe63:5b51%em0 is a link-local address, skipping
[ 0.000481][DBG] send_properties_to_pm: no match
[ 0.000489][DBG] Added endpoint "192.168.20.40@em1" to PM request
[ 0.000495][DBG] send_properties_to_pm: no match
[ 0.000508][DBG] fe80::1%lo0 is a link-local address, skipping
[ 0.000513][DBG] send_properties_to_pm: no match
[ 0.000522][DBG] nt_json_send_once
[ 0.000568][DBG] neat_get_event_loop
Starting event loop...
[ 0.000575][DBG] neat_start_event_loop
[ 0.000582][DBG] on_pm_error
[ 0.000586][DBG] on_pm_error
[ 0.000590][DBG] ===== Unable to communicate with PM, using fallback =====, error code = 1
[ 0.001754][DBG] open_resolve_cb
[ 0.001830][DBG] Address found
[ 0.001849][DBG] nt_he_open
[ 0.001854][DBG] HE Candidate 0: em1 [ 2] SCTP/IPv4 <saddr 192.168.20.40> <dstaddr 192.168.10.10> port 12327 priority 0
[ 0.001874][DBG] nt_find_multistream_socket
[ 0.001878][DBG] nt_find_multistream_socket - 0x802659000 : skipping - self...
[ 0.001883][DBG] nt_wait_for_multistream_socket
[ 0.001887][DBG] nt_wait_for_multistream_socket - 0x802659000 : skipping - self...
[ 0.001892][DBG] HE will now commence
[ 0.001898][DBG] on_he_connect_req
[ 0.001903][DBG] nt_connect
[ 0.001927][INF] nt_connect: Bind fd 14 to 192.168.20.40
[ 0.001947][DBG] SCTP stream negotiation - offering : 123 in / 123 out
[ 0.002489][DBG] on_he_connect_req: Connect successful for fd 14, ret = 0
[ 0.204932][DBG] he_connected_cb
[ 0.204960][DBG] Invokation count: 1 - flow: 0x802659000
[ 0.204965][DBG] HE Candidate connected: em1 [ 2] SCTP/IPv4 <saddr 192.168.20.40> <dstaddr 192.168.10.10> port 12327 priority 0
[ 0.205015][DBG] he_connected_cb - Connection status: 0 - No error: 0
[ 0.205021][DBG] First successful connect (flow->hefirstConnect)
[ 0.205026][DBG] send_result_connection_attempt_to_pm
[ 0.205042][INF] Sending HE result to PM for caching
[ 0.205047][DBG] nt_json_send_once_no_reply
[ 0.205080][DBG] uvpollable_cb
[ 0.205085][DBG] uvpollable_cb - awaiting notifications, socket not readable yet, skipping...
[ 0.205089][DBG] nt_update_poll_handle
[ 0.205094][DBG] nt_update_poll_handle - events - starting poll - readable : 1 - writable : 0
[ 0.205099][DBG] uvpollable_cb - finished
[ 0.205122][DBG] on_pm_error
[ 0.205142][DBG] on_pm_he_error
[ 0.205146][DBG] Unable to communicate with PM, error code = 1
[ 0.205161][DBG] uvpollable_cb
[ 0.205166][DBG] uvpollable_cb - awaiting notifications
[ 0.205170][DBG] io_readable
[ 0.205194][DBG] io_readable - got SCTP_RCVINFO
[ 0.205198][DBG] io_readable - Received 22 bytes on SCTP stream 0
[ 0.205203][INF] SCTP event notification
[ 0.205208][DBG] handle_sctp_event
[ 0.205213][DBG] handle_sctp_assoc_change
[ 0.205217][DBG] handle_sctp_assoc_change - state : SCTP_COMM_UP
[ 0.205221][DBG] handle_sctp_assoc_change - supported features
[ 0.205226][DBG] - PR
[ 0.205244][DBG] - MULTIBUF
[ 0.205249][DBG] nt_update_poll_handle
[ 0.205253][DBG] nt_update_poll_handle - events - starting poll - readable : 1 - writable : 0
[ 0.205271][DBG] uvpollable_cb - finished
Received signal
Signal received: SIGINT
Stop event loop in signal handler...
[ 67.607843][DBG] neat_stop_event_loop
[ 67.613428][DBG] neat_free_ctx
[ 67.613454][DBG] nt_remove_event_cb
[ 67.613459][INF] nt_remove_event_cb - Removed callback for type 0
[ 67.613463][DBG] nt_remove_event_cb
[ 67.613467][INF] nt_remove_event_cb - Removed callback for type 2
[ 67.613484][DBG] nt_close_socket
[ 67.613553][DBG] nt_free_flow
[ 67.613562][INF] nt_free_flow - removing 0x802659000
[ 67.613568][DBG] nt_free_candidates
[ 67.613574][DBG] nt_free_candidate
[ 67.613579][DBG] nt_free_candidate: Handle used by flow, flow should release it
[ 67.613584][DBG] nt_free_flow - closing handle and waiting for socket_handle_free_cb
[ 67.613591][DBG] nt_walk_cb
[ 67.613615][DBG] nt_walk_cb - closing handle
[ 67.613621][DBG] nt_walk_cb
[ 67.613626][DBG] nt_walk_cb - closing handle
[ 67.613637][DBG] nt_walk_cb
[ 67.613642][DBG] nt_walk_cb - handle->type == UV_IDLE - skipping
[ 67.613646][DBG] nt_walk_cb
[ 67.613651][DBG] nt_walk_cb
[ 67.613660][DBG] synchronous_free
[ 67.613666][DBG] synchronous_free - neat_resolver_free_results
[ 67.613770][INF] nt_log_close - closing logfile ...
Left the event loop...
ERROR: Client 1 failed to connect
Start time: Sun Feb 4 20:18:01 2018
End time: Sun Feb 4 20:19:08 2018
Elapsed time: 67.615185 s
Average throughput: 0.000000 Mbit/s

As can be seen from the client log, the only SCTP event notification received is the SCTP_ASSOC_CHANGE event. No SCTP_ADAPTATION_INDICATION event is received, etc.

@fredhau fredhau reopened this Feb 4, 2018
@marceloleitner
Copy link

When running tneat as a server on Linux as follows:
....
The Linux machine ran Ubuntu 16.04.1 LTS 4.13.0-32-generic.
...
As can be seen from the client log, the only SCTP event notification received is the SCTP_ASSOC_CHANGE event. No SCTP_ADAPTATION_INDICATION event is received, etc.

You're at risk of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7906b00f5cd1cd484fced7fcda892176e3202c8a
v4.13.3 or newer should contain this fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants