
netstack: possible compat issues or malfunctions of non-standard TCP Timestamp Options #11536

Open
LionNatsu opened this issue Mar 10, 2025 · 10 comments
Labels
type: bug Something isn't working

@LionNatsu

LionNatsu commented Mar 10, 2025

Description

<ACK> dropped silently. A compatibility issue?

What we expect:

```mermaid
sequenceDiagram
	participant c as curl
	participant s as Server(gVisor)
	Note over s: [LISTEN]
	c ->> s: SYN (TSopt)
	s ->> c: SYN+ACK (TSopt)
	Note over s: [SYN-RCVD]
	c ->> s: ACK (TSopt)
	Note over s: [ESTAB]
```

What happened:

```mermaid
sequenceDiagram
	participant c as curl
	participant s as Server(gVisor)
	Note over s: [LISTEN]
	c ->> s: SYN (TSopt)
	s ->> c: SYN+ACK (TSopt)
	Note over s: [SYN-RCVD]
	c -x s: ACK (no TSopt)
	Note over s: retransmission back-off 1s...
	s ->> c: SYN+ACK (TSopt, retransmits)
	c -x s: ACK (no TSopt, dup)
	Note over s: retransmission back-off 2s...
	s ->> c: SYN+ACK (TSopt, retransmits)
	c -x s: ACK (no TSopt, dup)
	Note over s: retransmission back-off 4s...
```
  1. The client (e.g. curl) considers the TCP connection established on its side (the connect syscall has completed). However, the gVisor server is still in the SYN-RCVD state, waiting for the <ACK>, even though we have confirmed that the packet arrived in the gVisor network namespace.
  2. The problem only happens when the client runs an old Linux kernel (e.g. 3.x ~ 4.14.x). Newer kernels (e.g. 5.4.x ~ latest) establish connections with the gVisor server without any problem. It turns out the old kernels do not fully comply with RFC 7323 "TCP Extensions for High Performance", Section 3.2 (Timestamp Options), which says: "Once TSopt has been successfully negotiated, that is both <SYN> and <SYN,ACK> contain TSopt, the TSopt MUST be sent in every non-<RST> segment for the duration of the connection."
  3. On the other hand, gVisor drops these segments silently. We think this is probably because RFC 7323 also states: "If a non-<RST> segment is received without a TSopt, a TCP SHOULD silently drop the segment." Native Linux TCP/IP does not do this, regardless of kernel version.
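The sender-side rule that the old kernels break can be written as two small predicates. This is a minimal Go sketch with a hypothetical `segment` type (not gVisor's); it only encodes the negotiation condition and the MUST quoted from RFC 7323 Section 3.2:

```go
package main

import "fmt"

// segment is a simplified, hypothetical view of a TCP segment carrying only
// the fields needed for the RFC 7323 Section 3.2 sender rule.
type segment struct {
	rst   bool // RST flag set
	hasTS bool // carries a TSopt
}

// tsNegotiated: TSopt counts as negotiated only if both the <SYN> and
// the <SYN,ACK> carried it.
func tsNegotiated(synHasTS, synAckHasTS bool) bool {
	return synHasTS && synAckHasTS
}

// senderCompliant reports whether a segment meets the sender-side MUST:
// once TSopt is negotiated, every non-<RST> segment carries a TSopt.
func senderCompliant(negotiated bool, s segment) bool {
	return !negotiated || s.rst || s.hasTS
}

func main() {
	neg := tsNegotiated(true, true)       // curl's SYN and gVisor's SYN+ACK both had TSopt
	oldKernelACK := segment{hasTS: false} // final handshake ACK from a 3.x-4.14.x client
	fmt.Println(senderCompliant(neg, oldKernelACK)) // prints: false
}
```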

Workarounds:

  • Upgrading the clients to a newer OS and kernel (this sounds drastic, but we actually did it, because all of our "clients" are microservice containers running on a few host servers, which are easy to migrate).
  • Or, disabling TCP timestamps completely on the client side: sysctl net.ipv4.tcp_timestamps=0.
  • Or, using host network mode.
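To check whether the sysctl workaround is in effect on a Linux client, the setting can be read back through procfs. A minimal Go sketch, assuming the standard `/proc/sys/net/ipv4/tcp_timestamps` file that backs `net.ipv4.tcp_timestamps`; it treats 0 as disabled and any other value as enabled:

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// Procfs view of the net.ipv4.tcp_timestamps sysctl (Linux only).
const tsSysctlPath = "/proc/sys/net/ipv4/tcp_timestamps"

// parseTimestampsSetting interprets the sysctl contents: 0 means TCP
// timestamps are disabled; any other value means they are enabled.
func parseTimestampsSetting(raw string) (bool, error) {
	v, err := strconv.Atoi(strings.TrimSpace(raw))
	if err != nil {
		return false, fmt.Errorf("unexpected sysctl value %q: %w", raw, err)
	}
	return v != 0, nil
}

func main() {
	raw, err := os.ReadFile(tsSysctlPath)
	if err != nil {
		fmt.Println("cannot read", tsSysctlPath, "(not Linux?):", err)
		return
	}
	enabled, err := parseTimestampsSetting(string(raw))
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("TCP timestamps enabled:", enabled)
}
```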

Q1: Is this an intentional design decision?

But only partially. Malfunctions?

  1. Yet, suppose that is working as designed. The last weird thing is that, even though gVisor silently drops those non-standard ACK-of-SYN segments, the connections seem to stay stuck in the accept backlog queue forever. This leads to a very strange phenomenon. Say the server called listen(s, 5). The client keeps failing until, suddenly, the 6th connect attempt succeeds. From then on, everything works normally for this listener, once and for all.
  2. We soon realised this is because SYN cookies kick in: the backlog becomes full, and the listener switches to SYN cookies. From that point on, TCP options in every handshake are ignored, and so is the TSopt check.
  3. Even weirder, after leaving the server idle overnight with no further requests, we expected the half-established connections to have timed out and been discarded, so that new connections would be blocked again, but that did not happen. The first 5 (certainly lost) connections stay stuck forever, forcing all other connections to use SYN cookies and possibly degrading performance.
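The backlog behaviour described above can be reproduced with a toy model. A hypothetical Go sketch (the `listener` type is invented for illustration and is not gVisor's code), assuming, as the report describes, that entries stuck in SYN-RCVD are never removed and that the SYN-cookie path does not enforce TSopt:

```go
package main

import "fmt"

// listener roughly models an accept backlog that falls back to SYN cookies
// once full. Hypothetical toy, not gVisor's implementation.
type listener struct {
	backlog int // as passed to listen(s, backlog)
	pending int // half-open endpoints stuck in SYN-RCVD
}

// connect simulates one handshake whose final ACK lacks a TSopt and
// reports whether the server side reaches ESTABLISHED.
func (l *listener) connect() bool {
	if l.pending < l.backlog {
		// Stateful path: the TSopt-less ACK is silently dropped and,
		// because nothing removes the entry, it stays pending forever.
		l.pending++
		return false
	}
	// Backlog full: SYN-cookie path. Per the report, options (including
	// TSopt) are not enforced here, so the ACK completes the handshake.
	return true
}

func main() {
	l := &listener{backlog: 5}
	for i := 1; i <= 7; i++ {
		fmt.Printf("attempt %d established=%v\n", i, l.connect())
	}
}
```

With a backlog of 5, attempts 1 to 5 hang and every later attempt succeeds, matching the pattern reported above.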

Q2: Is there something missing when a half-established connection fails (i.e. its SYN+ACK retransmissions are exhausted)? For example, removing the failed endpoint from the queue?

Environment is the same as #11535

LionNatsu added the type: bug (Something isn't working) label Mar 10, 2025
@LionNatsu
Author

cc: @JiaHuann @milantracy

@LionNatsu
Author

The ACK-of-SYN is silently dropped here:

```go
// If the timestamp option is negotiated and the segment does
// not carry a timestamp option then the segment must be dropped
// as per https://tools.ietf.org/html/rfc7323#section-3.2.
if h.ep.SendTSOk && !s.parsedOptions.TS {
	h.ep.stack.Stats().DroppedPackets.Increment()
	return nil
}
```

(The comment is not entirely accurate: the RFC says the segment SHOULD be dropped, not that it MUST be.)
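Because the drop rule is a SHOULD, enforcing it is a policy choice. A minimal Go sketch of a switchable check; `tsPolicy`, `strictRFC` and `lenient` are hypothetical names, not gVisor options, and the lenient mode only approximates the Linux behaviour described in this thread by accepting the segment:

```go
package main

import "fmt"

// tsPolicy selects how a receiver treats a non-RST segment that lacks a
// TSopt after TSopt was negotiated. Hypothetical, for illustration only.
type tsPolicy int

const (
	strictRFC tsPolicy = iota // drop, as RFC 7323 Section 3.2 recommends (SHOULD)
	lenient                   // accept, as the thread reports Linux does
)

// dropMissingTS reports whether the segment should be silently dropped.
func dropMissingTS(p tsPolicy, negotiated, rst, hasTS bool) bool {
	if !negotiated || rst || hasTS {
		return false // the rule does not apply to this segment
	}
	return p == strictRFC
}

func main() {
	// Final handshake ACK from an old kernel: TSopt negotiated, none sent.
	fmt.Println("strict drops:", dropMissingTS(strictRFC, true, false, false))
	fmt.Println("lenient drops:", dropMissingTS(lenient, true, false, false))
}
```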

@hbhasker
Contributor

Fly-by: yes, this was an intentional decision based on the RFC. I probably missed checking whether the Linux implementation implements that specific part of the RFC.

@kevinGC FYI.

@kevinGC
Collaborator

kevinGC commented Mar 11, 2025

Hey Bhasker, @nybidari leads netstack now.

Doing what Linux does -- which is apparently just not caring whether a timestamp is returned -- seems like a reasonable way to support broken clients like this. WDYT?

@hbhasker
Contributor

Agreed. Congrats @nybidari! Yes, matching Linux behaviour seems correct.

@nybidari
Contributor

nybidari commented Mar 11, 2025

Thanks Kevin and Bhasker.

For Q1:
Yes, we silently drop the ACKs if the timestamp option has been negotiated and the ACKs do not have timestamp option.

For Q2:
The pendingEndpoints are deleted:

  • when there is an error
  • when the endpoints are shut down/closed.

In this case, we silently drop the ACKs and do not return an error, which is why the pendingEndpoints are not cleaned up.
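A toy model of the cleanup rule described above, with hypothetical names (`listenEP`, `runHandshake`, `errHandshakeTimeout`) that are not gVisor's code. It shows why a drop that returns nil leaks the pending entry, and how surfacing an exhausted handshake as an error lets the caller remove it:

```go
package main

import (
	"errors"
	"fmt"
)

var errHandshakeTimeout = errors.New("handshake timed out")

// listenEP is a hypothetical stand-in for a listening endpoint that tracks
// half-open connections in a pending set.
type listenEP struct {
	pending map[int]bool
}

// runHandshake drives one half-open endpoint. acks lists, per SYN+ACK
// (re)transmission, whether the peer's ACK carried a TSopt. An ACK without
// TSopt is silently dropped and reported as nil, not as an error.
// If reportTimeout is false, exhausting the attempts also returns nil
// (mimicking the leak); if true, it returns errHandshakeTimeout.
func runHandshake(acks []bool, reportTimeout bool) (bool, error) {
	for _, hasTS := range acks {
		if hasTS {
			return true, nil // handshake completes
		}
		// Silent drop: no error, retransmit SYN+ACK and wait again.
	}
	if reportTimeout {
		return false, errHandshakeTimeout
	}
	return false, nil
}

// handle removes the endpoint from pending only when the handshake completes
// (it would move to the accept queue) or returns an error.
func (l *listenEP) handle(id int, acks []bool, reportTimeout bool) {
	l.pending[id] = true
	established, err := runHandshake(acks, reportTimeout)
	if established || err != nil {
		delete(l.pending, id)
	}
}

func main() {
	oldKernel := []bool{false, false, false} // every ACK lacks TSopt

	leaky := &listenEP{pending: map[int]bool{}}
	leaky.handle(1, oldKernel, false)

	fixed := &listenEP{pending: map[int]bool{}}
	fixed.handle(1, oldKernel, true)

	fmt.Println("pending (no timeout error):", len(leaky.pending)) // prints 1
	fmt.Println("pending (timeout as error):", len(fixed.pending)) // prints 0
}
```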

I can work on this in the next one or two weeks. We also welcome any PRs to fix this issue :)

@JiaHuann

JiaHuann commented Mar 12, 2025

Yes, I am working on this. But I haven't found any tutorial for building from source with debug info. Can you give some advice? I am not good at Bazel.

I followed this page. https://gvisor.dev/docs/user_guide/debugging/

```
make dev BAZEL_OPTIONS="-c dbg --define gotags=debug"
```

But runsc reports an unknown platform for systrap.

Thanks. ;)

@JiaHuann

Never mind, I have now built gVisor with debug info successfully and am testing a fix for the issue.

JiaHuann added a commit to JiaHuann/gvisor that referenced this issue Mar 14, 2025
When a handshake times out, the endpoint may remain stuck in the lEP's pendingEndpoints queue.

Related to google#11536
@LionNatsu
Author

Thanks, @nybidari, for your clarification!

> For Q1: Yes, we silently drop the ACKs if the timestamp option has been negotiated and the ACKs do not have timestamp option.

That makes sense. However, do we have any plan to match or shift to Linux's behaviour in this case, or would you rather stick to the current one?

@LionNatsu
Author

Okay, I saw @JiaHuann submit a pendingEndpoints fix in #11557; thank you, JiaHuann 8)
