-
Notifications
You must be signed in to change notification settings - Fork 87
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TCP flow deadlock: receive window closes and never opens again #340
Comments
Thanks very much for this thoughtful and well-written report, @mfp . I'll have a closer look at this later today. Do you have code for reproducing this, or did you observe it in some running system? |
The issue is very easily reproduced once you know what you're looking for, all it takes is a consumer that's slower than the sender. I patched mirage-tcpip trivially as per https://gist.github.com/mfp/c19f41c98d6ad28fab2223f567dba1e3 to dump more TCP window info on ACK reception, and the output looks as follows -- the lines for the server and client are intermingled, but you can see the server's TX window decreasing quickly until there's no further transmission.
Both commenting out the line that resizes the window in |
Thanks again for this report and for the repro code. I've adapted your gist (crediting you with the initial commit) and mashed it into the test suite at https://github.com/yomimono/mirage-tcpip/tree/deadlock_test . Please let me know if you don't want this code included in the repository here. I tend to agree that the best quick solution for the moment is to set a lower bound on @mfp would you be amenable to taking the |
I settled with a simpler expression for the lower bound (I originally had Tangentially, I also have a trivial patch to set the TCP window scale and initial congestion window ( |
Thank you!
Code that uses the |
I see. What about trivial getter/setters then (leaving it up to the application to set the values)? A more elegant way would be to have the moral equivalent of a typed sysctl provided somewhere in Mirage, but that requires a lot more work (deciding where it goes, coordinating more libs, etc.) |
Yeah, that's much friendlier, and what I was about to come back to suggest. FWIW, we do have a facility for getting values set by the user at compile-time or runtime into Mirage unikernels already, but it requires some coordination between the |
Thanks for the bug report, I'm sorry to be not too familiar with this code base to be able to evaluate it.
Yes, this work is likely needed at some point. I'd be much in favour of someone tackling this problem than a very isolated solution just for the current use case. My current ideas would be either using plan9 - we have an implementation in OCaml - or SNMP, but I'm really open to suggestions in that area. SNMP is already widely deployed with lots of tooling available (for monitoring, setting values, etc.), and can be used to both read and write values (as well as triggering traps). With plan9 I don't have any experience. |
I have observed the following situation which results in a flow deadlock:
This is a situation akin to the one prevented by the TCP persist timer (which I don't see implemented BTW), but it happens in absence of packet loss, and even when the window is not exactly 0.
AFAICS in the situation where the
User_buffer.Rx
becomes full, the receive window is about 0, the sender stops and there are no unACK'ed segments in theSegment.Rx
queue (or maybe when there are some but they are quickly depleted when data is consumed and some space left, with no net receive window increase), there is no mechanism to let the receiver inform the sender that its receive window has increased as the data is consumed by the application.I've verified that the deadlock does not happen (obviously) if I comment out the
Window.set_rx_wnd
call inUser_buffer
, or if I changeset_rx_wnd
to make sure it is never smaller than a few MSS (with some care to cope with rounding and window scaling).Note that flow control does work even if the receive window is not decreased as the user buffer fills, since the unACKed data would accumulate in
Segment.Rx
and eventually (if the application data is not read fast enough) reach the window size. The first effect would be that the amount of memory used on the receiver is twice the initial (and unmodified) receive window (once for the application data, and 2x for the unACKed data held in theSegment.Rx
queue).If the second workaround is applied, the extra amount of memory is bounded by a few MSS, so that'd seem a priori a workable quick fix.
If I'm reading
segment.ml
correctly, we can also prevent deadlocks in (ACK) packet loss scenario easily: the extra unACKed data ensures at least one ACK will be sent as the user buffer depletes, and if it's lost, the retransmission timer would trigger a resend that could be met by an ACK (force_ack = true
inmirage-tcpip/src/tcp/segment.ml
Line 151 in 4bb3820
force_ack
thus undoing partially #08e8692 for that specific case.Edit: I've realized this would also prevent the silly window syndrome, so I wonder: is there a reason I can't see (it's late...) why there shouldn't be a non-zero lower bound for the receive window, or iow. is there any problem in always having a little extra unACKed data around even when the receive buffer is full? I can see it'd cause a few extra packet retransmissions every (at least) 667ms if the application data consumption speed is exceedingly slow (the computed RTT would keep increasing though as well as the retransmission timer period, thus limiting overhead).
The text was updated successfully, but these errors were encountered: