-
Notifications
You must be signed in to change notification settings - Fork 2
Add timeout to connect attempts to fix #19 #20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @thomasjm 👋🏻
Thank you so much for reporting the issue you encountered and opening this PR to try and fix it! I hadn't run into this particular problem during any of my testing.
I approved the CI runs for your PR and it seems that most of the workflows fail because you introduced the NumericUnderscores
extension which isn't available in older versions of GHC.
I am on holiday this week, so it would be good if you could change that and get the existing CI to pass before I am back next week when I can review this in full.
Also, some of the builds fail because |
All right, done. I'm pretty sure nobody uses GHC 8.2 anymore though :P |
Friendly ping on review @mbg |
Hi @thomasjm, Sorry for the delay in getting this reviewed. It's on my radar, but I just haven't been able to find the time yet. |
Hi @mbg, any chance of getting this finished soon? |
Really sorry for the delay here. I haven't been doing much Haskell lately, so these libraries haven't been getting the attention they deserve. However, I am trying to get back into maintaining these properly since I am using them personally again, so I am getting back to this now. I have been trying to put together a test case for this (I have tried various combinations of
|
#19 would have been on Linux with TCP. What are you trying to do in your test case? Like I said in the original issue, I'm not entirely sure what circumstances cause a |
Mainly I've been trying to simulate that sort of thing, but with everything I have tested the connection attempt either got aborted immediately or the underlying networking stack's timeout worked. For example, I added a firewall rule (on macOS) with:
And then:
|
i just reproduced a hang on both Linux and macOS like this: module Main (main) where
import Network.Socket
main :: IO ()
main = do
sock <- socket AF_INET Stream defaultProtocol
-- Connect to 10.255.255.1 (non-routable private IP)
-- This will cause SYN packets to be sent but dropped by routing
let addr = SockAddrInet 80 (tupleToHostAddress (10, 255, 255, 1))
putStrLn "Attempting connection (this will hang)..."
connect sock addr -- This will hang indefinitely
putStrLn "Connected!" -- Never reached |
That's great, thank you! I can confirm that this does get stuck on its own, as well as on the
But with the changes in your PR:
|
I have turned this into a test case in 9d56f3d, feel free to cherry-pick that onto this branch |
FWIW if you wait long enough with my hanging example, it does time out. For me it took 1 minute 15 seconds on macOS, and 2 minutes 13 seconds on Linux. I'm not sure what determines this. |
Cherry-pick done |
Interesting. Is that specific to the example or is that the case with the original scenario where you ran into the problem as well? (If you can reproduce that) |
Also, I realised that 4d27fd7 is also needed for the test. Otherwise, without the fix, it of course also gets stuck (at least for the amount of time until the lengthy timeout kicks in that you mentioned). |
I'm not sure where it comes from. The original scenario was intermittent and difficult to reproduce since it involved a whole Kubernetes cluster. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think whether or not the test case gives us the worst case scenario of connect
getting completely stuck, this is an improvement over main
where there's no timeout for the connect
attempt and we rely on the underlying network stack to do the right thing.
I am happy to merge this as-is.
Thank you very much for reporting this, implementing a fix, and finding a reasonable test case. Once again sorry that it took so long for me to deal with.
I am releasing this as part of https://github.com/mbg/network-wait/releases/tag/v0.4 |
Thanks! |
Without connect timeouts, I get intermittent hangs on some tests I run. With this change, I ran a whole bunch of repeats and got no failures.
I don't have an exact reproducer for causing the connect call to hang. In my case, it happens when a Kubernetes Service is in the process of starting. According to some googling, hangs can happen for a variety of reasons, such as if the network simply drops packets for a destination that doesn't exist yet.