fix(core): Adjust connection timeout and retry logic #685
base: master
Conversation
@python-zk/maintainers This is an early draft of a fix to the connection logic. The original issue that sent me down this path is as follows:

    zkc = KazooClient(hosts=["nonresponsive:2181", "goodhost:2181"], timeout=60)
    zkc.start()

On a new connection, this results in a painful delay of 60 seconds wasted on the bad server before trying the good one.
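To see the underlying problem outside of kazoo, consider this minimal sketch (the host name is a placeholder for an endpoint that silently drops packets, and the 60-second value mirrors the report above): a plain TCP connect to an unresponsive host blocks for the full timeout before the client can move on.

```python
import socket
import time

# Placeholder address: substitute a host that never answers (e.g. one that
# silently drops SYN packets) to reproduce the stall.
HOST, PORT = "nonresponsive", 2181
TIMEOUT = 60.0  # per-host timeout derived from the requested session timeout

start = time.monotonic()
try:
    # Blocks for up to TIMEOUT seconds if the host never responds.
    socket.create_connection((HOST, PORT), timeout=TIMEOUT)
except OSError as exc:
    print(f"gave up after {time.monotonic() - start:.1f}s: {exc}")
```

With an increasing backoff per attempt, the client instead fails over to the next host after a fraction of a second and only waits longer if every host keeps failing.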
Codecov Report
Patch coverage:

@@            Coverage Diff             @@
##           master     #685      +/-   ##
==========================================
- Coverage   96.65%   94.40%   -2.26%
==========================================
  Files          27       58      +31
  Lines        3554     8413    +4859
==========================================
+ Hits         3435     7942    +4507
- Misses        119      471     +352

... and 41 files with indirect coverage changes.
I am pondering if the below should be changed too; I really do not think using the session timeout here is right either:

    connect_result, zxid = self._invoke(
        client._session_timeout / 1000.0 / len(client.hosts), connect
    )
LGTM, in general—except possibly for the default initial cur_delay. I have left a few comments.
kazoo/protocol/connection.py (outdated)

-        connect_result, zxid = self._invoke(
-            client._session_timeout / 1000.0 / len(client.hosts), connect
-        )
+        connect_result, zxid = self._invoke(timeout, connect)
I agree that _session_timeout, being the requested session timeout and thus potentially arbitrarily larger than the actual session timeout, is an unfortunate choice.
Thinking about it some more, I am not sure that using the retry's cur_delay as-is is a good choice either. It may have been sufficient for the TCP (and potentially SSL) handshakes, but this request/response cycle involves a round trip through the JVM, disk logging, quorum packets (learner with global sessions), etc.
I can imagine many "false starts" (and thus gratuitous network & server activity) if the default cur_delay is kept too low.
Interestingly, _authenticate_with_sasl and AUTH_XID below reuse connect_timeout—which, at least, is related to the negotiated timeout—for "numerous" write and read operations which have to complete in a timely manner, as they do not "touch" the session.
The good news is that the session is "fresh" at that point. But in theory, the negotiated timeout should be used to determine a deadline for the whole sequence, so that it aborts before causing a session expiry if, e.g., SASL authentication proceeds too slowly.
(Note that this is not handled in the other clients either.)
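To make the "deadline for the whole sequence" idea concrete, here is a hedged sketch; remaining_budget and the loop below are illustrative stand-ins, not kazoo APIs, and the 10s value is only an example negotiated timeout.

```python
import time

def remaining_budget(deadline):
    """Seconds left before the deadline; fail fast if it has already passed."""
    left = deadline - time.monotonic()
    if left <= 0:
        raise TimeoutError("post-connect handshake exceeded its budget")
    return left

# Every step of the post-connect sequence (SASL, auth, re-registering watches,
# ...) draws from one shared budget derived from the negotiated timeout,
# instead of each step reusing connect_timeout independently.
negotiated_timeout_s = 10.0
deadline = time.monotonic() + negotiated_timeout_s

for step in range(3):  # stand-in for the individual handshake operations
    budget = remaining_budget(deadline)
    # e.g. sock.settimeout(budget); perform the next read/write with it
    print(f"step {step}: up to {budget:.2f}s left")
```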
I think that the "session timeout" is an indication of what you want (as you point out, before negotiation with the server's config) for how long your session should remain "alive" in case the client disconnects. In other words, how much time the client has to reconnect.
I have a more "radical" view than yours, I think, which is that it has nothing to do with how long it takes to establish a TCP connection between the client and the servers.
The increasing-backoff method is a good way to adapt to ZooKeeper servers that can be deployed on the local network (~10s of ms), across data centers (~100s of ms), or across continents. The default value is more fitting for the first two use cases and, yes, possibly less so for wider deployments, but for such setups it is easy to tweak that value.
That is my opinion for the TCP connection timeout. The request timeout, which applies once the TCP connection is established, is something else.
For request/reply messages on an established TCP connection, including Connect requests:
The original code used client._session_timeout / 1000.0 / len(client.hosts) for the Connect (pre-negotiation). After that, all other request/reply exchanges use these two values, connect_timeout for writes and read_timeout for reads:

    connect_timeout = negotiated_session_timeout / len(client.hosts)
    read_timeout = negotiated_session_timeout * 2.0 / 3.0

My rationale for changing the timeout on the first Connect is that it is possible to have a healthy host (which succeeds at the TCP connect) but an unhealthy JVM/ZooKeeper that does not handle the Connect request. In such a case, the client would wait a long time before timing out and moving to the next server, consuming much of its total session timeout. Waiting arbitrarily long, when we already know the timeout of a successful TCP connect, is not ideal, especially when we have not yet selected a server and there could be others.
But I could be convinced that this should be part of a different change.
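A small, hedged sketch of the arithmetic described above; the function name and example values are illustrative, and only the two formulas come from the discussion.

```python
def post_negotiation_timeouts(negotiated_session_timeout_s, n_hosts):
    """Per-request timeouts once the session timeout has been negotiated."""
    connect_timeout = negotiated_session_timeout_s / n_hosts    # writes
    read_timeout = negotiated_session_timeout_s * 2.0 / 3.0     # reads
    return connect_timeout, read_timeout

# Example: a 10s negotiated session timeout spread over 3 hosts.
print(post_negotiation_timeouts(10.0, 3))  # -> (3.333..., 6.666...)
```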
@@ -614,7 +614,9 @@ def _connect_attempt(self, host, hostip, port, retry):
         try:
             self._xid = 0
-            read_timeout, connect_timeout = self._connect(host, hostip, port)
+            read_timeout, connect_timeout = self._connect(
+                host, hostip, port, timeout=retry.cur_delay
+            )
The logic looks good, but the default initial _cur_delay is 100ms. Do we really want to keep it that low?
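For context, here is a hedged sketch of how an exponential backoff schedule grows from that 100ms starting point; the doubling factor, the 60s cap, and the absence of jitter are simplifying assumptions for illustration, not necessarily kazoo's exact defaults.

```python
def backoff_schedule(initial=0.1, factor=2.0, max_delay=60.0, attempts=8):
    """Successive per-attempt delays for a simple exponential backoff."""
    delay, schedule = initial, []
    for _ in range(attempts):
        schedule.append(delay)
        delay = min(delay * factor, max_delay)
    return schedule

# -> [0.1, 0.2, 0.4, 0.8, 1.6, 3.2, 6.4, 12.8]: even starting at 100ms, the
# per-attempt timeout reaches several seconds within a handful of retries.
print(backoff_schedule())
```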
At a high level this makes complete sense and the implementation looks solid. The reality, of course, is that timeouts are always tricky, so we may need to tweak further once this is live in prod and folks start to see how it behaves.
Do not use the session timeout as the connection timeout. This value is too large and results in a bad, non-responsive server holding up the client long enough for the session to time out. Use the KazooRetry object to apply an increasing backoff timeout and cycle over all servers quickly, working around bad servers with minimal impact.
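For deployments where the more aggressive default is too tight, here is a hedged sketch of tuning the behaviour from the caller's side; the specific values are placeholders, and the KazooRetry/connection_retry parameters should be checked against the kazoo version in use.

```python
from kazoo.client import KazooClient
from kazoo.retry import KazooRetry

# Start the connection backoff at 1s and cap it at 30s for a higher-latency
# network (illustrative values; max_tries=-1 keeps retrying indefinitely).
connection_retry = KazooRetry(max_tries=-1, delay=1.0, max_delay=30.0)

zkc = KazooClient(
    hosts=["host1:2181", "host2:2181"],
    timeout=10.0,
    connection_retry=connection_retry,
)
zkc.start()
```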
Just like the others, I think the changes (and explanations) make a lot of sense. Although, just like @ztzg commented, the new default value might seem a little low compared to the previous one (100ms vs 10s). I like the changes making the client less lazy, but I am not sure that kind of default value would suit every current possible use of the client and, if possible, I would prefer we try not to change it.
Hi @StephenSorriaux, all, It would be great to have this fixed in master, as teams using unpatched versions of Kazoo regularly experience "outages."
Note that 10s is (supposed to be) the session lifetime, and that by using it as a connection timeout, the session is dead by the time any unresponsive TCP/IP operation aborts. Which is… suboptimal! I would not have dared 100ms, but agree with @ceache that the backoff should cause clients to quickly adapt to the actual network. In any case, I vote for it over repeated puzzled users :)
As Jeff wrote, "timeouts are always tricky"—but I don't understand why we would want to keep using this value, as it basically guarantees that the session is… dead… by the deadline. Am I missing something? It seems that the default value should ideally be below (lifetime / n_hosts), to give the client a chance to try each host once—and even lower if we want to allow multiple "rounds." For the record, the Java client initially sets the TCP/IP connect timeout to (requested_lifetime / n_hosts), and adjusts it to (negotiated_lifetime / n_hosts) once the negotiated lifetime is known—but it does not apply backoff.
This would alleviate the need for a patched Kazoo, but would IMHO still be a suboptimal default. Cheers, -D
Hi all. Gentle reminder :)
Why is this needed?
Do not use the session timeout as the connection timeout. This value is too large and results in a bad, non-responsive server holding up the client long enough for the session to time out.
Proposed Changes
Does this PR introduce any breaking change?
This change should be transparent at a high level.
It makes the client connection attempts more aggressive, on purpose.