You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Client should wait for maximum client._session_timeout / 1000 / len(client.hosts) seconds before moving to next zookeeper server.
Actual Behavior
We are using kazoo to connect each of our clients to one of the 3 zookeeper servers that form a cluster. We set timeout=120s when we instantiate kazoo.client.KazooClient class. Therefore, client._session_timeout = 120000.
Below is the code from kazoo that sets client._session_timeout:
self._session_timeout = int(timeout * 1000)
Our purpose is to have negotiated session timeout=120s and connection_timeout=120s/3=40s. In some places of kazoo code it is timeout=client._session_timeout / 1000.0, while in some other places it is timeout=client._session_timeout / 1000.0 / len(client.hosts).
In all our zookeeper servers, in /conf/zoo.cfg, we have set maxSessionTimeout=120s.
After negotiation, indeed, negotiated_session_timeout is 120s.
We add an iptables rule and we isolate zookeeper leader. The remaining 2 zookeeper servers close the sessions with all the clients and perform leader election. From what I have read, this is the behavior of zookeeper servers when the leader goes down.
kazoo tries to connect to next zookeeper server from the local server list (we shuffle zookeeper server list). Sometimes, kazoo will try to connect to the isolated zookeeper server.
When kazoo tries to connect to any zookeeper server, this code will be called:
with self._socket_error_handling():
self._socket = self.handler.create_connection(
address=(hostip, port),
timeout=client._session_timeout / 1000.0,
use_ssl=self.client.use_ssl,
keyfile=self.client.keyfile,
certfile=self.client.certfile,
ca=self.client.ca,
keyfile_password=self.client.keyfile_password,
verify_certs=self.client.verify_certs,
)
As we can see, timeout is 120s. kazoo waits for 120s and then moves to another zookeeper server. Our expectation is to wait for 40s.
I have added additional logs in protocol/connection.py, which start with 'anda:'
Here I have used dummy values as session ids/passwords.
Logs with logging in DEBUG mode
07/08/2024 03:12:51 AM Using session_id: XXXX session_passwd: YYYY 07/08/2024 03:12:51 AM anda: before create_connection
07/08/2024 03:14:51 AM Connection dropped: socket connection error: None
07/08/2024 03:14:51 AM anda: in _connect_loop after _connect_attempt host = hostip =
07/08/2024 03:14:51 AM anda: in _connect_loop before _connect_attempt host = hostip =
07/08/2024 03:14:51 AM Connecting to :port, use_ssl: False
07/08/2024 03:14:51 AM Using session_id: 0123 session_passwd: VVVV
07/08/2024 03:14:51 AM anda: before create_connection
07/08/2024 03:14:51 AM anda: after create_connection
07/08/2024 03:14:51 AM anda: after Connect
07/08/2024 03:14:51 AM Sending request(xid=None): Connect(protocol_version=0, last_zxid_seen=1234, time_out=120000, session_id=ZZZZ, passwd="TTTT", read_only=None)
07/08/2024 03:14:51 AM Read response Connect(protocol_version=0, last_zxid_seen=0, time_out=120000, session_id=ZZZZ, passwd="TTTT", read_only=False)
07/08/2024 03:14:51 AM anda: after _invoke
07/08/2024 03:14:51 AM Session created, session_id: ZZZZ session_passwd: AAAA
negotiated session timeout: 120000
connect timeout: 40000
read timeout: 80000.0
Logs when I divide timeout value to len(client.hosts). Below is the modified code:
with self._socket_error_handling():
self._socket = self.handler.create_connection(
address=(hostip, port), timeout=client._session_timeout / 1000.0 / len(client.hosts),
use_ssl=self.client.use_ssl,
keyfile=self.client.keyfile,
certfile=self.client.certfile,
ca=self.client.ca,
keyfile_password=self.client.keyfile_password,
verify_certs=self.client.verify_certs,
)
07/09/2024 04:10:00 AM Connection dropped: socket connection broken
07/09/2024 04:10:00 AM Transition to CONNECTING
07/09/2024 04:10:00 AM Zookeeper connection lost
07/09/2024 04:10:00 AM anda: in _connect_loop after _connect_attempt host = hostip =
07/09/2024 04:10:00 AM anda: in _connect_loop before _connect_attempt host = hostip =
07/09/2024 04:10:00 AM Connecting to :port, use_ssl: False
07/09/2024 04:10:00 AM Using session_id: 1111 session_passwd: 1234 07/09/2024 04:10:00 AM anda: before create_connection
07/09/2024 04:10:40 AM Connection dropped: socket connection error: None
07/09/2024 04:10:40 AM anda: in _connect_loop after _connect_attempt host = hostip =
07/09/2024 04:10:40 AM anda: in _connect_loop before _connect_attempt host = =
07/09/2024 04:10:40 AM Connecting to :port, use_ssl: False
07/09/2024 04:10:40 AM Using session_id: 1234 session_passwd: AAAA
07/09/2024 04:10:40 AM anda: before create_connection
07/09/2024 04:10:40 AM anda: after create_connection
07/09/2024 04:10:40 AM anda: before setblocking
07/09/2024 04:10:40 AM anda: after setblocking
07/09/2024 04:10:40 AM anda: after Connect
07/09/2024 04:10:40 AM Sending request(xid=None): Connect(protocol_version=0, last_zxid_seen=3333, time_out=120000, session_id=1234, passwd='AAAA', read_only=None)
07/09/2024 04:10:40 AM Read response Connect(protocol_version=0, last_zxid_seen=0, time_out=120000, session_id=1234, passwd='BBBB', read_only=False)
07/09/2024 04:10:40 AM anda: after _invoke
07/09/2024 04:10:40 AM Session created, session_id: 1234 session_passwd: GGGG
negotiated session timeout: 120000
connect timeout: 40000
read timeout: 80000.0
07/09/2024 04:10:40 AM Zookeeper connection established, state: CONNECTED
Specifications
Kazoo version: 2.7.0
Zookeeper version: 3.8.4
Zookeeper configuration: put here any useful ZK configuration (authentication, encryption, number of ZK members, number of (concurrent?) clients, Java version, krb5 version, etc.)
Python version: Python 2.7
OS: RHOSP 8.4
The text was updated successfully, but these errors were encountered:
Hi there, thanks for reporting this.
This is, in fact, a known issue and we are already working on a fix for it.
See #685 which dissociates network connection timeouts from the logical
session timeout.
Cheers,
Expected Behavior
Client should wait for maximum client._session_timeout / 1000 / len(client.hosts) seconds before moving to next zookeeper server.
Actual Behavior
We are using kazoo to connect each of our clients to one of the 3 zookeeper servers that form a cluster. We set timeout=120s when we instantiate kazoo.client.KazooClient class. Therefore, client._session_timeout = 120000.
Below is the code from kazoo that sets client._session_timeout:
self._session_timeout = int(timeout * 1000)
Our purpose is to have negotiated session timeout=120s and connection_timeout=120s/3=40s. In some places of kazoo code it is timeout=client._session_timeout / 1000.0, while in some other places it is timeout=client._session_timeout / 1000.0 / len(client.hosts).
In all our zookeeper servers, in /conf/zoo.cfg, we have set maxSessionTimeout=120s.
After negotiation, indeed, negotiated_session_timeout is 120s.
We add an iptables rule and we isolate zookeeper leader. The remaining 2 zookeeper servers close the sessions with all the clients and perform leader election. From what I have read, this is the behavior of zookeeper servers when the leader goes down.
kazoo tries to connect to next zookeeper server from the local server list (we shuffle zookeeper server list). Sometimes, kazoo will try to connect to the isolated zookeeper server.
When kazoo tries to connect to any zookeeper server, this code will be called:
with self._socket_error_handling():
self._socket = self.handler.create_connection(
address=(hostip, port),
timeout=client._session_timeout / 1000.0,
use_ssl=self.client.use_ssl,
keyfile=self.client.keyfile,
certfile=self.client.certfile,
ca=self.client.ca,
keyfile_password=self.client.keyfile_password,
verify_certs=self.client.verify_certs,
)
As we can see, timeout is 120s. kazoo waits for 120s and then moves to another zookeeper server. Our expectation is to wait for 40s.
I have added additional logs in protocol/connection.py, which start with 'anda:'
Here I have used dummy values as session ids/passwords.
Logs with logging in DEBUG mode
07/08/2024 03:12:51 AM Using session_id: XXXX session_passwd: YYYY
07/08/2024 03:12:51 AM anda: before create_connection
07/08/2024 03:14:51 AM Connection dropped: socket connection error: None
07/08/2024 03:14:51 AM anda: in _connect_loop after _connect_attempt host = hostip =
07/08/2024 03:14:51 AM anda: in _connect_loop before _connect_attempt host = hostip =
07/08/2024 03:14:51 AM Connecting to :port, use_ssl: False
07/08/2024 03:14:51 AM Using session_id: 0123 session_passwd: VVVV
07/08/2024 03:14:51 AM anda: before create_connection
07/08/2024 03:14:51 AM anda: after create_connection
07/08/2024 03:14:51 AM anda: after Connect
07/08/2024 03:14:51 AM Sending request(xid=None): Connect(protocol_version=0, last_zxid_seen=1234, time_out=120000, session_id=ZZZZ, passwd="TTTT", read_only=None)
07/08/2024 03:14:51 AM Read response Connect(protocol_version=0, last_zxid_seen=0, time_out=120000, session_id=ZZZZ, passwd="TTTT", read_only=False)
07/08/2024 03:14:51 AM anda: after _invoke
07/08/2024 03:14:51 AM Session created, session_id: ZZZZ session_passwd: AAAA
negotiated session timeout: 120000
connect timeout: 40000
read timeout: 80000.0
Logs when I divide timeout value to len(client.hosts). Below is the modified code:
with self._socket_error_handling():
self._socket = self.handler.create_connection(
address=(hostip, port),
timeout=client._session_timeout / 1000.0 / len(client.hosts),
use_ssl=self.client.use_ssl,
keyfile=self.client.keyfile,
certfile=self.client.certfile,
ca=self.client.ca,
keyfile_password=self.client.keyfile_password,
verify_certs=self.client.verify_certs,
)
07/09/2024 04:10:00 AM Connection dropped: socket connection broken
07/09/2024 04:10:00 AM Transition to CONNECTING
07/09/2024 04:10:00 AM Zookeeper connection lost
07/09/2024 04:10:00 AM anda: in _connect_loop after _connect_attempt host = hostip =
07/09/2024 04:10:00 AM anda: in _connect_loop before _connect_attempt host = hostip =
07/09/2024 04:10:00 AM Connecting to :port, use_ssl: False
07/09/2024 04:10:00 AM Using session_id: 1111 session_passwd: 1234
07/09/2024 04:10:00 AM anda: before create_connection
07/09/2024 04:10:40 AM Connection dropped: socket connection error: None
07/09/2024 04:10:40 AM anda: in _connect_loop after _connect_attempt host = hostip =
07/09/2024 04:10:40 AM anda: in _connect_loop before _connect_attempt host = =
07/09/2024 04:10:40 AM Connecting to :port, use_ssl: False
07/09/2024 04:10:40 AM Using session_id: 1234 session_passwd: AAAA
07/09/2024 04:10:40 AM anda: before create_connection
07/09/2024 04:10:40 AM anda: after create_connection
07/09/2024 04:10:40 AM anda: before setblocking
07/09/2024 04:10:40 AM anda: after setblocking
07/09/2024 04:10:40 AM anda: after Connect
07/09/2024 04:10:40 AM Sending request(xid=None): Connect(protocol_version=0, last_zxid_seen=3333, time_out=120000, session_id=1234, passwd='AAAA', read_only=None)
07/09/2024 04:10:40 AM Read response Connect(protocol_version=0, last_zxid_seen=0, time_out=120000, session_id=1234, passwd='BBBB', read_only=False)
07/09/2024 04:10:40 AM anda: after _invoke
07/09/2024 04:10:40 AM Session created, session_id: 1234 session_passwd: GGGG
negotiated session timeout: 120000
connect timeout: 40000
read timeout: 80000.0
07/09/2024 04:10:40 AM Zookeeper connection established, state: CONNECTED
Specifications
The text was updated successfully, but these errors were encountered: