Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Emit warning if ping socket select takes longer #632

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

jarus
Copy link

@jarus jarus commented Dec 8, 2020

Why is this needed?

We experienced Zookeeper session timeouts within our applications with Kazoo because of GIL contention blocking the Kazoo handler thread and its ping loop.

Proposed Changes

  • Emitting a warning log message to indicate a longer sleep time of the handler/thread.

Does this PR introduce any breaking change?

No

kazoo/protocol/connection.py Outdated Show resolved Hide resolved
kazoo/protocol/connection.py Show resolved Hide resolved
@jarus jarus marked this pull request as draft December 8, 2020 21:21
We experienced Zookeeper session timeouts within our applications
with Kazoo because of GIL contention blocking the Kazoo thread and
its ping loop. The thread will now emit a warning in such a case.
@jarus jarus marked this pull request as ready for review December 9, 2020 07:16
@StephenSorriaux
Copy link
Member

StephenSorriaux commented Dec 13, 2020

Thank you for this PR.

I understand the need. I am wondering if this warning should (can?) be integrated in the select() method instead so that those kind of contentions can also be detected when trying to _read() or _write(). Any thoughts?

@jeffwidman
Copy link
Member

For the PR itself, timing this and logging a warning is probably a good idea, although @StephenSorriaux has a good question.

Although I'm curious, how do you know it's the GIL blocking you? Is this under Py2 or Py3? (the GIL scheduler changed slightly to reduce the odds of this kind of thing in Py3).

I just wonder if you're instead hitting something like the problem described in #633...

Again though, probably still a good idea to calculate/log the warning, regardless of the root cause.

@ceache
Copy link
Contributor

ceache commented Dec 13, 2020 via email

@jeffwidman
Copy link
Member

Nudge @jarus

@jarus
Copy link
Author

jarus commented Jan 27, 2021

Sorry for my delayed reply.

Although I'm curious, how do you know it's the GIL blocking you? Is this under Py2 or Py3? (the GIL scheduler changed slightly to reduce the odds of this kind of thing in Py3).

I know that most people blame the GIL for their application failures. Once I assumed something similar, I actually started to investigate a bit in that area and developed some tooling for measuring the GIL contention. If you are interested, you could take a look at my recent EuroPython talk about this topic: https://www.youtube.com/watch?v=HtbLNgXmLrw

I just wonder if you're instead hitting something like the problem described in #633...
I am wondering as well if this is not also addressed by #633

Yes, the new implementation looks promising. Until now, I sadly hadn't the chance to test it in our application.

How should we proceed? If you think the warning is still helpful, I would rebase and adjust the change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants