Issue with new multiprocess supervisor with Django app in Google Cloud Run #2399
Unanswered
brianglass asked this question in Potential Issue
Replies: 1 comment
-
I'm fairly convinced this is caused by the way Cloud Run allocates CPU only during the request/response cycle. I've timed the ping call, which is set to time out at 5 seconds, and have seen it take as long as 27 seconds to return. For now I can work around this by monkey-patching the `Process` class and disabling the ping/pong. I think the best option for folks on Cloud Run and similar environments would be to add some configurability to Uvicorn's multiprocess supervisor: it would be helpful to be able to disable the ping/pong and/or set its timeout.
-
I am currently monitoring an issue with a Django app in Cloud Run. After upgrading from Uvicorn 0.29.0 to 0.30.3 (with no other significant changes), I started seeing much heavier CPU usage and poor response times. The logs suggested that worker processes were repeatedly dying and being restarted. I deployed a new version of the app with a monkey-patched version of `uvicorn.supervisors.multiprocess.Process` that adds some logging. What I found is that the `self.process.is_alive()` test passes, but the subsequent ping fails. I added further logging to indicate when the `always_pong` thread gets started. It starts the first time around, but on subsequent worker launches it does not.

I'm still not sure what is going on, but my theory is that at some point a worker fails to ping. This may be because Cloud Run allocates CPU only during the request/response cycle. Once a worker fails to ping, Uvicorn gets stuck in a kill/restart loop for that worker slot (I haven't confirmed that it is always the same slot).
If I downgrade Uvicorn to 0.29.0, the problem goes away.
I don't have an answer or solution at this point, but am interested in suggestions from others and also just wanted to make folks aware that there is a potential issue.
The application at issue is in a public GitHub repo. The server startup code and the monkey-patched `Process` class are here: https://github.com/brianglass/orthocal-python/blob/main/server.py