Issue with new multiprocess supervisor with Django app in Google Cloud Run #2399
Unanswered
brianglass asked this question in Potential Issue
Replies: 1 comment
-
I'm fairly convinced this is caused by the way Cloud Run allocates CPU only during the request/response cycle. I've timed the ping call, which is set to time out at 5 seconds, and have seen it take as long as 27 seconds to return. For now I can work around this by monkey-patching the `Process` class and disabling the ping/pong. I think the best option for folks on Cloud Run and similar environments would be to add some configurability to Uvicorn's multiprocess supervisor: it would be helpful to be able to disable the ping/pong and/or set its timeout.
-
I am currently monitoring an issue with a Django app in Cloud Run. After upgrading from Uvicorn 0.29.0 to 0.30.3 (with no other significant changes), I started seeing much heavier CPU usage and poor response times. The logs suggested that worker processes were repeatedly dying and being restarted. I deployed a new version of the app with a monkey-patched version of `uvicorn.supervisors.multiprocess.Process` that adds some logging. What I found is that the `self.process.is_alive()` test passes, but the subsequent ping fails. I added further logging to indicate when the `always_pong` thread gets started. It starts the first time around, but on subsequent worker launches it does not.

I'm still not sure what is going on, but my theory is that at some point a worker fails to ping. This may be because Cloud Run allocates CPU only during the request/response cycle. Once a worker fails to ping, Uvicorn gets stuck in a kill/restart loop for that worker slot (I haven't confirmed that it is always the same slot).
If I downgrade Uvicorn to 0.29.0, the problem goes away.
I don't have an answer or solution at this point, but am interested in suggestions from others and also just wanted to make folks aware that there is a potential issue.
The application at issue is in a public GitHub repo. The server startup code and the monkey-patched `Process` class are here: https://github.com/brianglass/orthocal-python/blob/main/server.py