gRPC async server loses track of _futures #1652

Open
peter-resnick opened this issue Mar 29, 2024 · 0 comments

peter-resnick commented Mar 29, 2024

Hi MLServer -

To start off, this is an awesome tool, and the team has done impressive work to get to this point.

I'm currently using MLServer in a high-throughput, low-latency system where we use gRPC to perform inferences. We have added an asynchronous capability to our inference client, which sends many requests to the gRPC server at once (typically about 25). We have a timeout set on our client, and we first started seeing a number of DEADLINE_EXCEEDED responses. When I started looking into the model servers themselves to figure out why they had started to exceed deadlines (we hadn't experienced this very often in the past), it looked like the response processing loop is actually being restarted due to messages being lost.

We see the following traceback:

```
2024-03-28 19:56:42,015 [mlserver.parallel] ERROR - Response processing loop crashed. Restarting the loop...
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/mlserver/parallel/dispatcher.py", line 186, in _process_responses_cb
    process_responses.result()
  File "/usr/local/lib/python3.10/site-packages/mlserver/parallel/dispatcher.py", line 207, in _process_responses
    self._async_responses.resolve(response)
  File "/usr/local/lib/python3.10/site-packages/mlserver/parallel/dispatcher.py", line 102, in resolve
    future = self._futures[message_id]
KeyError: 'cea95af0-859f-413a-a033-dfbe51e96c05'
```

where the dispatcher is trying to resolve the future for a given message, but that message ID is no longer tracked.
Once this error occurs, all of the rest of our parallel inference requests fail with the same exception (with a different message_id, obviously).

I took a look at the source code, and it looks like when process_responses.result() is called, the logic has a blanket exception handler for anything that isn't an asyncio.CancelledError and assumes the processing loop has crashed, so it restarts it by scheduling a new task. It's not immediately clear (to me, at least) whether this is really what should be happening. I don't see any signals from the server that the processing loop actually crashed; it just seems to be confused about which message it's supposed to be getting.
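To make that concrete, here is a minimal, simplified sketch of the pattern I'm describing, reconstructed only from the traceback above. This is not MLServer's actual implementation; the queue, the message shape, and the helper names are placeholders. The point is the shape of the logic: a done-callback that treats any non-CancelledError as "the loop crashed" and reschedules it, while the KeyError itself comes from resolving a message_id that is no longer in _futures.

```python
# Simplified sketch of the pattern described above, reconstructed from the traceback.
# NOT MLServer's actual code; the response queue and message shape are placeholders.
import asyncio


class AsyncResponses:
    def __init__(self):
        self._futures = {}  # message_id -> asyncio.Future

    def resolve(self, response):
        message_id = response["id"]              # placeholder message shape
        future = self._futures.pop(message_id)   # KeyError if the id is no longer tracked
        future.set_result(response)


class Dispatcher:
    def __init__(self, response_queue: asyncio.Queue):
        self._async_responses = AsyncResponses()
        self._response_queue = response_queue

    def start(self):
        task = asyncio.create_task(self._process_responses())
        task.add_done_callback(self._process_responses_cb)

    def _process_responses_cb(self, process_responses: asyncio.Task):
        try:
            process_responses.result()
        except asyncio.CancelledError:
            pass  # shutting down, nothing to do
        except Exception:
            # Blanket handler: any other error (e.g. the KeyError above) is treated
            # as the whole loop having crashed, and a fresh loop task is scheduled.
            self.start()

    async def _process_responses(self):
        while True:
            response = await self._response_queue.get()  # placeholder source of responses
            self._async_responses.resolve(response)
```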

As a note about our system setup, we have these deployed in Kubernetes (as is our client app) as a Deployment with between 10 and 15 pods at any given time, with the environment variable MLSERVER_PARALLEL_WORKERS=16.

We are also using a grpc.aio.insecure_channel(server) pattern to manage the gRPC interactions on the client side.
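For reference, this is roughly what our client-side fan-out looks like. It's a minimal sketch rather than our production code: the host, timeout, model name, and tensor shape are placeholder values, and it assumes the V2 dataplane stubs shipped under mlserver.grpc.

```python
# Minimal sketch of the client pattern: ~25 concurrent ModelInfer calls over one
# grpc.aio channel, each with a client-side deadline. Placeholder values throughout.
import asyncio

import grpc
from mlserver.grpc import dataplane_pb2, dataplane_pb2_grpc

HOST = "mlserver:8081"  # placeholder address
TIMEOUT = 5.0           # placeholder client-side deadline, in seconds


def build_request(i: int) -> dataplane_pb2.ModelInferRequest:
    # Hypothetical single-tensor request; adjust names, dtypes and shapes to your model.
    return dataplane_pb2.ModelInferRequest(
        model_name="my-model",
        inputs=[
            dataplane_pb2.ModelInferRequest.InferInputTensor(
                name="input-0",
                datatype="FP32",
                shape=[1, 3],
                contents=dataplane_pb2.InferTensorContents(fp32_contents=[float(i)] * 3),
            )
        ],
    )


async def infer_once(stub, request):
    # DEADLINE_EXCEEDED surfaces here as a grpc.aio.AioRpcError once the server stalls.
    return await stub.ModelInfer(request, timeout=TIMEOUT)


async def main():
    async with grpc.aio.insecure_channel(HOST) as channel:
        stub = dataplane_pb2_grpc.GRPCInferenceServiceStub(channel)
        requests = [build_request(i) for i in range(25)]
        return await asyncio.gather(
            *(infer_once(stub, r) for r in requests), return_exceptions=True
        )


if __name__ == "__main__":
    results = asyncio.run(main())
```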
