
[Bug]: Cannot get past 50 RPS #6592

Open
vutrung96 opened this issue Nov 5, 2024 · 5 comments
Labels
bug Something isn't working

Comments

@vutrung96

vutrung96 commented Nov 5, 2024

What happened?

I have OpenAI tier 5 usage, which should give me 30,000 RPM = 500 RPS with "gpt-4o-mini". However, I struggle to get past 50 RPS.

The minimal reproduction:

import asyncio
from litellm import acompletion

async def main():
    # create 2000 concurrent completion coroutines and await them together
    tasks = [acompletion(
        model="gpt-4o-mini",
        messages=[
          {"role": "system", "content": "You're an agent who answers yes or no"},
          {"role": "user", "content": "Is the sky blue?"},
        ],
    ) for _ in range(2000)]
    await asyncio.gather(*tasks)

asyncio.run(main())

I only get ~50 items/second, as opposed to ~500 items/second when sending raw HTTP requests.
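For comparison, here is a minimal stdlib-only throughput harness of the kind used for numbers like the above. It is a sketch, not the reporter's actual benchmark: `fake_completion` is a hypothetical stand-in for `acompletion` that just sleeps, so the measured rate reflects only event-loop overhead.

```python
import asyncio
import time

async def fake_completion(i):
    # hypothetical stand-in for acompletion; simulates ~10 ms of latency
    await asyncio.sleep(0.01)
    return i

async def main(n=2000):
    start = time.perf_counter()
    # launch all n coroutines concurrently and wait for all of them
    results = await asyncio.gather(*(fake_completion(i) for i in range(n)))
    elapsed = time.perf_counter() - start
    return len(results), elapsed

if __name__ == "__main__":
    count, elapsed = asyncio.run(main())
    print(f"{count} requests in {elapsed:.2f}s -> {count / elapsed:.0f} it/s")
```

Swapping `fake_completion` for a real client call turns this into a rough apples-to-apples RPS comparison between clients.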

Relevant log output

 16%|█████████████████████▌                                                                                                                 | 320/2000 [00:09<00:40, 41.49it/s]


@ishaan-jaff
Contributor

hi @vutrung96, looking into this. How do you get the % complete log output?

@vutrung96
Author

Hi @ishaan-jaff, I was just using tqdm.
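For anyone reproducing the progress output: tqdm can wrap `asyncio.as_completed` (or use `tqdm.asyncio.tqdm_asyncio.gather`) to get the `320/2000 [...it/s]` bar. A dependency-free sketch of the same pattern, with a hypothetical `fake_completion` in place of the real call:

```python
import asyncio

async def fake_completion(i):
    # hypothetical stand-in for a real API call
    await asyncio.sleep(0.001)
    return i

async def main(n=100):
    tasks = [asyncio.ensure_future(fake_completion(i)) for i in range(n)]
    done = 0
    # consume tasks in completion order; with tqdm you would
    # wrap this iterator (or update a bar) instead of counting by hand
    for fut in asyncio.as_completed(tasks):
        await fut
        done += 1
    return done

print(asyncio.run(main()))  # → 100
```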

@CharlieJCJ

Hi @ishaan-jaff, any updates on this? I'm also facing this issue!

@ishaan-jaff
Contributor

hi @vutrung96 @CharlieJCJ, do you see the issue with litellm.router too? https://docs.litellm.ai/docs/routing

It would help me if you could test with the litellm router as well.

@RyanMarten

Hi @ishaan-jaff
We tracked down the root cause of the issue.

litellm uses the official OpenAI Python client:

client: Optional[Union[OpenAI, AsyncOpenAI]] = None,

The official OpenAI client has performance issues with high numbers of concurrent requests, due to issues in httpx.

The httpx issues stem from a number of factors related to anyio vs asyncio, which are addressed in the open PRs below.

We saw this when implementing litellm as the backend for our synthetic data engine.

When using our own OpenAI client (with aiohttp instead of httpx), we saturate the highest rate limits (30,000 requests per minute on gpt-4o-mini tier 5). When using litellm, the performance issues cap us well under the highest rate limit, at ~200 queries per second (12,000 requests per minute).
