Here you can see that the litellm backend is automatically setting the token rate limit to 480,000, which is the number that is stored in x-ratelimit-limit-tokens in the headers.
However, the correct number is actually 80,000, which is stored in llm_provider-anthropic-ratelimit-output-tokens-limit.
Note that llm_provider-anthropic-ratelimit-output-tokens-remaining is provided as well.
Since we use 480,000 instead of 80,000, we quickly run into rate limit errors:
(_Completions pid=408316, ip=10.120.7.8) 2024-12-10 09:59:08,577 - bespokelabs.curator.request_processor.base_online_request_processor - WARNING - Request 20 failed with Exception litellm.RateLimitError: AnthropicException - {"type":"error","error":{"type":"rate_limit_error","message":"This request would exceed your organization's rate limit of 80,000 output tokens per minute. For details, refer to: https://docs.anthropic.com/en/api/rate-limits; see the response headers for current usage. Please reduce the prompt length or the maximum tokens requested, or try again later. You may also contact sales at https://www.anthropic.com/contact-sales to discuss your options for a rate limit increase."}}, attempts left 5
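A minimal sketch of the fix being suggested, assuming the header names shown above; the function name and fallback order here are hypothetical and not the actual curator/litellm implementation:

```python
def get_output_token_rate_limit(headers: dict) -> int | None:
    """Return the per-minute output-token limit from litellm response headers.

    Anthropic (surfaced by litellm under an llm_provider- prefix) reports
    separate input/output token limits, while the generic
    x-ratelimit-limit-tokens header holds the larger combined figure
    (480,000 here) rather than the 80,000 output-token limit that actually
    triggers the rate_limit_error.
    """
    for key in (
        # provider-specific header: the correct output-tokens-per-minute limit
        "llm_provider-anthropic-ratelimit-output-tokens-limit",
        # generic header: fall back to this only if no provider-specific value exists
        "x-ratelimit-limit-tokens",
    ):
        value = headers.get(key)
        if value is not None:
            return int(value)
    return None


# Example with the headers described in this issue:
headers = {
    "x-ratelimit-limit-tokens": "480000",
    "llm_provider-anthropic-ratelimit-output-tokens-limit": "80000",
    "llm_provider-anthropic-ratelimit-output-tokens-remaining": "79000",
}
assert get_output_token_rate_limit(headers) == 80000
```

With this ordering the detected limit would be 80,000 rather than 480,000, matching the limit cited in the error message above.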
RyanMarten changed the title from "Automatic Rate Limit Detection For Anthropic Bug" to "Automatic Rate Limit Detection For Anthropic doesn't distinguish between input and output tokens" on Dec 10, 2024.