
[Bug]: Rate Limit Error #6890

Open
1 task done
Marius-Juston opened this issue Feb 22, 2025 · 5 comments
Labels
bug Something isn't working

Comments

@Marius-Juston

Is there an existing issue for the same bug?

  • I have checked the existing issues.

Describe the bug and reproduction steps

Currently on the Tier 2 Anthropic API tier, which has an 80,000 input-tokens-per-minute limit, so after a while I get the RateLimitError. The problem is that after it returns the rate limit error 3 or 4 times, it raises another Python error that is not exception-handled, which leaves the system permanently locked in an "Agent is Rate Limited" state; the only solution is to restart the OpenHands instance.

So I think a couple of things would be helpful.

  1. The ability to set a custom rate limit on the UI side for rate-limited APIs, with the ability to set a refresh time (see the sketch after this list)
  2. The ability to truncate the prompt input, which helps against both the rate limits and the input token size
  3. Fix the infinite "Agent is Rate Limited" state
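
For suggestion 1, a minimal sketch of what a client-side input-token limiter could look like, using a rolling 60-second window (the class, the window, and the wiring into OpenHands are all assumptions for illustration, not existing code):

```python
import threading
import time


class InputTokenBucket:
    """Blocks until a request's input tokens fit under a rolling
    per-minute budget (e.g. 80,000 for Anthropic Tier 2)."""

    def __init__(self, tokens_per_minute: int = 80_000):
        self.capacity = tokens_per_minute
        self.events: list[tuple[float, int]] = []  # (timestamp, tokens)
        self.lock = threading.Lock()

    def acquire(self, tokens: int) -> None:
        while True:
            with self.lock:
                now = time.monotonic()
                # Drop usage records older than the 60-second window.
                self.events = [(t, n) for t, n in self.events if now - t < 60]
                if sum(n for _, n in self.events) + tokens <= self.capacity:
                    self.events.append((now, tokens))
                    return
            time.sleep(1)  # budget exhausted; wait for the window to roll over
```

Calling something like `bucket.acquire(estimated_prompt_tokens)` before each `llm.completion(...)` would make the client wait out the window instead of tripping the provider's 429.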

OpenHands Installation

Docker command in README

OpenHands Version

0.25

Operating System

Linux

Logs, Errors, Screenshots, and Additional Context

litellm.llms.anthropic.common_utils.AnthropicError: {"type":"error","error":{"type":"rate_limit_error","message":"This request would exceed your organization’s rate limit of 80,000 input tokens per minute. For details, refer to: https://docs.anthropic.com/en/api/rate-limits; see the response headers for current usage. Please reduce the prompt length or the maximum tokens requested, or try again later. You may also contact sales at https://www.anthropic.com/contact-sales to discuss your options for a rate limit increase."}}

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/app/openhands/controller/agent_controller.py", line 238, in _step_with_exception_handling
    await self._step()
  File "/app/openhands/controller/agent_controller.py", line 674, in _step
    action = self.agent.step(self.state)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/openhands/agenthub/codeact_agent/codeact_agent.py", line 130, in step
    response = self.llm.completion(**params)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.12/site-packages/tenacity/__init__.py", line 336, in wrapped_f
    return copy(f, *args, **kw)
           ^^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.12/site-packages/tenacity/__init__.py", line 475, in __call__
    do = self.iter(retry_state=retry_state)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.12/site-packages/tenacity/__init__.py", line 376, in iter
    result = action(retry_state)
             ^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.12/site-packages/tenacity/__init__.py", line 418, in exc_check
    raise retry_exc.reraise()
          ^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.12/site-packages/tenacity/__init__.py", line 185, in reraise
    raise self.last_attempt.result()
          ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/app/.venv/lib/python3.12/site-packages/tenacity/__init__.py", line 478, in __call__
    result = fn(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^
  File "/app/openhands/llm/llm.py", line 235, in wrapper
    resp: ModelResponse = self._completion_unwrapped(*args, **kwargs)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.12/site-packages/litellm/utils.py", line 1190, in wrapper
    raise e
  File "/app/.venv/lib/python3.12/site-packages/litellm/utils.py", line 1068, in wrapper
    result = original_function(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.12/site-packages/litellm/main.py", line 3085, in completion
    raise exception_type(
          ^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.12/site-packages/litellm/litellm_core_utils/exception_mapping_utils.py", line 2202, in exception_type
    raise e
  File "/app/.venv/lib/python3.12/site-packages/litellm/litellm_core_utils/exception_mapping_utils.py", line 548, in exception_type
    raise RateLimitError(
litellm.exceptions.RateLimitError: litellm.RateLimitError: AnthropicException - {"type":"error","error":{"type":"rate_limit_error","message":"This request would exceed your organization’s rate limit of 80,000 input tokens per minute. For details, refer to: https://docs.anthropic.com/en/api/rate-limits; see the response headers for current usage. Please reduce the prompt length or the maximum tokens requested, or try again later. You may also contact sales at https://www.anthropic.com/contact-sales to discuss your options for a rate limit increase."}}
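
The traceback above shows tenacity exhausting its retries and then re-raising the last RateLimitError via `reraise()`; whatever catches that final exception apparently never resets the agent state. A minimal sketch of the same retry pattern (the retry parameters and the stand-in exception are illustrative, not the values OpenHands actually uses):

```python
from tenacity import retry, stop_after_attempt, wait_exponential


class RateLimitError(Exception):
    """Stand-in for litellm.exceptions.RateLimitError."""


@retry(
    reraise=True,                        # re-raise the last exception, as in the traceback
    stop=stop_after_attempt(4),          # illustrative attempt count
    wait=wait_exponential(min=5, max=90),
)
def completion() -> None:
    raise RateLimitError("simulated 429 from the provider")


try:
    completion()
except RateLimitError:
    # Whatever catches this must reset the agent state; if nothing does,
    # the UI stays stuck in "Agent is Rate Limited".
    print("retries exhausted, surfacing the error")
```
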
@mamoodi
Collaborator

mamoodi commented Feb 24, 2025

@enyst just for my own understanding, I know Tier 1 always gets rate limited, but is Tier 2 not enough anymore? Or does it depend?

@erictbenson10

+1 for bringing the rate limit setting to the UI

As a workaround, you can try a higher retry window. It won't solve the infinite hang, but it can reduce the likelihood of getting there.

docker run -it --rm --pull=always \
  -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.25-nikolaik \
  -e LOG_ALL_EVENTS=true \
  -e LLM_NUM_RETRIES=6 \
  -e LLM_RETRY_MIN_WAIT=5 \
  -e LLM_RETRY_MAX_WAIT=90 \
  -e LLM_RETRY_MULTIPLIER=2 \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v ~/.openhands-state:/.openhands-state \
  -p 3000:3000 \
  --add-host host.docker.internal:host-gateway \
  --name openhands-app \
  docker.all-hands.dev/all-hands-ai/openhands:0.25
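
For a rough feel of what those settings imply, here is a back-of-the-envelope sketch of the resulting wait schedule, assuming tenacity-style exponential backoff clamped to the min/max waits (the exact formula OpenHands uses may differ):

```python
# Approximate wait schedule for LLM_NUM_RETRIES=6, LLM_RETRY_MIN_WAIT=5,
# LLM_RETRY_MAX_WAIT=90, LLM_RETRY_MULTIPLIER=2, assuming tenacity-style
# exponential backoff; the exact formula OpenHands uses may differ.
num_retries, min_wait, max_wait, multiplier = 6, 5, 90, 2

for attempt in range(1, num_retries + 1):
    wait = min(max(multiplier * 2 ** (attempt - 1), min_wait), max_wait)
    print(f"retry {attempt}: wait ~{wait}s")
# -> ~5, 5, 8, 16, 32, 64 seconds: roughly two minutes of retrying before
#    the RateLimitError is re-raised for good.
```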

@enyst
Collaborator

enyst commented Feb 24, 2025

Thank you for the report. Yes, it depends, and it's always possible to bump into those limits. For example, I couldn't run an eval with multiple processes without a lot of pain, even on a Tier 3 account.

Thank you Eric for the command! That's exactly right, we can tweak those options. Docs for those options are here: https://docs.all-hands.dev/modules/usage/configuration-options#retrying

I am not sure what happens with the "infinite state" when RateLimitError is hit, though; it sounds like a bug. It should just display in the UI, in real time I think, that the agent is rate limited while the LLM continues to retry. Cc: @raymyers
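
One plausible shape of the fix, purely as a hypothetical sketch (the method names below are illustrative assumptions, not the actual agent_controller.py code): catch the exhausted RateLimitError at the step boundary and move to a recoverable error state instead of leaving the UI rate-limited forever.

```python
from litellm.exceptions import RateLimitError  # the exception from the traceback


async def _step_with_exception_handling(self):
    # Hypothetical sketch, not the real OpenHands implementation.
    try:
        await self._step()
    except RateLimitError as e:
        # Retries were already exhausted inside llm.completion(), so surface
        # the error and reset state; set_agent_state_to / report_error are
        # illustrative method names.
        await self.set_agent_state_to("error")
        self.report_error(f"Rate limit retries exhausted: {e}")
```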

@Jaspann

Jaspann commented Feb 25, 2025

I am getting this as well with Tier 1, but I found a workaround. The new Claude 3.7 Sonnet does not seem to have an input-tokens-per-minute limit, so I have been using that instead without any problems so far.

EDIT: They added an input-tokens-per-minute limit, at 20,000 for Tier 1.

@manzke

manzke commented Feb 25, 2025

I'm even getting this with Tier 3 ;) - the bigger your codebase is, the more has to be sent, especially if you build features that cut across your application.
It feels amazing in the beginning, but wait till you start burning tokens. :)

https://github.com/manzke/rag-chat-interface, built by OpenHands
