
Handle ratelimit and retry #87

Open
ccorcos opened this issue Nov 6, 2024 · 2 comments

ccorcos commented Nov 6, 2024

I'm trying with a big pdf and getting a ratelimit error:

{
  data: {
    error: {
      message: 'Rate limit reached for gpt-4o-mini in organization org-zt8czUs8Thn5aBD4n2Jf1kve on tokens per min (TPM): Limit 200000, Used 198414, Requested 2829. Please try again in 372ms. Visit https://platform.openai.com/account/rate-limits to learn more.',
      type: 'tokens',
      param: null,
      code: 'rate_limit_exceeded'
    }
  },
  status: 429
}

It would be nice if (1) this library handled those rate limits and (2) it returned intermediate results so that I can save the results of already-processed pages and keep going when things crash or error out like this.
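In the meantime, request (1) can be approximated on the caller's side with a retry wrapper. A minimal sketch, where `RateLimitError` is a stand-in for whatever 429 exception the underlying client actually raises (not part of the zerox API):

```python
import asyncio, random

class RateLimitError(Exception):
    """Stand-in for the 429 error raised by the underlying model client."""

async def with_backoff(coro_fn, max_retries=5, base_delay=1.0):
    """Retry an async callable with exponential backoff and jitter."""
    for attempt in range(max_retries):
        try:
            return await coro_fn()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the error to the caller
            # back off 1x, 2x, 4x ... the base delay, plus a little jitter
            await asyncio.sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))
```

A real implementation would ideally honor the "Please try again in 372ms" hint in the error message instead of a fixed schedule.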

Thanks for the help!

@tylermaran
Contributor

Hey @ccorcos. Good suggestion.

I'd have to think through the ideal response object here. It would need to return some sort of error code as well to let you know that it was giving back a partial response.
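One possible shape for such a response object, sketched purely as a design idea; every field name here is invented, not part of the zerox API:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class PartialResult:
    """Hypothetical return type that can carry a partial page set."""
    pages: list = field(default_factory=list)  # pages completed before any failure
    error_code: Optional[str] = None           # e.g. 'rate_limit_exceeded'; None on success
    completed: bool = True                     # False when the run stopped early

    @property
    def next_page(self) -> int:
        """1-based index of the first page still left to process."""
        return len(self.pages) + 1
```

A caller could then persist `pages` and resume from `next_page` after a crash.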

@ccorcos ccorcos closed this as completed Nov 14, 2024
@ccorcos ccorcos reopened this Nov 14, 2024
@igorlima

You're facing a rate limit error with OpenAI because you exceeded the tokens-per-minute quota. This differs from the Gemini free tier, which is rate-limited per request. I've worked around this issue on the Gemini model by controlling the number of pages processed per minute.

Below is a brief overview of what I've done to manage the rate limit with Gemini. You can apply a similar approach by estimating the average number of tokens per page for the OpenAI model:

  • wait-for module snippet
    # https://docs.python.org/3/library/asyncio.html
    # https://docs.python.org/3/library/asyncio-task.html
    import asyncio
    
    async def wait_for_rate_limit(zerox_fn, number_of_pages, num_page_per_rate_limit=3, rate_limit_seconds=60):
    
      async def rate_limit():
        print('started counting rate limit usage ...')
        await asyncio.sleep(rate_limit_seconds)
        print('... rate limit refreshed!')
        return True
      
      def wait_for(num_pages, task_conversion, task_rate_limit):
        def wrapper(task):
          if task == task_conversion and num_pages >= number_of_pages:
            print(f'read all {num_pages} pages')
            if not task_rate_limit.done():
              # no pages left, so there is no need to keep waiting out the timer
              task_rate_limit.cancel('cancel rate limit task as all pages are read')
          elif task == task_conversion:
            print(f'read {num_pages} pages so far')
            print('waiting for the rate limit to refresh ...')
          return True
        return wrapper
      
      async def main():
        results = []
        # process pages in inclusive batches: 1..N, N+1..2N, and so on
        for p in range(1, number_of_pages + 1, num_page_per_rate_limit):
          num_pages = min(p + num_page_per_rate_limit - 1, number_of_pages)
          task_pdf_conversion = asyncio.create_task(zerox_fn(list(range(p, num_pages + 1))))
          task_rate_limit = asyncio.create_task(rate_limit())
          callback = wait_for(num_pages, task_pdf_conversion, task_rate_limit)
          task_pdf_conversion.add_done_callback(callback)
          task_rate_limit.add_done_callback(callback)
      
          # https://docs.python.org/3/library/asyncio-task.html#asyncio.gather
          result = await asyncio.gather(
            task_pdf_conversion,
            task_rate_limit,
            # https://docs.python.org/3/library/asyncio-task.html#running-tasks-concurrently
            return_exceptions=True
          )
          results.append(result)
          print(f'work done for the given select_pages: {[p, num_pages]}')
        return [result[0] for result in results]
       
      return await main()
  • snippet.py snippet
    # PDF EXTRACT via zerox
    from pyzerox import zerox
    import pathlib, os, asyncio
    
    # https://ai.google.dev/gemini-api/docs/models/gemini
    # model = "gemini/gemini-2.0-flash-exp"
    # model = "gemini/gemini-1.5-flash"
    model = "gemini/gemini-1.5-flash-8b"
    
    kwargs = {}
    custom_system_prompt = None
    async def zerox_fn(select_pages=None):
      file_path = "data/input.pdf"  ## local file path and file URL supported
      output_dir = None             ## set to "./data" to also write per-file markdown
      result = await zerox(file_path=file_path, model=model, output_dir=output_dir,
                           custom_system_prompt=custom_system_prompt, select_pages=select_pages, **kwargs)
      return result
    
    from wait_for import wait_for_rate_limit
    # https://docs.python.org/3/library/asyncio.html
    # https://docs.python.org/3/library/asyncio-task.html
    results = asyncio.run(wait_for_rate_limit(
      zerox_fn=zerox_fn,
      number_of_pages=12,
      num_page_per_rate_limit=6,
      rate_limit_seconds=60
    ))
    pages = [page for result in results for page in result.pages]
    md_text = "\n".join([page.content for page in pages])
    print(md_text)
    pathlib.Path("data/output-zerox-pdf.md").write_text(md_text)
    print("Markdown saved to output-zerox-pdf.md")
    
    # HOW TO RUN THIS SCRIPT:
    #   export GEMINI_API_KEY="XXXXXXXXXXXXXXXXXXXX"
    #   export LITELLM_LOG=CRITICAL
    #   export PYTHONWARNINGS=ignore
    #   python3 snippet.py
  • output: screenshot of the script's console log (image omitted)

This snippet.py is quite similar to what's described in the py-zerox README.

If you think this solution looks okay, we might consider creating a utility function within the library to help manage rate limit issues at the application level.
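For the OpenAI case, such a utility could derive the batch size from the TPM quota rather than hard-coding it. A rough sketch, where the token count per page is an estimate the caller supplies, not a measured value:

```python
def pages_per_minute(tpm_limit: int, avg_tokens_per_page: int, safety: float = 0.8) -> int:
    """How many pages fit in one minute under a tokens-per-minute (TPM) cap."""
    # keep a safety margin so a slightly heavier-than-average page
    # doesn't push a batch over the limit
    return max(1, int(tpm_limit * safety // avg_tokens_per_page))
```

With the 200000 TPM limit from the error above and, say, ~2000 tokens per page, this would suggest batches of about 80 pages per minute.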

I hope this helps!
