
Enhance README :: Add Cost-Effective PDF to Markdown Conversion Using AWS Bedrock Models #129

Open: igorlima wants to merge 1 commit into main
Conversation

@igorlima commented Dec 29, 2024

This PR introduces another model option for converting PDFs to Markdown using vision models via the zerox library.

  • Here's a code snippet for you to give it a go:
      # PDF EXTRACT via zerox
      from pyzerox import zerox
      import pathlib, os, asyncio
      
      import litellm
      # https://github.com/BerriAI/litellm/blob/11932d0576a073d83f38a418cbdf6b2d8d4ff46f/litellm/litellm_core_utils/get_llm_provider_logic.py#L322
      litellm.suppress_debug_info = True
      # https://docs.litellm.ai/docs/debugging/local_debugging#set-verbose
      litellm.set_verbose = False
      # https://docs.python.org/3/library/logging.html#logging-levels
      # https://github.com/BerriAI/litellm/blob/ea8f0913c2aac4a7b4ec53585cfe9ea04a3282de/litellm/_logging.py#L11
      os.environ['LITELLM_LOG'] = 'CRITICAL'
      
      kwargs = {}
      custom_system_prompt = None
      
      # AWS Bedrock requires the library `boto3`
      # !pip3 install boto3
      model = "bedrock/amazon.nova-lite-v1:0"
      
      # Define main async entrypoint
      async def main():
        file_path = "data/input.pdf" ## local filepath and file URL supported
        ## process only some pages or all
        select_pages = None ## None for all, but could be int or list(int) page numbers (1 indexed)
        output_dir = "./data" ## directory to save the consolidated markdown file
        output_dir = None
        result = await zerox(file_path=file_path, model=model, output_dir=output_dir,
                            custom_system_prompt=custom_system_prompt, select_pages=select_pages, **kwargs)
        return result
      
      # run the main function:
      result = asyncio.run(main())
      md_text = "\n".join([page.content for page in result.pages])
      
      # print markdown md_text
      print(md_text)
      # save markdown to file
      pathlib.Path("data/output-zerox-pdf.md").write_text(md_text)
      print("Markdown saved to output-zerox-pdf.md")
      
      """
      HOW TO RUN THIS SCRIPT:
      AWS Bedrock requires the library `boto3`: `pip3 install boto3`
      
      ({
      export AWS_ACCESS_KEY_ID=XXXXXXXXXXXXXXXXXXXX
      export AWS_SECRET_ACCESS_KEY=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
      export AWS_REGION_NAME=us-east-1
      export LITELLM_LOG=CRITICAL
      python3 snippet.py
      })
      """
      # HOW TO RUN THIS SCRIPT:
      # AWS Bedrock requires the library `boto3`: `pip3 install boto3`
      export AWS_ACCESS_KEY_ID=XXXXXXXXXXXXXXXXXXXX
      export AWS_SECRET_ACCESS_KEY=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
      export AWS_REGION_NAME=us-east-1
      export LITELLM_LOG=CRITICAL
      python3 snippet.py

What makes this so cool? The zerox library leverages the litellm library behind the scenes, which opens up a much wider range of providers; in particular, litellm lets us tap into vision models on the AWS Bedrock platform, among a few other options.
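For a sense of what that looks like at the litellm level, here is a minimal sketch of a single vision call routed to Bedrock. The page image path, prompt text, and message shape are illustrative, not zerox's actual internals:

    import base64
    import litellm

    # read one pre-rendered PDF page as a PNG (path is a placeholder)
    with open("data/page-1.png", "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()

    response = litellm.completion(
        model="bedrock/amazon.nova-lite-v1:0",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Convert this page to markdown."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }],
    )
    print(response.choices[0].message.content)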

I opened this PR for two main reasons:

  1. To offer a cost-effective solution.
  2. To provide a model with a higher RPM (requests per minute).

If you're working with a PDF of more than 10 pages, you might hit Google's rate limit for the Gemini models under the free tier.
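If you do run into a limit, the `select_pages` parameter shown above makes it easy to convert in smaller batches with a pause in between. A rough sketch, assuming `pypdf` (not a zerox dependency) just to count pages; the batch size and pause are illustrative:

    import asyncio
    from pypdf import PdfReader  # assumption: pypdf used only to count pages
    from pyzerox import zerox

    async def convert_in_batches(file_path, model, batch_size=10, pause_s=60):
        num_pages = len(PdfReader(file_path).pages)
        pages = []
        # select_pages is 1-indexed, per the snippet above
        for start in range(1, num_pages + 1, batch_size):
            batch = list(range(start, min(start + batch_size, num_pages + 1)))
            result = await zerox(file_path=file_path, model=model, select_pages=batch)
            pages.extend(result.pages)
            if start + batch_size <= num_pages:
                await asyncio.sleep(pause_s)  # back off to stay under the RPM limit
        return "\n".join(page.content for page in pages)

    md_text = asyncio.run(convert_in_batches("data/input.pdf", "bedrock/amazon.nova-lite-v1:0"))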

Here's a quick comparison of some models:

| Model Name | Input Cost / Token (USD) | Output Cost / Token (USD) |
| --- | --- | --- |
| "amazon.nova-lite-v1:0" | 0.000000060 | 0.00000024 |
| "gemini/gemini-1.5-flash" | 0.000000075 | 0.00000030 |
| "gemini/gemini-1.5-flash-latest" | 0.000000075 | 0.00000030 |
| "gpt-4o-mini" | 0.000000150 | 0.00000060 |
| "azure/gpt-4o-mini" | 0.000000165 | 0.00000066 |
| "groq/llama-3.2-11b-vision-preview" | 0.000000180 | 0.00000018 |
| "anthropic.claude-3-haiku-20240307-v1:0" | 0.000000250 | 0.00000125 |
| "amazon.nova-pro-v1:0" | 0.000000800 | 0.00000320 |
| "claude-3-opus-20240229" | 0.000015000 | 0.00007500 |

The Amazon model is very competitively priced, which makes it a natural fit for high-volume, pay-per-use setups such as AWS Lambda functions.
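To put those rates in perspective, here's a back-of-the-envelope estimate; the per-page token counts are assumptions for illustration, not measurements:

    # Rough per-page cost, assuming ~1,500 input tokens (page image + prompt)
    # and ~500 output tokens of markdown per page; both counts are illustrative.
    rates = {  # (input $/token, output $/token), taken from the table above
        "amazon.nova-lite-v1:0": (0.000000060, 0.00000024),
        "gpt-4o-mini": (0.000000150, 0.00000060),
    }
    for model, (inp, out) in rates.items():
        per_page = 1500 * inp + 500 * out
        print(f"{model}: ~${per_page:.6f}/page, ~${per_page * 1000:.2f} per 1,000 pages")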

By adding more options right in the README, we're making it easier for newcomers to find what they need.

Thanks!
