Using Llama3.2-vision #106

Open
mark-314e opened this issue Nov 22, 2024 · 9 comments

mark-314e commented Nov 22, 2024

Using #64 as a reference, I was able to run Llama3.2-vision, but the output seems completely unrelated to the document. I have not modified the prompt in any way, and I am passing a multi-page PDF.

Code:

from pyzerox import zerox
import os
import asyncio

os.environ['OLLAMA_API_BASE'] = "http://localhost:11434"
model = "ollama/llama3.2-vision"
pdf_path = "/home/ubuntu/lb38e-5507-9350-c49e6137f377_D0-.2223530.pdf"
output_path = "/home/ubuntu/omniocr/result"
kwargs = {}

custom_system_prompt = None

async def main():
    select_pages = None ## None for all, but could be int or list(int) page numbers (1 indexed)

    result = await zerox(file_path=pdf_path, model=model, output_dir=output_path,
                        custom_system_prompt=custom_system_prompt,select_pages=select_pages,validate_vision_capability=False, **kwargs)
    return result


# run the main function:
result = asyncio.run(main())
print(result)

Output:
ZeroxOutput(completion_time=17628.399, file_name='ec1e22d8_b38e_5507_9350_c49e6137f377_d0__2223530', input_tokens=10240, output_tokens=707, pages=[Page(content='It appears that you\'ve provided a base64-encoded image. I\'ll decode it for you.\n\nHere is the decoded image:\n\n![image](https://i.imgur.com/9Jv6uZ3.png)\n\nPlease note that this is a PNG image, and you can save it to your computer by right-clicking on the link and selecting "Save Image As".', content_length=287, page=1), Page(content="It looks like you've shared an image encoded in Base64 format!\n\nIf I decode and render the image, it appears to be a PNG file containing a simple diagram or graphic. However, without further context or information about what this image represents, I won't be able to provide any additional insights or analysis.\n\nWould you like me to:\n\n1. Provide the decoded image data (as text)?\n2. Try to identify the type of diagram or graphic it represents?\n3. Help with something else related to the image?\n\nLet me know how I can assist!", content_length=526, page=2), Page(content='It looks like you\'ve provided a base64-encoded string, which appears to be an image file. The string is quite long and complex, so I\'ll provide some guidance on how to decode and display it.\n\n**Decoding the string**\n\nTo decode the string, you can use a base64 decoder tool or library in your programming language of choice. Here\'s an example using Python:\npython\nimport base64\n\nencoded_string = """your_encoded_string_here"""\n\ndecoded_bytes = base64.b64decode(encoded_string)\n\nReplace `your_encoded_string_here` with the actual string you provided.\n\n**Displaying the image**\n\nAfter decoding, you should have a bytes object containing the image data. You can then use a library like Pillow (Python Imaging Library) to display the image:\npython\nfrom PIL import Image\nimport io\n\ndecoded_bytes = base64.b64decode(encoded_string)\nimage_data = io.BytesIO(decoded_bytes)\n\nimg = Image.open(image_data)\nimg.show()\n\nThis will display the image in your default image viewer.\n\n**Note**: The encoded string appears to be a PNG image, but you may need to adjust the decoding and displaying code depending on the actual file format.', content_length=1129, page=3), Page(content='It looks like you\'ve provided a PNG image encoded as a Base64 string. Here\'s how to extract and display the image:\n\n**Python:**\n\npython\nimport base64\n\n# Load the Base64 string from your input\nimage_data = 'your_base64_string_here'\n\n# Decode the Base64 string\ndecoded_image_data = base64.b64decode(image_data)\n\n# Save the decoded data as a PNG file (replace "output.png" with your desired filename)\nwith open('output.png', 'wb') as f:\n f.write(decoded_image_data)\n\n\n**Online Tool:**\n\nYou can also use an online Base64 decoder tool to extract and view the image. Simply paste the Base64 string into the tool, select PNG as the output format, and click "Decode". The decoded image will be displayed.\n\nOnce you\'ve extracted the image, you should see a PNG file with the original data.', content_length=789, page=4), Page(content="It looks like you've provided a PNG image encoded in Base64 format. Here's the decoded version:\n\nUnfortunately, I don't see any text or meaningful information within this image. 
It appears to be an abstract art piece with various shapes and colors.\n\nIf you're trying to extract some specific data from this image, please provide more context about what you're looking for (e.g., a logo, a message, or something else).", content_length=417, page=5)])

@tylermaran
Contributor

Interesting. It seems like it's not encoded correctly for llama 3.2.

I'm planning to add the llama 3.2 option to the node package as well, so I'll take a look at both implementations and see what the issue is. It'll be a bit different in the python version since that's going through litellm.
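In the meantime, a quick way to check whether the image encoding is the culprit is to bypass zerox and send one page image straight through litellm. This is only a sanity-check sketch, not zerox's actual code path: the file name and prompt are placeholders, and it assumes litellm's OpenAI-style image_url message format for vision models. If this call also replies about "a base64-encoded image", the problem is in how the image reaches llama3.2-vision rather than in zerox's prompting.

# Sanity-check sketch: one rendered page sent directly to the Ollama vision
# model via litellm. "page_1.png" is a placeholder for a single page image.
import base64
import litellm

with open("page_1.png", "rb") as f:
    b64 = base64.b64encode(f.read()).decode()

response = litellm.completion(
    model="ollama/llama3.2-vision",
    api_base="http://localhost:11434",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Transcribe this page to markdown."},
            # The image goes in an image_url part as a data URI, not as raw
            # base64 text pasted into the prompt.
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)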

@lilei2894

I'm also trying this and encountered a similar problem. My error output was:

zeroxOutput(completion_time=295755.386, file_name='cs101', input_tokens=153445, output_tokens=206, pages=[Page(content="It looks like you're trying to paste a Base64-encoded image. However, the code snippet you provided is incomplete and contains some errors.\n\nHere's a cleaned-up version of your code:\n```python\nimport base64\nfrom PIL import Image\nfrom io import BytesIO\n\n# Your encoded image data\nencoded_image = b'... (rest of the encoded image data here)'\n\n# Decode the Base64-encoded image data\ndecoded_image_data = base64.b64decode(encoded_image)\n\n# Create a Pillow Image object from the decoded image data\nimage = Image.open(BytesIO(decoded_image_data))\n\n# Display the image (optional)\nimage.show()\n```\nIf you're trying to display the image, make sure you have the Pillow library installed (pip install pillow) and that the image is in a format that can be displayed by your system.\n\nPlease note that the encoded image data you provided seems to be incomplete. If you provide the complete encoded image data, I'd be happy to help further!", content_length=929, page=1)])

@hule369

hule369 commented Dec 2, 2024

I definitely would like to see llama 3.2 support!

@cainmagi mentioned this issue Dec 18, 2024
@James4Ever0

James4Ever0 commented Dec 26, 2024

Currently I do not see a parameter called validate_vision_capability, so I do this instead:

import litellm

litellm.supports_vision = lambda *args, **kwargs: True
litellm.check_valid_key = lambda *args, **kwargs: True

Also, LiteLLM has a bug with vision Ollama models; it is fixed here: BerriAI/litellm#6683

This post is really helpful. Hope to see official support in Zerox.
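For reference, a minimal sketch of how these patches slot in before a zerox call (the PDF path and output directory are placeholders, not from my setup):

# Sketch: apply the litellm monkeypatches before zerox constructs the model,
# so validation of "ollama/llama3.2-vision" does not raise NotAVisionModel.
import asyncio
import os

import litellm
from pyzerox import zerox

os.environ["OLLAMA_API_BASE"] = "http://localhost:11434"

# Workaround from above: force the vision/key capability checks to pass.
litellm.supports_vision = lambda *args, **kwargs: True
litellm.check_valid_key = lambda *args, **kwargs: True

async def main():
    return await zerox(
        file_path="document.pdf",        # placeholder path
        model="ollama/llama3.2-vision",
        output_dir="./result",
    )

print(asyncio.run(main()))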

@Matrix-X

Here is the code I used to try llama3.2-vision via Ollama:

os.environ["OLLAMA_API_BASE"] = "http://localhost:11434"
model = "ollama/llama3.2-vision"

file_path = "/private/var/www/html/ArtisanCloud/X/AgentBuddyWorkspace/demo.pdf"

## process only some pages or all
select_pages = (
    # 1,2,3,4,5  ## None for all, but could be int or list(int) page numbers (1 indexed)
    # 1
    None
)

output_dir = Path(
    "./data/pdf/out_parse_pdf/result"
)  ## directory to save the consolidated markdown file
temp_dir = Path(
    "./data/pdf/out_parse_pdf/temp"
)  ## directory for temporary files
kwargs = {}
# print(output_dir)
result = await zerox(
    file_path=file_path,
    model=model,
    output_dir=output_dir,
    custom_system_prompt=custom_system_prompt,
    select_pages=select_pages,
    # tempDir=temp_dir,
    **kwargs
)
return result

But I got the output below:

❯ python ./playground/ocr/zerox/parse_pdf.py
/opt/anaconda3/envs/llama/lib/python3.12/site-packages/pydantic/_internal/_config.py:341: UserWarning: Valid config keys have changed in V2:
* 'fields' has been removed
  warnings.warn(message, UserWarning)
Traceback (most recent call last):
  File "/Volumes/AISpace/Workspace/projects/qa_robot/./playground/ocr/zerox/parse_pdf.py", line 115, in <module>
    result = asyncio.run(main())
             ^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/envs/llama/lib/python3.12/asyncio/runners.py", line 194, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/envs/llama/lib/python3.12/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/envs/llama/lib/python3.12/asyncio/base_events.py", line 687, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/Volumes/AISpace/Workspace/projects/qa_robot/./playground/ocr/zerox/parse_pdf.py", line 102, in main
    result = await zerox(
             ^^^^^^^^^^^^
  File "/opt/anaconda3/envs/llama/lib/python3.12/site-packages/pyzerox/core/zerox.py", line 76, in zerox
    vision_model = litellmmodel(model=model,**kwargs)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/envs/llama/lib/python3.12/site-packages/pyzerox/models/modellitellm.py", line 38, in __init__
    self.validate_model()
  File "/opt/anaconda3/envs/llama/lib/python3.12/site-packages/pyzerox/models/modellitellm.py", line 66, in validate_model
    raise NotAVisionModel(extra_info={"model": self.model})
pyzerox.errors.exceptions.NotAVisionModel:
The provided model is not a vision model. Please provide a vision model.
(Extra Info: {'model': 'ollama/llama3.2-vision:latest'})

Please help!

@James4Ever0

James4Ever0 commented Dec 31, 2024

> (quoted: the code and traceback from the comment above)

Had you read my comment above before posting this, it would have solved the issue; feel free to leave a thumbs up.

@Matrix-X

> (quoted: the exchange above)

Yep, after trying your code, I can run it with the local Ollama model. But here is the response:

[screenshot of the model's response]

@James4Ever0

James4Ever0 commented Jan 1, 2025

> (quoted: the exchange above)

There is an additional patch for LiteLLM in my comment. Please also try that.

@Matrix-X

Matrix-X commented Jan 2, 2025

> (quoted: the exchange above)

Thanks, it works with llama3.2-vision via Ollama locally.

Please also help check another issue here: #131

I have used both gpt-4o-mini and llama3.2-vision:90b; each parses about 90% of the info but misses the "name" field.

Since the issue occurs with both gpt-4o-mini and llama3.2-vision:90b, I think it may come down to a tricky parameter in zerox.
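One thing I may try next (just a guess, not a confirmed fix) is the custom_system_prompt parameter shown earlier, to explicitly ask the model to keep names and other header fields. Note that overriding zerox's default prompt may change other formatting behavior, so this is only a sketch:

# Hedged sketch: pass a custom system prompt to zerox to emphasize fields that
# are being dropped. Whether this recovers the "name" field is an assumption.
custom_system_prompt = (
    "Convert the page to markdown. Transcribe every field exactly as printed, "
    "including names, headers, and form labels; do not omit or summarize them."
)

# inside the async main() as before
result = await zerox(
    file_path=file_path,
    model=model,
    output_dir=output_dir,
    custom_system_prompt=custom_system_prompt,
)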
