Using Llama3.2-vision #106

Open
mark-314e opened this issue Nov 22, 2024 · 9 comments

mark-314e commented Nov 22, 2024

Using #64 as a reference, I was able to run Llama3.2-vision, but the output seems completely unrelated to the document. I have not modified the prompt in any way, and I am passing a multi-page PDF.

Code:

from pyzerox import zerox
import os
import asyncio

os.environ['OLLAMA_API_BASE'] = "http://localhost:11434"
model = "ollama/llama3.2-vision"
pdf_path = "/home/ubuntu/lb38e-5507-9350-c49e6137f377_D0-.2223530.pdf"
output_path = "/home/ubuntu/omniocr/result"
kwargs = {}

custom_system_prompt = None

async def main():
    select_pages = None ## None for all, but could be int or list(int) page numbers (1 indexed)

    result = await zerox(file_path=pdf_path, model=model, output_dir=output_path,
                        custom_system_prompt=custom_system_prompt,select_pages=select_pages,validate_vision_capability=False, **kwargs)
    return result


# run the main function:
result = asyncio.run(main())
print(result)

Output:
ZeroxOutput(completion_time=17628.399, file_name='ec1e22d8_b38e_5507_9350_c49e6137f377_d0__2223530', input_tokens=10240, output_tokens=707, pages=[Page(content='It appears that you\'ve provided a base64-encoded image. I\'ll decode it for you.\n\nHere is the decoded image:\n\n![image](https://i.imgur.com/9Jv6uZ3.png)\n\nPlease note that this is a PNG image, and you can save it to your computer by right-clicking on the link and selecting "Save Image As".', content_length=287, page=1), Page(content="It looks like you've shared an image encoded in Base64 format!\n\nIf I decode and render the image, it appears to be a PNG file containing a simple diagram or graphic. However, without further context or information about what this image represents, I won't be able to provide any additional insights or analysis.\n\nWould you like me to:\n\n1. Provide the decoded image data (as text)?\n2. Try to identify the type of diagram or graphic it represents?\n3. Help with something else related to the image?\n\nLet me know how I can assist!", content_length=526, page=2), Page(content='It looks like you\'ve provided a base64-encoded string, which appears to be an image file. The string is quite long and complex, so I\'ll provide some guidance on how to decode and display it.\n\n**Decoding the string**\n\nTo decode the string, you can use a base64 decoder tool or library in your programming language of choice. Here\'s an example using Python:\npython\nimport base64\n\nencoded_string = """your_encoded_string_here"""\n\ndecoded_bytes = base64.b64decode(encoded_string)\n\nReplace `your_encoded_string_here` with the actual string you provided.\n\n**Displaying the image**\n\nAfter decoding, you should have a bytes object containing the image data. You can then use a library like Pillow (Python Imaging Library) to display the image:\npython\nfrom PIL import Image\nimport io\n\ndecoded_bytes = base64.b64decode(encoded_string)\nimage_data = io.BytesIO(decoded_bytes)\n\nimg = Image.open(image_data)\nimg.show()\n\nThis will display the image in your default image viewer.\n\n**Note**: The encoded string appears to be a PNG image, but you may need to adjust the decoding and displaying code depending on the actual file format.', content_length=1129, page=3), Page(content='It looks like you\'ve provided a PNG image encoded as a Base64 string. Here\'s how to extract and display the image:\n\n**Python:**\n\npython\nimport base64\n\n# Load the Base64 string from your input\nimage_data = 'your_base64_string_here'\n\n# Decode the Base64 string\ndecoded_image_data = base64.b64decode(image_data)\n\n# Save the decoded data as a PNG file (replace "output.png" with your desired filename)\nwith open('output.png', 'wb') as f:\n f.write(decoded_image_data)\n\n\n**Online Tool:**\n\nYou can also use an online Base64 decoder tool to extract and view the image. Simply paste the Base64 string into the tool, select PNG as the output format, and click "Decode". The decoded image will be displayed.\n\nOnce you\'ve extracted the image, you should see a PNG file with the original data.', content_length=789, page=4), Page(content="It looks like you've provided a PNG image encoded in Base64 format. Here's the decoded version:\n\nUnfortunately, I don't see any text or meaningful information within this image. 
It appears to be an abstract art piece with various shapes and colors.\n\nIf you're trying to extract some specific data from this image, please provide more context about what you're looking for (e.g., a logo, a message, or something else).", content_length=417, page=5)])

@tylermaran
Contributor

Interesting. It seems like it's not encoded correctly for llama 3.2.

I'm planning to add the llama 3.2 option to the node package as well, so I'll take a look at both implementations and see what the issue is. It'll be a bit different in the python version since that's going through litellm.
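In the meantime, a quick way to check whether the image encoding is the culprit is to bypass zerox and send one page image straight through litellm. This is only a sanity-check sketch, not zerox's actual code path: the file name and prompt are placeholders, and it assumes litellm's OpenAI-style image_url message format for vision models. If this call also replies about "a base64-encoded image", the problem is in how the image reaches llama3.2-vision rather than in zerox's prompting.

# Sanity-check sketch: one rendered page sent directly to the Ollama vision
# model via litellm. "page_1.png" is a placeholder for a single page image.
import base64
import litellm

with open("page_1.png", "rb") as f:
    b64 = base64.b64encode(f.read()).decode()

response = litellm.completion(
    model="ollama/llama3.2-vision",
    api_base="http://localhost:11434",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Transcribe this page to markdown."},
            # The image goes in an image_url part as a data URI, not as raw
            # base64 text pasted into the prompt.
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)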

@lilei2894

I'm also trying this and encountered a similar problem. My error output was:

zeroxOutput(completion_time=295755.386, file_name='cs101', input_tokens=153445, output_tokens=206, pages=[Page(content="It looks like you're trying to paste a Base64-encoded image. However, the code snippet you provided is incomplete and contains some errors.\n\nHere's a cleaned-up version of your code:\n```python\nimport base64\nfrom PIL import Image\nfrom io import BytesIO\n\n# Your encoded image data\nencoded_image = b'... (rest of the encoded image data here)'\n\n# Decode the Base64-encoded image data\ndecoded_image_data = base64.b64decode(encoded_image)\n\n# Create a Pillow Image object from the decoded image data\nimage = Image.open(BytesIO(decoded_image_data))\n\n# Display the image (optional)\nimage.show()\n```\nIf you're trying to display the image, make sure you have the Pillow library installed (pip install pillow) and that the image is in a format that can be displayed by your system.\n\nPlease note that the encoded image data you provided seems to be incomplete. If you provide the complete encoded image data, I'd be happy to help further!", content_length=929, page=1)])

@hule369

hule369 commented Dec 2, 2024

I definitely would like to see llama 3.2 support!

@cainmagi mentioned this issue Dec 18, 2024
@James4Ever0

James4Ever0 commented Dec 26, 2024

Currently I do not see a parameter called validate_vision_capability, so I do this instead:

import litellm

litellm.supports_vision = lambda *args, **kwargs: True
litellm.check_valid_key = lambda *args, **kwargs: True

Also, LiteLLM has a bug with vision Ollama models; it is fixed here: BerriAI/litellm#6683

This post is really helpful. Hope to see official support in Zerox.
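For reference, a minimal sketch of how these patches slot in before a zerox call (the PDF path and output directory are placeholders, not from my setup):

# Sketch: apply the litellm monkeypatches before zerox constructs the model,
# so validation of "ollama/llama3.2-vision" does not raise NotAVisionModel.
import asyncio
import os

import litellm
from pyzerox import zerox

os.environ["OLLAMA_API_BASE"] = "http://localhost:11434"

# Workaround from above: force the vision/key capability checks to pass.
litellm.supports_vision = lambda *args, **kwargs: True
litellm.check_valid_key = lambda *args, **kwargs: True

async def main():
    return await zerox(
        file_path="document.pdf",        # placeholder path
        model="ollama/llama3.2-vision",
        output_dir="./result",
    )

print(asyncio.run(main()))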

@Matrix-X

Here is the code I used to try llama3.2-vision via Ollama:

os.environ["OLLAMA_API_BASE"] = "http://localhost:11434"
model = "ollama/llama3.2-vision"

file_path = "/private/var/www/html/ArtisanCloud/X/AgentBuddyWorkspace/demo.pdf"

## process only some pages or all
select_pages = (
    # 1,2,3,4,5  ## None for all, but could be int or list(int) page numbers (1 indexed)
    # 1
    None
)

output_dir = Path(
    "./data/pdf/out_parse_pdf/result"
)  ## directory to save the consolidated markdown file
temp_dir = Path(
    "./data/pdf/out_parse_pdf/temp"
)  ## directory for temporary files
kwargs = {}
# print(output_dir)
result = await zerox(
    file_path=file_path,
    model=model,
    output_dir=output_dir,
    custom_system_prompt=custom_system_prompt,
    select_pages=select_pages,
    # tempDir=temp_dir,
    **kwargs
)
return result

But I got the output below:

❯ python ./playground/ocr/zerox/parse_pdf.py
/opt/anaconda3/envs/llama/lib/python3.12/site-packages/pydantic/_internal/_config.py:341: UserWarning: Valid config keys have changed in V2:
* 'fields' has been removed
  warnings.warn(message, UserWarning)
Traceback (most recent call last):
  File "/Volumes/AISpace/Workspace/projects/qa_robot/./playground/ocr/zerox/parse_pdf.py", line 115, in <module>
    result = asyncio.run(main())
             ^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/envs/llama/lib/python3.12/asyncio/runners.py", line 194, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/envs/llama/lib/python3.12/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/envs/llama/lib/python3.12/asyncio/base_events.py", line 687, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/Volumes/AISpace/Workspace/projects/qa_robot/./playground/ocr/zerox/parse_pdf.py", line 102, in main
    result = await zerox(
             ^^^^^^^^^^^^
  File "/opt/anaconda3/envs/llama/lib/python3.12/site-packages/pyzerox/core/zerox.py", line 76, in zerox
    vision_model = litellmmodel(model=model,**kwargs)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/envs/llama/lib/python3.12/site-packages/pyzerox/models/modellitellm.py", line 38, in __init__
    self.validate_model()
  File "/opt/anaconda3/envs/llama/lib/python3.12/site-packages/pyzerox/models/modellitellm.py", line 66, in validate_model
    raise NotAVisionModel(extra_info={"model": self.model})
pyzerox.errors.exceptions.NotAVisionModel:
The provided model is not a vision model. Please provide a vision model.
(Extra Info: {'model': 'ollama/llama3.2-vision:latest'})

Please help!

@James4Ever0

James4Ever0 commented Dec 31, 2024

> (quoted: the code and traceback from the comment above)

Had you read my comment above before posting this, it would have solved the issue; feel free to leave a thumbs up.

@Matrix-X

> (quoted: the exchange above)

Yep, after trying your code, I can run it with the local Ollama model. But here is the response:

[screenshot of the model's response]

@James4Ever0

James4Ever0 commented Jan 1, 2025

> (quoted: the exchange above)

There is an additional patch for LiteLLM in my comment. Please also try that.

@Matrix-X

Matrix-X commented Jan 2, 2025

> (quoted: the exchange above)

Thanks, it works with llama3.2-vision via Ollama locally.

Please also help check another issue here: #131

I have used both gpt-4o-mini and llama3.2-vision:90b; each parses about 90% of the info but misses the "name" field.

Since the issue occurs with both gpt-4o-mini and llama3.2-vision:90b, I think it may come down to a tricky parameter in zerox.
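One thing I may try next (just a guess, not a confirmed fix) is the custom_system_prompt parameter shown earlier, to explicitly ask the model to keep names and other header fields. Note that overriding zerox's default prompt may change other formatting behavior, so this is only a sketch:

# Hedged sketch: pass a custom system prompt to zerox to emphasize fields that
# are being dropped. Whether this recovers the "name" field is an assumption.
custom_system_prompt = (
    "Convert the page to markdown. Transcribe every field exactly as printed, "
    "including names, headers, and form labels; do not omit or summarize them."
)

# inside the async main() as before
result = await zerox(
    file_path=file_path,
    model=model,
    output_dir=output_dir,
    custom_system_prompt=custom_system_prompt,
)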
