Supporting Vision models from Groq #65

Open
VedantR3907 opened this issue Oct 21, 2024 · 9 comments

Comments

@VedantR3907

I tried using vision models like llama-3.2-90b-vision-preview, llama-3.2-11b-vision-preview, and llava-v1.5-7b-4096-preview, but they all show the same thing:
[screenshot]

@pradhyumna85
Contributor

@VedantR3907, the issue is due to the way litellm validates whether a given model has vision capability. litellm maintains a static JSON listing models with their properties and capabilities, and the llama 3.2 models (including the vision models) have not been added to it yet.
https://github.com/BerriAI/litellm/blob/fb523b79e9fdd7ce2d3a33f6c57a3679c7249e35/litellm/utils.py#L4974
https://github.com/BerriAI/litellm/blob/main/model_prices_and_context_window.json
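
For illustration, a quick way to see that check from the litellm side might look like this (a minimal sketch; it assumes litellm's supports_vision helper, which consults the same static JSON linked above):

```python
# Minimal sketch: supports_vision reads litellm's static
# model_prices_and_context_window.json capability list.
import litellm

for model in (
    "groq/llama-3.2-90b-vision-preview",
    "groq/llama-3.2-11b-vision-preview",
):
    print(model, litellm.supports_vision(model=model))
```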

For now, try uninstalling zerox and installing from this fork (#40):
pip install git+https://github.com/pradhyumna85/zerox.git@formatting-control

then pass validate_vision_capability=False to the zerox function and see if that solves it.
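
A rough sketch of that call (the file path and model name are placeholders, and the validate_vision_capability kwarg only exists on that fork):

```python
# Sketch only: validate_vision_capability is specific to the fork installed above.
import asyncio
from pyzerox import zerox

result = asyncio.run(
    zerox(
        file_path="data/input.pdf",                 # placeholder path
        model="groq/llama-3.2-11b-vision-preview",
        validate_vision_capability=False,           # skip litellm's vision check
    )
)
print(result)
```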

@VedantR3907
Author

I tried to pass the parameter, but the zerox function doesn't accept it:
[screenshot]

There was the same problem with the llava models in ollama, so I tried a workaround to see if it works, and it does for the ollama llava model, as below:
pyzerox\models\modellitellm.py
[screenshot]

But when I tried the same with Groq, it gave me an error, which I guess comes from Groq:
[screenshot]

@pradhyumna85
Contributor

pradhyumna85 commented Oct 22, 2024

@VedantR3907, there was an issue where all the kwargs were being passed through to litellm; I fixed that. Remove and reinstall pyzerox using the same pip command shared earlier. However, this time there is a new error:
[screenshot]
It seems Groq's llama 3.2 vision models don't support a system message together with messages containing images.

It looks like litellm hasn't added support for vision models from Groq.

@pradhyumna85
Contributor

pradhyumna85 commented Oct 22, 2024

@VedantR3907, with the latest litellm version 1.50.1 (we are using a lower version in pyzerox) I can get image prompting to work with llama 3.2 vision, but the current implementation in the pyzerox backend uses a system prompt for the instructions, which the Groq backend doesn't support together with image input.
Feel free to fork the repo and adapt the model class in pyzerox\models\modellitellm.py to remove the system prompt and provide the instructions in the user prompt instead, to see if that works. If that goes well, you can raise a PR for it; we just need to make sure it doesn't break existing models.
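
As a rough illustration of that suggestion, here is a minimal sketch of a direct litellm call against a Groq vision model with the instructions folded into the user turn (the model name, prompt text, and image data are placeholders):

```python
import asyncio
import litellm

async def main():
    # No separate system message: Groq's llama 3.2 vision models reject one
    # when an image is attached, so the instructions ride in the user turn.
    messages = [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Convert this page to Markdown."},
                {
                    "type": "image_url",
                    "image_url": {"url": "data:image/png;base64,<BASE64_PNG>"},
                },
            ],
        }
    ]
    resp = await litellm.acompletion(
        model="groq/llama-3.2-11b-vision-preview", messages=messages
    )
    print(resp.choices[0].message.content)

asyncio.run(main())
```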

@VedantR3907
Author

VedantR3907 commented Oct 23, 2024

@pradhyumna85, I made changes in modellitellm.py and it works now. Currently I am still passing the same system prompt as text for the Groq models, the one that is used for all the other models. We could change that, since the bigger Groq models work perfectly with it, while the smaller models are not perfect but still good.

I only changed the _prepare_messages function in modellitellm.py:

```python
async def _prepare_messages(
    self,
    image_path: str,
    maintain_format: bool,
    prior_page: str,
) -> List[Dict[str, Any]]:
    """Prepares the messages to send to the LiteLLM Completion API.
    :param image_path: Path to the image file.
    :type image_path: str
    :param maintain_format: Whether to maintain the format from the previous page.
    :type maintain_format: bool
    :param prior_page: The markdown content of the previous page.
    :type prior_page: str
    """
    messages: List[Dict[str, Any]] = []

    # Check if the model belongs to the Groq family (model name starts with 'groq/')
    if self.model.startswith('groq/'):
        # Prepare the user message that includes system instructions and image
        user_content = []

        # Add system prompt content as text in the user message
        user_content.append(
            {
                "type": "text",
                "text": f"{self._system_prompt}",
            }
        )

        # If maintain_format is true, add prior page formatting
        if maintain_format and prior_page:
            user_content.append(
                {
                    "type": "text",
                    "text": f'Markdown must maintain consistent formatting with the following page: \n\n """{prior_page}"""',
                }
            )

        # Add image as part of the user message
        base64_image = await encode_image_to_base64(image_path)
        user_content.append(
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/png;base64,{base64_image}"},
            }
        )

        # Append the user message
        messages.append(
            {
                "role": "user",
                "content": user_content,
            }
        )

    else:
        # Default behavior for non-Groq models:
        # add system prompt as a system message
        messages.append(
            {
                "role": "system",
                "content": self._system_prompt,
            }
        )

        # If maintain_format is true, add prior page formatting as a system message
        if maintain_format and prior_page:
            messages.append(
                {
                    "role": "system",
                    "content": f'Markdown must maintain consistent formatting with the following page: \n\n """{prior_page}"""',
                }
            )

        # Add image as part of the user message
        base64_image = await encode_image_to_base64(image_path)
        messages.append(
            {
                "role": "user",
                "content": [
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/png;base64,{base64_image}"},
                    },
                ],
            }
        )

    return messages
```

@MANOJ21K

MANOJ21K commented Nov 2, 2024

> [quotes @VedantR3907's comment above in full, including the modified _prepare_messages function]

Can you share all the changes you made, @VedantR3907?

@VedantR3907
Author

@MANOJ21K, see the code I shared above: I passed the system prompt for the Groq models as part of the user's message, within the user_content list. Copy-paste the code above and you will be able to use the system prompt written by @pradhyumna85.

@MANOJ21K

MANOJ21K commented Nov 3, 2024

@pradhyumna85, can you share the modellitellm.py file and the py-zerox version?

@igorlima

I discovered this library through a blog post on Medium. As I explored and gained experience with other libraries and litellm, I felt inspired to contribute to this project.

To address this issue, I suggest being flexible in how litellm is used. Currently, the py-zerox code hardcodes the system role, applying the same structure, prompt, and role for all LLMs. However, some LLMs require a bit of customization. The open PR introduces a new custom_role parameter to handle this, so setting custom_role='user' should resolve the issue.

There is already an open PR addressing this. I haven't seen any problems with litellm or Groq so far. The PR includes a snippet to try out.

  • here's a brief version of the code snippet

    # install from the git repo branch (shell)
    pip install --no-cache --upgrade-strategy eager -I \
      git+https://github.com/igorlima/zerox.git@default-role

    # snippet.py - Groq PDF extraction via zerox
    from pyzerox import zerox
    import pathlib, os, asyncio

    kwargs = {}
    custom_system_prompt = None
    custom_role = 'user'  # HERE IS the NEW custom_role parameter

    model = "groq/llama-3.2-11b-vision-preview"

    async def main():
      file_path = "data/input.pdf"
      select_pages = None
      output_dir = None  # e.g. "./data" to also save per-page output
      result = await zerox(file_path=file_path, model=model, output_dir=output_dir,
                           custom_system_prompt=custom_system_prompt, select_pages=select_pages,
                           custom_role=custom_role, **kwargs)
      return result

    result = asyncio.run(main())
    md_text = "\n".join([page.content for page in result.pages])

    print(md_text)
    pathlib.Path("data/output-zerox-pdf.md").write_text(md_text)
    print("Markdown saved to output-zerox-pdf.md")

    # to run this script (shell):
    export GROQ_API_KEY="XXXXXXXXXXXXXXXXXXXX"
    python3 snippet.py
