Supporting Vision models from Groq #65

Open
VedantR3907 opened this issue Oct 21, 2024 · 9 comments

Comments

@VedantR3907

I tried using vision models like llama-3.2-90b-vision-preview, llama-3.2-11b-vision-preview, and llava-v1.5-7b-4096-preview, but they all show the same thing:
[screenshot]

@pradhyumna85
Contributor

@VedantR3907, the issue is due to the way litellm validates whether a given model has vision capability. litellm maintains a static JSON listing models with their properties and capabilities, and the llama 3.2 models (including the vision models) have not been added to it yet.
https://github.com/BerriAI/litellm/blob/fb523b79e9fdd7ce2d3a33f6c57a3679c7249e35/litellm/utils.py#L4974
https://github.com/BerriAI/litellm/blob/main/model_prices_and_context_window.json
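
For illustration, a quick way to see that check from the litellm side might look like this (a minimal sketch; it assumes litellm's supports_vision helper, which consults the same static JSON linked above):

```python
# Minimal sketch: supports_vision reads litellm's static
# model_prices_and_context_window.json capability list.
import litellm

for model in (
    "groq/llama-3.2-90b-vision-preview",
    "groq/llama-3.2-11b-vision-preview",
):
    print(model, litellm.supports_vision(model=model))
```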

For now, try uninstalling zerox and installing from this fork (#40):
pip install git+https://github.com/pradhyumna85/zerox.git@formatting-control

then pass validate_vision_capability=False to the zerox function and see if that solves it.
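
A rough sketch of that call (the file path and model name are placeholders, and the validate_vision_capability kwarg only exists on that fork):

```python
# Sketch only: validate_vision_capability is specific to the fork installed above.
import asyncio
from pyzerox import zerox

result = asyncio.run(
    zerox(
        file_path="data/input.pdf",                 # placeholder path
        model="groq/llama-3.2-11b-vision-preview",
        validate_vision_capability=False,           # skip litellm's vision check
    )
)
print(result)
```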

@VedantR3907
Author

I tried to pass the parameter, but the zerox function doesn't accept it:
[screenshot]

There was the same problem with the llava models in ollama, so I tried a workaround to see if it works, and it does for the ollama llava model, as below:
pyzerox\models\modellitellm.py
[screenshot]

But when I tried the same with Groq, it gave me an error, which I guess comes from Groq:
[screenshot]

@pradhyumna85
Contributor

pradhyumna85 commented Oct 22, 2024

@VedantR3907, there was an issue where all the kwargs were being passed through to litellm; I fixed that. Remove and reinstall pyzerox using the same pip command shared earlier. However, this time there is a new error:
[screenshot]
It seems Groq's llama 3.2 vision models don't support a system message together with messages containing images.

It looks like litellm hasn't added support for vision models from Groq.

@pradhyumna85
Contributor

pradhyumna85 commented Oct 22, 2024

@VedantR3907, with the latest litellm version 1.50.1 (we are using a lower version in pyzerox) I can get image prompting to work with llama 3.2 vision, but the current implementation in the pyzerox backend uses a system prompt for the instructions, which the Groq backend doesn't support together with image input.
Feel free to fork the repo and adapt the model class in pyzerox\models\modellitellm.py to remove the system prompt and provide the instructions in the user prompt instead, to see if that works. If that goes well, you can raise a PR for it; we just need to make sure it doesn't break existing models.
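
As a rough illustration of that suggestion, here is a minimal sketch of a direct litellm call against a Groq vision model with the instructions folded into the user turn (the model name, prompt text, and image data are placeholders):

```python
import asyncio
import litellm

async def main():
    # No separate system message: Groq's llama 3.2 vision models reject one
    # when an image is attached, so the instructions ride in the user turn.
    messages = [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Convert this page to Markdown."},
                {
                    "type": "image_url",
                    "image_url": {"url": "data:image/png;base64,<BASE64_PNG>"},
                },
            ],
        }
    ]
    resp = await litellm.acompletion(
        model="groq/llama-3.2-11b-vision-preview", messages=messages
    )
    print(resp.choices[0].message.content)

asyncio.run(main())
```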

@VedantR3907
Author

VedantR3907 commented Oct 23, 2024

@pradhyumna85, I made changes in modellitellm.py and it works now. Currently I am still passing the same system prompt as text for the Groq models, the one that is used for all the other models. We could change that, since the bigger Groq models work perfectly with it, while the smaller models are not perfect but still good.

I only changed the _prepare_messages function in modellitellm.py:

```python
async def _prepare_messages(
    self,
    image_path: str,
    maintain_format: bool,
    prior_page: str,
) -> List[Dict[str, Any]]:
    """Prepares the messages to send to the LiteLLM Completion API.
    :param image_path: Path to the image file.
    :type image_path: str
    :param maintain_format: Whether to maintain the format from the previous page.
    :type maintain_format: bool
    :param prior_page: The markdown content of the previous page.
    :type prior_page: str
    """
    messages: List[Dict[str, Any]] = []

    # Check if the model belongs to the Groq family (model name starts with 'groq/')
    if self.model.startswith('groq/'):
        # Prepare the user message that includes system instructions and image
        user_content = []

        # Add system prompt content as text in the user message
        user_content.append(
            {
                "type": "text",
                "text": f"{self._system_prompt}",
            }
        )

        # If maintain_format is true, add prior page formatting
        if maintain_format and prior_page:
            user_content.append(
                {
                    "type": "text",
                    "text": f'Markdown must maintain consistent formatting with the following page: \n\n """{prior_page}"""',
                }
            )

        # Add image as part of the user message
        base64_image = await encode_image_to_base64(image_path)
        user_content.append(
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/png;base64,{base64_image}"},
            }
        )

        # Append the user message
        messages.append(
            {
                "role": "user",
                "content": user_content,
            }
        )

    else:
        # Default behavior for non-Groq models:
        # add system prompt as a system message
        messages.append(
            {
                "role": "system",
                "content": self._system_prompt,
            }
        )

        # If maintain_format is true, add prior page formatting as a system message
        if maintain_format and prior_page:
            messages.append(
                {
                    "role": "system",
                    "content": f'Markdown must maintain consistent formatting with the following page: \n\n """{prior_page}"""',
                }
            )

        # Add image as part of the user message
        base64_image = await encode_image_to_base64(image_path)
        messages.append(
            {
                "role": "user",
                "content": [
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/png;base64,{base64_image}"},
                    },
                ],
            }
        )

    return messages
```

@MANOJ21K

MANOJ21K commented Nov 2, 2024

> [quotes @VedantR3907's comment above in full, including the modified _prepare_messages function]

Can you share all the changes you made, @VedantR3907?

@VedantR3907
Author

@MANOJ21K, see the code I shared above: I passed the system prompt for the Groq models as part of the user's message, within the user_content list. Copy-paste the code above and you will be able to use the system prompt written by @pradhyumna85.

@MANOJ21K

MANOJ21K commented Nov 3, 2024

@pradhyumna85, can you share the modellitellm.py file and the py-zerox version?

@igorlima

I discovered this library through a blog post on Medium. As I explored and gained experience with other libraries and litellm, I felt inspired to contribute to this project.

To address this issue, I suggest being flexible in how litellm is used. Currently, the py-zerox code hardcodes the system role, applying the same structure, prompt, and role for all LLMs. However, some LLMs require a bit of customization. The open PR introduces a new custom_role parameter to handle this, so setting custom_role='user' should resolve the issue.

There is already an open PR addressing this. I haven't seen any problems with litellm or Groq so far. The PR includes a snippet to try out.

  • here's a brief version of the code snippet

    # install from the git repo branch (shell)
    pip install --no-cache --upgrade-strategy eager -I \
      git+https://github.com/igorlima/zerox.git@default-role

    # snippet.py - Groq PDF extraction via zerox
    from pyzerox import zerox
    import pathlib, os, asyncio

    kwargs = {}
    custom_system_prompt = None
    custom_role = 'user'  # HERE IS the NEW custom_role parameter

    model = "groq/llama-3.2-11b-vision-preview"

    async def main():
      file_path = "data/input.pdf"
      select_pages = None
      output_dir = None  # e.g. "./data" to also save per-page output
      result = await zerox(file_path=file_path, model=model, output_dir=output_dir,
                           custom_system_prompt=custom_system_prompt, select_pages=select_pages,
                           custom_role=custom_role, **kwargs)
      return result

    result = asyncio.run(main())
    md_text = "\n".join([page.content for page in result.pages])

    print(md_text)
    pathlib.Path("data/output-zerox-pdf.md").write_text(md_text)
    print("Markdown saved to output-zerox-pdf.md")

    # to run this script (shell):
    export GROQ_API_KEY="XXXXXXXXXXXXXXXXXXXX"
    python3 snippet.py
