Add support for prepending images to raw prompts and examples #392

Closed

Conversation

rylativity

This PR is related to a PR opened against the ollama server to address unexpected behavior when passing both images and raw=True to a generate() call. The issue is described in #319, and the open PR against the ollama server is ollama/ollama#8209.

These changes update the Python client and its generate() methods to accept an optional bool keyword argument, prepend_images_to_raw_prompt, on the generate() call. Because the argument is optional, the change preserves existing behavior and will not break current users. See the associated PR against the ollama server linked above for details on how this kwarg works.

This PR also includes a new example, multimodal-raw-generate.py, which shows both how to use the proposed prepend_images_to_raw_prompt kwarg and how to manually place image tags within a raw prompt without needing prepend_images_to_raw_prompt.
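As a hedged sketch of the intended call shape (prepend_images_to_raw_prompt is only proposed in this PR and is not part of the released client; the model name and prompt template are illustrative):

```python
import base64

def build_generate_kwargs(prompt: str, image_bytes: bytes) -> dict:
    """Assemble keyword arguments for client.generate() in the shape this PR
    proposes. prepend_images_to_raw_prompt is the new, optional kwarg."""
    return {
        "model": "llava",  # illustrative multimodal model name
        "prompt": prompt,
        "raw": True,
        # the client accepts images as base64-encoded strings
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "prepend_images_to_raw_prompt": True,  # proposed in this PR
    }

kwargs = build_generate_kwargs("USER: Describe this image.\nASSISTANT:", b"fake-image-bytes")
# ollama.Client().generate(**kwargs)  # requires a running ollama server
```

The call itself is commented out because it needs a live server; the point is the kwargs shape.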

…ages to raw prompt; also includes examples for how to pass images alongside raw prompts
@ParthSareen
Contributor

Hey @rylativity,

Thanks for the PR. I've thought a good bit about this, and I think that for 99% of cases, chat just makes more sense since you can easily transform structured JSON data. As you've found, swapping out the image tag works for your use case, but this is something that could be inconsistent across models and may change in the near future. For that reason, I'm going to have to close this one out for now.

Really appreciate you going deep into this though, and happy that you found something which fits your use case!

@rylativity
Author

@ParthSareen in the associated PR to the ollama server, I used the same image tags that the ollama server already uses when handling images in chat mode (see here). Regarding your concern about models using different image tags in the future, these PRs don't add any overhead on that front: the chat-mode image tags are equally susceptible to such changes and would need to be updated anyway.

This is a separate issue, but I might propose a tag lookup based on the model ID in use (all current models would map to <image>, and others could raise an exception until implemented).
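A minimal sketch of that hypothetical lookup; the mapping, model names, and exception choice are illustrative, not actual ollama behavior:

```python
# Hypothetical model-ID -> image-tag table; all current models would map
# to "<image>" per the suggestion above.
IMAGE_TAG_BY_MODEL = {"llava": "<image>", "bakllava": "<image>"}

def image_tag_for(model_id: str) -> str:
    """Return the raw-prompt image tag for a model, or raise until the
    model's tag is registered."""
    try:
        return IMAGE_TAG_BY_MODEL[model_id]
    except KeyError:
        raise NotImplementedError(f"no image tag registered for model {model_id!r}")
```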

Regarding the issue I opened these PRs against, I would urge you to at least consider approving the documentation changes showing how to pass those image tags in a raw prompt, and implementing some logic to raise an exception when an unsupported combination of raw prompts and images is attempted.

Right now the behavior is undocumented and fails silently, which isn't good. I'm open to your opinions and happy to take a stab at implementing the changes proposed above or any others you suggest.
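A minimal client-side guard matching the suggestion above: fail loudly instead of silently when images are combined with raw=True but the prompt carries no tag. The check and error message are illustrative, not actual ollama client behavior:

```python
def check_raw_image_usage(raw: bool, images: list, prompt: str, tag: str = "<image>") -> None:
    """Raise instead of silently dropping images in raw mode.
    The "<image>" tag and this validation rule are hypothetical."""
    if raw and images and tag not in prompt:
        raise ValueError(
            "images were supplied with raw=True, but the prompt contains no "
            f"{tag} tag; they would be silently ignored"
        )

# passes: the raw prompt includes a tag for the supplied image
check_raw_image_usage(raw=True, images=["<b64>"], prompt="<image>\nDescribe this.")
```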

@ParthSareen
Contributor

ParthSareen commented Feb 14, 2025

Sorry - what I meant to get at was that the responsibility of managing changing tags would sit on the server end, managed by us, which is what minimizes the effort for the user. But I suppose that in this case it would still be on the server side anyway.

I think that since this is a raw mode, it makes sense not to have an extra param and instead let the user pass in whatever they want. You're right about documenting this: even though it's a niche use case, it can be useful for certain folks.

Before we add to the documentation, I'd also like to talk a bit more about what the usage and error handling look like. Appreciate your PR and the convo here!
