Add support for prepending images to raw prompts and examples #392

Closed

Conversation

rylativity

This PR is related to a PR opened against the ollama server to address unexpected behavior when passing both images and raw=True to a generate() call. The issue is described in #319, and the open PR against the ollama server is ollama/ollama#8209.

These changes update the Python client and its generate() methods to accept an optional bool keyword argument, prepend_images_to_raw_prompt, on the generate() call. Because the argument is optional, the change preserves existing behavior and will not break current users. See the associated PR against the ollama server linked above for details on how this kwarg works.

This PR also includes a new example, multimodal-raw-generate.py, which shows both how to use the proposed prepend_images_to_raw_prompt kwarg and how to manually place image tags within a raw prompt without needing prepend_images_to_raw_prompt.
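As a hedged sketch of the intended call shape (prepend_images_to_raw_prompt is only proposed in this PR and is not part of the released client; the model name and prompt template are illustrative):

```python
import base64

def build_generate_kwargs(prompt: str, image_bytes: bytes) -> dict:
    """Assemble keyword arguments for client.generate() in the shape this PR
    proposes. prepend_images_to_raw_prompt is the new, optional kwarg."""
    return {
        "model": "llava",  # illustrative multimodal model name
        "prompt": prompt,
        "raw": True,
        # the client accepts images as base64-encoded strings
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "prepend_images_to_raw_prompt": True,  # proposed in this PR
    }

kwargs = build_generate_kwargs("USER: Describe this image.\nASSISTANT:", b"fake-image-bytes")
# ollama.Client().generate(**kwargs)  # requires a running ollama server
```

The call itself is commented out because it needs a live server; the point is the kwargs shape.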

…ages to raw prompt; also includes examples for how to pass images alongside raw prompts
@ParthSareen
Contributor

Hey @rylativity,

Thanks for the PR. I've thought a good bit about this, and I think that for 99% of cases, chat just makes more sense since you can easily transform structured JSON data. As you've found, swapping out the image tag works for your use case, but this is something that could be inconsistent across models and may change in the near future. For that reason, I'm going to have to close this one out for now.

Really appreciate you going deep into this though, and happy that you found something which fits your use case!

@rylativity
Author

@ParthSareen in the associated PR to the ollama server, I used the same image tags that the ollama server already uses when handling images in chat mode (see here). Regarding your concern about models using different image tags in the future, these PRs don't add any overhead on that front: the chat-mode image tags are equally susceptible to such changes and would need to be updated anyway.

This is a separate issue, but I might propose a tag lookup based on the model ID in use (all current models would map to <image>, and others could raise an exception until implemented).
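A minimal sketch of that hypothetical lookup; the mapping, model names, and exception choice are illustrative, not actual ollama behavior:

```python
# Hypothetical model-ID -> image-tag table; all current models would map
# to "<image>" per the suggestion above.
IMAGE_TAG_BY_MODEL = {"llava": "<image>", "bakllava": "<image>"}

def image_tag_for(model_id: str) -> str:
    """Return the raw-prompt image tag for a model, or raise until the
    model's tag is registered."""
    try:
        return IMAGE_TAG_BY_MODEL[model_id]
    except KeyError:
        raise NotImplementedError(f"no image tag registered for model {model_id!r}")
```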

Regarding the issue I opened these PRs against, I would urge you to at least consider approving the documentation changes showing how to pass those image tags in a raw prompt, and implementing some logic to raise an exception when an unsupported combination of raw prompts and images is attempted.

Right now the behavior is undocumented and fails silently, which isn't good. I'm open to your opinions and happy to take a stab at implementing the changes proposed above or any others you suggest.
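A minimal client-side guard matching the suggestion above: fail loudly instead of silently when images are combined with raw=True but the prompt carries no tag. The check and error message are illustrative, not actual ollama client behavior:

```python
def check_raw_image_usage(raw: bool, images: list, prompt: str, tag: str = "<image>") -> None:
    """Raise instead of silently dropping images in raw mode.
    The "<image>" tag and this validation rule are hypothetical."""
    if raw and images and tag not in prompt:
        raise ValueError(
            "images were supplied with raw=True, but the prompt contains no "
            f"{tag} tag; they would be silently ignored"
        )

# passes: the raw prompt includes a tag for the supplied image
check_raw_image_usage(raw=True, images=["<b64>"], prompt="<image>\nDescribe this.")
```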

@ParthSareen
Contributor

ParthSareen commented Feb 14, 2025

Sorry - what I meant to get at was that the responsibility of managing changing tags would sit on the server end, managed by us, which is what minimizes the effort for the user. But I suppose that in this case it would still be on the server side anyway.

I think that since this is a raw mode, it makes sense not to have an extra param and instead let the user pass in whatever they want. You're right about documenting this: even though it's a niche use case, it can be useful for certain folks.

Before we add to the documentation, I'd also like to talk a bit more about what the usage and error handling look like. Appreciate your PR and the convo here!
