Currently core doesn't really support images, although an `Images` field already exists on the `Message` type:
```go
// Message represents a single message in a conversation with multimodal support
type Message struct {
	// etc. etc.

	// A list of base64-encoded images (for multimodal models such as llava
	// or llama3.2-vision)
	Images []string
}
```
It should be as simple as getting the base64 encodings for the images and passing them to the agent. Something like:
```go
func (a *Agent) WithImages(base64Img string)
```
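Fleshed out a little, that could look like the sketch below. `Agent` and `Message` here are just stand-ins for the real core types, and `WithImages` is sketched as variadic so a caller can attach several frames at once:

```go
package agent

// Sketch only: Agent and Message are stand-ins for the real core types.
type Message struct {
	Role    string
	Content string
	// Base64-encoded images, mirroring the Images field above.
	Images []string
}

type Agent struct {
	images []string
}

// WithImages queues base64-encoded images to attach to the next message.
func (a *Agent) WithImages(base64Imgs ...string) *Agent {
	a.images = append(a.images, base64Imgs...)
	return a
}

// newUserMessage shows the queued images riding along to the model.
func (a *Agent) newUserMessage(content string) Message {
	return Message{Role: "user", Content: content, Images: a.images}
}
```

Returning the agent keeps it chainable, e.g. `agent.WithImages(frame1, frame2)` before the next generate call.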
A few things to think through:
Do we expect that an end user would already have the images base64 encoded? Or should that be something that agent-api does for you?
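One way to offer both: keep `WithImages` taking base64 strings and add a convenience method that does the encoding for you. A hypothetical `WithImageFiles`, building on the stand-in `Agent` above:

```go
package agent

import (
	"encoding/base64"
	"os"
)

// WithImageFiles is the "agent-api does it for you" option: read each file,
// base64-encode it, and hand off to WithImages. Hypothetical method on the
// stand-in Agent sketched above.
func (a *Agent) WithImageFiles(paths ...string) (*Agent, error) {
	for _, p := range paths {
		data, err := os.ReadFile(p)
		if err != nil {
			return a, err
		}
		a = a.WithImages(base64.StdEncoding.EncodeToString(data))
	}
	return a, nil
}
```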
I am working on an agent using the api - https://github.com/bdougie/vision
A lot of the work in my agent is preparing a video for using the vision model to return a description. Today the api doesn't quite send the image to ollama for processing.
https://github.com/bdougie/vision/blob/51fb3a17f0b7e9273798c05f86ca435aa575d109/main.go#L41-L59
I may be wrong on this as I am still figuring it out, but with ollama locally I can simply do this:
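Roughly, the local request I mean is just Ollama's `/api/generate` endpoint with the image base64-encoded into the `images` array. A minimal sketch (using `net/http` and `llama3.2-vision` as an example model, not the exact code from my repo):

```go
package main

import (
	"bytes"
	"encoding/base64"
	"encoding/json"
	"fmt"
	"net/http"
	"os"
)

func main() {
	// e.g. a frame pulled out of the video earlier in the pipeline
	img, err := os.ReadFile("frame.jpg")
	if err != nil {
		panic(err)
	}

	// Ollama's generate endpoint takes base64-encoded images directly.
	body, _ := json.Marshal(map[string]any{
		"model":  "llama3.2-vision",
		"prompt": "Describe this image.",
		"images": []string{base64.StdEncoding.EncodeToString(img)},
		"stream": false,
	})

	resp, err := http.Post("http://localhost:11434/api/generate", "application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var out struct {
		Response string `json:"response"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		panic(err)
	}
	fmt.Println(out.Response)
}
```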
What would I like to see?
Perhaps a tool that is ready to manage an image for the model.