I am building an agent (https://github.com/bdougie/vision) on top of this API.
Much of my agent's work is preparing video frames so that the vision model can return a description of them. Currently, though, the API doesn't actually send the image to Ollama for processing.
https://github.com/bdougie/vision/blob/51fb3a17f0b7e9273798c05f86ca435aa575d109/main.go#L41-L59
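func analyzeImage(ctx context.Context, a *agent.DefaultAgent, imagePath string) (string, error) {
	imageData, err := os.ReadFile(imagePath)
	if err != nil {
		return "", err
	}
	// Create the vision prompt with the image data. Formatting a []byte with %s
	// would splice raw binary into the JSON, so base64-encode it first
	// (this needs the "encoding/base64" import).
	encoded := base64.StdEncoding.EncodeToString(imageData)
	prompt := fmt.Sprintf(`[
		{"type": "text", "text": "Describe this image in detail."},
		{"type": "image", "source": {"data": "%s", "media_type": "image/jpeg"}}
	]`, encoded)
	response, err := a.Run(ctx, prompt, agent.DefaultStopCondition)
	if err != nil {
		return "", err
	}
	// Guard against an empty result before indexing into it.
	if len(response) == 0 {
		return "", fmt.Errorf("agent returned no messages")
	}
	// This line does not return what I need today.
	return response[0].Message.Content, nil
}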
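For comparison, here is a minimal sketch of how an image might be attached to a request for Ollama's `/api/chat` endpoint, assuming a local Ollama server and a multimodal model such as `llama3.2-vision:11b`. Ollama expects images as base64 strings in a message's `images` field, separate from the text `content`, rather than as JSON embedded in the prompt text. The `chatMessage`, `chatRequest`, and `buildChatRequest` names are hypothetical and are not part of this API.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// chatMessage mirrors a message in Ollama's /api/chat request body. For
// multimodal models, images travel in their own "images" field as base64
// strings instead of inside the text content.
type chatMessage struct {
	Role    string   `json:"role"`
	Content string   `json:"content"`
	Images  []string `json:"images,omitempty"`
}

// chatRequest mirrors the top-level /api/chat request fields used here.
type chatRequest struct {
	Model    string        `json:"model"`
	Messages []chatMessage `json:"messages"`
	Stream   bool          `json:"stream"`
}

// buildChatRequest (hypothetical helper) base64-encodes the raw image bytes
// and returns the JSON body for a single, non-streaming chat turn.
func buildChatRequest(model, prompt string, image []byte) ([]byte, error) {
	msg := chatMessage{
		Role:    "user",
		Content: prompt,
		Images:  []string{base64.StdEncoding.EncodeToString(image)},
	}
	return json.Marshal(chatRequest{Model: model, Messages: []chatMessage{msg}, Stream: false})
}

func main() {
	// The first three bytes of a JPEG file stand in for real image data.
	body, err := buildChatRequest("llama3.2-vision:11b", "Describe this image in detail.", []byte{0xFF, 0xD8, 0xFF})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```

The returned body could then be POSTed to `http://localhost:11434/api/chat`; with `stream` set to false, the reply's `message.content` field holds the description.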
I may be wrong about this, since I am still figuring it out, but when I run Ollama locally I can simply do this:
ollama run llama3.2-vision:11b
>>> describe the image at this location /./frame_0003.jpg
What would I like to see?
Perhaps a tool that takes care of preparing an image and passing it to the model.