Expand core api to allow for reading images #1

Open
@bdougie

I am working on an agent using the api - https://github.com/bdougie/vision

Much of the work in my agent goes into preparing a video so that the vision model can return a description. Today the API does not appear to send the image to Ollama for processing.

https://github.com/bdougie/vision/blob/51fb3a17f0b7e9273798c05f86ca435aa575d109/main.go#L41-L59

func analyzeImage(ctx context.Context, a *agent.DefaultAgent, imagePath string) (string, error) {
	imageData, err := os.ReadFile(imagePath)
	if err != nil {
		return "", err
	}

	// Create vision prompt with image data
	prompt := fmt.Sprintf(`[
		{"type": "text", "text": "Describe this image in detail."},
		{"type": "image", "source": {"data": "%s", "media_type": "image/jpeg"}}
	]`, imageData)

	response, err := a.Run(ctx, prompt, agent.DefaultStopCondition)
	if err != nil {
		return "", err
	}

	// This line does not return what I need today.
	return response[0].Message.Content, nil
}

I may be wrong about this, as I am still figuring it out, but when running Ollama locally I can simply do this:

ollama run llama3.2-vision:11b
>>> describe the image at this location /./frame_0003.jpg

What would I like to see?

Perhaps a tool that is ready to manage an image for the model.
