Ollama image caption tool

This project provides a Python script that reads images from a specified folder, sends each one to the llava model through the Ollama API to generate a description, and saves each description in a corresponding .txt file. It is intended for developers and researchers who work with image datasets and need textual descriptions generated automatically.

Features

Automatic Image Description: The script uses the llava model to describe images.
Batch Processing: Processes all images ('.png', '.jpg', '.jpeg') in the folder.
Output to Text Files: Saves descriptions in .txt files with the same names as the corresponding images.

How It Works

The script converts images to base64 encoding.
It sends the base64 image data to the Ollama API, specifying the llava model.
The API returns a description of the image, which the script saves in a .txt file.
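
The three steps above can be sketched as follows. This is a minimal illustration, not the repository's actual code: it assumes Ollama's `/api/generate` endpoint with streaming disabled and uses only the standard library. The prompt text and the helper names (`encode_image`, `build_payload`, `describe_image`) are hypothetical.

```python
import base64
import json
import urllib.request
from pathlib import Path

# Assumed defaults: the endpoint path and prompt are illustrative,
# not taken from the project's script.
OLLAMA_URL = "http://localhost:11434/api/generate"
PROMPT = "Describe this image."


def encode_image(path):
    """Return the file's bytes as a base64 string (ASCII text)."""
    return base64.b64encode(Path(path).read_bytes()).decode("ascii")


def build_payload(image_b64, model="llava", prompt=PROMPT):
    """Build the JSON body for a non-streaming generate request."""
    return {
        "model": model,
        "prompt": prompt,
        "images": [image_b64],
        "stream": False,
    }


def describe_image(path, url=OLLAMA_URL):
    """Send one image to the local Ollama server and return its description."""
    body = json.dumps(build_payload(encode_image(path))).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"].strip()
```

With the Ollama server running and the llava model pulled, `describe_image("images/example.jpg")` would return the description text for that image.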

Installation

Prerequisites

Python 3.9+
Pip (Python package installer)
Ollama API running locally (default at http://localhost:11434)
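
To confirm these prerequisites before running the script, a small check along the following lines may help. It is only a sketch: it assumes the server exposes Ollama's `/api/tags` endpoint for listing installed models, and the helper names (`python_is_supported`, `has_model`, `fetch_tags`) are hypothetical and not part of this project.

```python
import json
import sys
import urllib.request

BASE_URL = "http://localhost:11434"  # default address listed in the prerequisites


def python_is_supported(version_info=sys.version_info):
    """True when the interpreter is Python 3.9 or newer."""
    return tuple(version_info[:2]) >= (3, 9)


def has_model(tags, name="llava"):
    """True if a model called `name` (any tag, e.g. llava:latest) is listed."""
    return any(
        m.get("name", "").split(":")[0] == name for m in tags.get("models", [])
    )


def fetch_tags(base_url=BASE_URL, timeout=5):
    """Ask the server which models are installed; fails if it is unreachable."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as response:
        return json.loads(response.read())
```

Calling `has_model(fetch_tags())` returns `True` when a llava model is installed. If the server is not running, `fetch_tags()` raises an error instead of returning.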

Setup

Clone the repository:

```
git clone https://github.com/agarzon/ollama-image-caption.git
cd ollama-image-caption
```

Create and activate a virtual environment:

On Windows:

```
python -m venv venv
venv\Scripts\activate
```

On macOS/Linux:

```
python3 -m venv venv
source venv/bin/activate
```

Install the required packages:
```
pip install -r requirements.txt
```

Usage

Put all the images you want to process in the images folder.
Activate the virtual environment created during setup (on Windows, run venv\Scripts\activate instead):
```
source venv/bin/activate
```
Run the script:
```
python script_name.py
```
Replace script_name.py with the name of the script; in this repository it is datasetmaker.py.
Output:
- The script processes each image in the specified folder and generates a .txt file with the description.

Example

If you have an image named example.jpg, the script will generate a description and save it in example.txt in the same folder.
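
The file handling behind this example can be sketched as below. This is an illustration under assumptions, not the project's implementation: the extension list comes from the Features section, while the case-insensitive matching, the trailing newline, and the helper names (`find_images`, `caption_path`, `save_caption`) are hypothetical.

```python
from pathlib import Path

# Extensions listed under Features; matching ignores case here (an assumption).
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg"}


def find_images(folder):
    """Return image files directly inside `folder`, sorted by name."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )


def caption_path(image_path):
    """Map example.jpg to example.txt in the same folder."""
    return Path(image_path).with_suffix(".txt")


def save_caption(image_path, description):
    """Write the description next to the image and return the .txt path."""
    target = caption_path(image_path)
    target.write_text(description + "\n", encoding="utf-8")
    return target
```

A batch run would loop over `find_images("images")`, obtain a description for each file from the model, and pass it to `save_caption`.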

License

This project is licensed under the MIT License. See the LICENSE file for details.

Disclaimer: This project is for educational and research purposes. Make sure to comply with the terms and conditions of the Ollama API and any other third-party services used.
