This project provides a Python script that reads images from a specified folder, uses the llava
model through the Ollama API to generate a description for each image, and saves each description in a corresponding .txt
file. The script is aimed at developers and researchers who work with image datasets and need textual descriptions generated automatically.
- Automatic Image Description: The script uses the llava model to describe images.
- Batch Processing: Processes all images ('.png', '.jpg', '.jpeg') in the folder.
- Output to Text Files: Saves descriptions in .txt files with the same names as the corresponding images.
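The batch-processing feature comes down to filtering a folder by file extension. The sketch below is one way to do that, not necessarily how the script does it: the helper name find_images is hypothetical, and the case-insensitive match (so that photo.JPG is also picked up) is an assumption.

```python
from pathlib import Path

# Extensions named in the feature list above.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg"}


def find_images(folder):
    """Return the image files directly inside `folder`, sorted by name.

    Hypothetical helper. Matching is case-insensitive, which is an assumption;
    subfolders are not searched.
    """
    return sorted(
        path
        for path in Path(folder).iterdir()
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS
    )
```

Calling find_images("images") would return a list of Path objects for the files in the images folder that carry one of the three extensions.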
- The script converts images to base64 encoding.
- It sends the base64 image data to the Ollama API, specifying the llava model.
- The API returns a description of the image, which the script saves in a .txt file.
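The steps above can be sketched with the standard library alone. This is an illustration under assumptions, not the script's actual code: the function names and the prompt text are hypothetical, and the script may use a different HTTP client. What the sketch relies on is that Ollama's /api/generate endpoint accepts base64-encoded images in an images list and, with streaming disabled, returns the text in a response field.

```python
import base64
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # default local address


def encode_image(path):
    """Read an image file and return its contents as a base64 string."""
    with open(path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("ascii")


def build_payload(image_b64, prompt="Describe this image.", model="llava"):
    """Build the JSON body for a request to /api/generate.

    The prompt text is an assumption; the script may use a different one.
    """
    return {
        "model": model,
        "prompt": prompt,
        "images": [image_b64],
        "stream": False,  # ask for one JSON object instead of a stream
    }


def describe_image(path, base_url=OLLAMA_URL):
    """Send one image to a running Ollama server and return the description."""
    body = json.dumps(build_payload(encode_image(path))).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"].strip()
```

With a server running and the llava model available, describe_image("images/example.jpg") would return the model's description as a string.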
- Python 3.9+
- Pip (Python package installer)
- Ollama API running locally (default at http://localhost:11434)
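Before running a batch, it can help to confirm that the Ollama server is actually reachable. The helper below is a hypothetical sketch, not part of the project: it requests /api/tags (the Ollama endpoint that lists locally available models) and treats any connection or parsing error as "not running". The 3-second timeout is an arbitrary choice.

```python
import json
import urllib.error
import urllib.request


def ollama_is_reachable(base_url="http://localhost:11434", timeout=3):
    """Return True if a server at `base_url` answers /api/tags with JSON.

    Hypothetical helper; any connection error or non-JSON reply counts as
    "not reachable".
    """
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as response:
            json.loads(response.read())
            return True
    except (urllib.error.URLError, OSError, ValueError):
        return False
```

A caller might print a short hint and exit early when ollama_is_reachable() returns False, rather than failing on the first image.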
- Clone the repository:
    git clone https://github.com/agarzon/ollama-image-caption.git
    cd ollama-image-caption
- Create and activate a virtual environment:
  - On Windows:
      python -m venv venv
      venv\Scripts\activate
  - On macOS/Linux:
      python3 -m venv venv
      source venv/bin/activate
- Install the required packages:
    pip install -r requirements.txt
- Put all the images you want to process in the images folder.
- Activate the virtual environment created during installation (on Windows, use venv\Scripts\activate instead):
    source venv/bin/activate
- Run the script:
    python script_name.py
  Replace script_name.py with the name of your script.
- Output:
  - The script processes each image in the specified folder and generates a .txt file with the description.
If you have an image named example.jpg, the script will generate a description and save it in example.txt in the same folder.
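The naming rule in this example maps directly onto pathlib. The sketch below uses two hypothetical helpers, caption_path and save_description, to show the mapping; the UTF-8 encoding is an assumption, and the script may write its files differently.

```python
from pathlib import Path


def caption_path(image_path):
    """Return the .txt path next to the image, e.g. example.jpg -> example.txt."""
    return Path(image_path).with_suffix(".txt")


def save_description(image_path, description):
    """Write the description next to its image and return the path written.

    Hypothetical helper; UTF-8 output is an assumption.
    """
    target = caption_path(image_path)
    target.write_text(description, encoding="utf-8")
    return target
```

One consequence of this rule: two images that differ only in extension, such as example.jpg and example.png, map to the same example.txt, so the later description would overwrite the earlier one.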
This project is licensed under the MIT License. See the LICENSE file for details.
Disclaimer: This project is for educational and research purposes. Make sure to comply with the terms and conditions of the Ollama API and any other third-party services used.