"iu" is an experiment that started with this tweet.
The goal is to do research around a tool to index and search your image collection.
"iu" is not intended for productive use, and perhaps will never be.
The name comes from "mu", a mail indexer that inspired this project. "mu" stands for "maildir utils", so I guess "iu" stands for "image utils".
- Just a command line tool
- Targeted at the average person collecting lots of photos over the years
- Basic integration with other tools, e.g. opening query results in some album viewer
- Reasonably fast indexing when re-indexing from scratch
- Very fast indexing when a couple of new photos are added to the collection
- Some basic features when indexing:
  - Camera model
  - Date
  - Album (?)
- Some fancy indexing features I expect to add at some point:
  - Offline reverse geocoding: turn GPS data into place names
  - Offline automatic tagging: recognize basic entities (food, guitars, animals, bikes, cars, colors) and index on the object word
  - Search for similar images, to detect duplicates while sorting my collection
  - Find low-quality images, to be used when curating my camera inbox
  - OCR: index on the words appearing in the image
  - Recognize people and index them
- Ultra-fast searching
For now the only method is building from source.

You need:

- CMake and a C++ compiler
- Xapian
- libexif
- OpenCV (including the contrib modules)

Once you satisfy those requirements, configure and build with CMake:
```
$ cmake -S . -B build
$ cmake --build build
```

or:

```
$ mkdir -p build
$ cd build
$ cmake ..
$ make
```
To download the data files needed by some of the indexing features (models, geolocation data):

```
$ cmake --build build --target data
```

or:

```
$ cd build
$ make data
```
Now you can index your photo collection:

```
$ cd build
$ src/iu index --root ~/Pictures
...
indexed: 15465 files
```
You can use the `--ai` and other flags to extract and index additional information from the images.
To search the index:

```
$ cd build
$ src/iu find "camera:powershot"
8725 result found
0: docid /home/foo/1.jpg
...
real 0m0.013s
user 0m0.008s
sys 0m0.005s
```
Every option accepted by a command can be configured in `~/.config/iu.conf`. For example, for the AI options:

```
[index]
ai-base-url = http://localhost:11434/v1/
ai-model = "minicpm-v"
ai-api_key = XXXXXXXXXXXXXX
```
- Without many optimizations, I can index 15k files (50G) in 2.7s on an old X230 laptop with an SSD (libexif backend).
- Adding offline geolocation over 121k places brings that up to 16s.
- Indexing is built on top of Xapian, a free and open-source probabilistic information retrieval library. The idea of using SQLite was considered too.
- Metadata from photos is retrieved using libexif. exiv2 was tested, and while its API and format coverage are wider, it was much slower. (See the sketch of reading metadata with libexif after this list.)
- Examination of images is done with the help of the Open Source Computer Vision Library (OpenCV).
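As an illustration of the metadata extraction mentioned above, here is a minimal sketch of reading the camera model with libexif. This is not iu's actual code; tag coverage and error handling are simplified:

```cpp
// Minimal sketch: read the camera model EXIF tag with libexif.
#include <libexif/exif-data.h>
#include <string>

std::string camera_model(const char* path) {
    ExifData* ed = exif_data_new_from_file(path);
    if (!ed) return "";  // file has no EXIF data
    std::string model;
    ExifEntry* entry = exif_data_get_entry(ed, EXIF_TAG_MODEL);
    if (entry) {
        char buf[64];
        exif_entry_get_value(entry, buf, sizeof(buf));
        model = buf;  // e.g. "Canon PowerShot S95"
    }
    exif_data_unref(ed);
    return model;
}
```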
Caching is not implemented yet. Ideally, we should cache data that is expensive to compute. This could be done by implementing a cache keyed on the file checksum.
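A minimal sketch of what such a cache lookup could be based on, assuming an on-disk layout under `~/.cache/iu` (the hash function and the layout are illustrative, not a design decision):

```cpp
// Illustrative sketch: derive a cache path from a checksum of the file
// contents, so moving or renaming a photo does not invalidate its entry.
#include <cstdint>
#include <cstdlib>
#include <filesystem>
#include <fstream>
#include <string>

// FNV-1a over the file bytes; a real implementation would likely use a
// stronger digest such as SHA-256.
std::string file_checksum(const std::filesystem::path& p) {
    std::ifstream in(p, std::ios::binary);
    uint64_t h = 1469598103934665603ULL;  // FNV offset basis
    char c;
    while (in.get(c)) {
        h ^= static_cast<unsigned char>(c);
        h *= 1099511628211ULL;  // FNV prime
    }
    return std::to_string(h);
}

// Expensive results (AI keywords, quality scores, ...) would be stored here.
std::filesystem::path cache_path(const std::filesystem::path& photo) {
    const char* home = std::getenv("HOME");
    return std::filesystem::path(home ? home : ".") / ".cache" / "iu" /
           file_checksum(photo);
}
```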
Writing extracted data back into the files themselves is not implemented yet either. This would allow e.g. preserving AI keywords when uploading the files to the cloud (e.g. Nextcloud). However, the standards (EXIF, IPTC, XMP) and the different terminologies (Tag, Label, Subject, Keyword, Category) do not make it straightforward (see #891).
Offline reverse geocoding uses data from reverse_geocode, which, in turn, comes from geonames.org (CC-BY license). It is a dumb search by distance and is not optimized yet. Right now the technique is to convert the photo location into a label (the place name) and add that name to the index as a term, so the place is matched as part of the query.
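A sketch of what this brute-force distance search amounts to (the `Place` type and the loading of the places list are illustrative, not iu's actual types):

```cpp
// Illustrative sketch: linear scan over all known places, returning the
// name of the one closest to the photo's GPS coordinates.
#include <cmath>
#include <string>
#include <vector>

struct Place { std::string name; double lat, lon; };

// Great-circle (haversine) distance in kilometres.
static double haversine_km(double lat1, double lon1, double lat2, double lon2) {
    constexpr double R = 6371.0, DEG = M_PI / 180.0;
    double dlat = (lat2 - lat1) * DEG, dlon = (lon2 - lon1) * DEG;
    double a = std::sin(dlat / 2) * std::sin(dlat / 2) +
               std::cos(lat1 * DEG) * std::cos(lat2 * DEG) *
               std::sin(dlon / 2) * std::sin(dlon / 2);
    return 2.0 * R * std::asin(std::sqrt(a));
}

// O(n) over ~121k places; fast enough for now, but a spatial index
// (e.g. a k-d tree) would speed this up considerably.
std::string nearest_place(const std::vector<Place>& places,
                          double lat, double lon) {
    const Place* best = nullptr;
    double best_d = 1e18;
    for (const auto& p : places) {
        double d = haversine_km(lat, lon, p.lat, p.lon);
        if (d < best_d) { best_d = d; best = &p; }
    }
    return best ? best->name : "";
}
```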
An alternative approach I am exploring is to allow passing the place on the command line, separate from the query, and to use Xapian's geospatial support (i.e. `LatLongDistancePostingSource`), adding this posting source to the query object. I will start this exploration by adding the location as a value to the document.
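A sketch of how that could look with Xapian's geospatial API (the value slot number is an arbitrary choice for illustration):

```cpp
// Illustrative sketch: store coordinates in a document value at index
// time, then search by distance with a LatLongDistancePostingSource.
#include <xapian.h>

constexpr Xapian::valueno LOCATION_SLOT = 0;  // arbitrary slot for this sketch

void add_location(Xapian::Document& doc, double lat, double lon) {
    Xapian::LatLongCoords coords;
    coords.append(Xapian::LatLongCoord(lat, lon));
    doc.add_value(LOCATION_SLOT, coords.serialise());
}

Xapian::MSet find_near(Xapian::Database& db, double lat, double lon) {
    Xapian::LatLongCoords centre;
    centre.append(Xapian::LatLongCoord(lat, lon));
    Xapian::GreatCircleMetric metric;
    // Only match documents within 10 km (max_range is in metres).
    Xapian::LatLongDistancePostingSource src(LOCATION_SLOT, centre,
                                             metric, 10000.0);
    Xapian::Enquire enquire(db);
    enquire.set_query(Xapian::Query(&src));
    return enquire.get_mset(0, 10);
}
```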
Offline automatic tagging uses the Berkeley Vision and Learning Center (BVLC) Caffe GoogleNet model and the word list from ImageNet. I would still like to allow dropping models and label lists in a directory and have the indexer pick them up automatically.
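For reference, classifying an image with this model boils down to a few calls to OpenCV's `dnn` module; a minimal sketch, assuming the standard BVLC file names:

```cpp
// Illustrative sketch: run GoogleNet over a photo and return the index of
// the strongest class, which maps into the ImageNet word list.
#include <opencv2/core.hpp>
#include <opencv2/dnn.hpp>
#include <opencv2/imgcodecs.hpp>
#include <string>

int top_class(const std::string& image_path) {
    cv::dnn::Net net = cv::dnn::readNetFromCaffe("bvlc_googlenet.prototxt",
                                                 "bvlc_googlenet.caffemodel");
    cv::Mat img = cv::imread(image_path);
    // GoogleNet expects 224x224 BGR input with the ImageNet mean subtracted.
    cv::Mat blob = cv::dnn::blobFromImage(img, 1.0, cv::Size(224, 224),
                                          cv::Scalar(104, 117, 123));
    net.setInput(blob);
    cv::Mat prob = net.forward();
    cv::Point class_id;
    cv::minMaxLoc(prob.reshape(1, 1), nullptr, nullptr, nullptr, &class_id);
    return class_id.x;  // look this index up in the word list
}
```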
I called this feature "AI" since that is what everyone calls it these days, but perhaps there is a better name.
Vision-capable LLMs are perfect for improving the indexing capabilities, because we can ask them to describe photos at a much higher level of detail, and models keep getting better at it. The AI feature extracts image keywords for indexing using an OpenAI-compatible endpoint, and it is meant to replace the GoogleNet model.
Having `iu` support the OpenAI API means that we can support local models via ollama, avoiding having to implement model and GPU management and execution inside `iu`. The user will be able to download, update and run models via `ollama`. At the same time, `iu` can be configured to use an external service like OpenAI or Mistral's Pixtral, which provide better models at much higher speeds.
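For illustration, this is roughly the kind of request such an endpoint accepts, sketched with libcurl against the ollama URL from the config below; the prompt and payload details are assumptions, and a real implementation would also parse the JSON response:

```cpp
// Illustrative sketch: ask an OpenAI-compatible chat endpoint to describe
// a photo, passing the image inline as a base64 data URL.
#include <curl/curl.h>
#include <string>

void describe_image(const std::string& base64_jpeg) {
    CURL* curl = curl_easy_init();
    if (!curl) return;
    std::string payload =
        "{\"model\": \"minicpm-v\","
        " \"messages\": [{\"role\": \"user\", \"content\": ["
        "   {\"type\": \"text\", \"text\": \"List keywords describing this photo.\"},"
        "   {\"type\": \"image_url\", \"image_url\": {\"url\":"
        "     \"data:image/jpeg;base64," + base64_jpeg + "\"}}]}]}";
    struct curl_slist* headers = nullptr;
    headers = curl_slist_append(headers, "Content-Type: application/json");
    curl_easy_setopt(curl, CURLOPT_URL,
                     "http://localhost:11434/v1/chat/completions");
    curl_easy_setopt(curl, CURLOPT_HTTPHEADER, headers);
    curl_easy_setopt(curl, CURLOPT_POSTFIELDS, payload.c_str());
    curl_easy_perform(curl);  // the JSON response contains the keywords
    curl_slist_free_all(headers);
    curl_easy_cleanup(curl);
}
```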
AI can be configured on the command line, or in `$HOME/.config/iu.conf`:

```
[index]
ai-base-url = http://localhost:11434/v1/
ai-model = "minicpm-v"
ai-api_key = XXXXXXXXXXXXXX
```
Low-quality image detection uses BRISQUE (Blind/Referenceless Image Spatial Quality Evaluator), a No-Reference Image Quality Assessment (NR-IQA) algorithm, as implemented in OpenCV contrib. We use the trained model provided in the /samples/ directory, trained on the LIVE-R2 database as in the original implementation. Right now we don't do anything with this except adding the word "blurry" to the index. In theory I should add the score as a value.
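Computing the score itself is essentially one call into the OpenCV contrib quality module; a minimal sketch, using the model/range file names from the upstream samples:

```cpp
// Illustrative sketch: compute the BRISQUE quality score for one image.
#include <opencv2/imgcodecs.hpp>
#include <opencv2/quality.hpp>
#include <string>

double brisque_score(const std::string& image_path) {
    cv::Mat img = cv::imread(image_path);
    // Lower scores mean better perceptual quality (roughly a 0-100 scale).
    cv::Scalar score = cv::quality::QualityBRISQUE::compute(
        img, "brisque_model_live.yml", "brisque_range_live.yml");
    return score[0];
}
```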
Right now if you add `-b` (browse) to a search, it will pass the list of files in the result to `eog`. This does not work well: there is a limit on the number of files, and if there are no results, `eog` will still show other files. I am looking for a good replacement; hopefully I don't need to write my own.
- (C) 2020 Duncan Mac-Vicar P.
- "iu" is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
- "iu" is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.