
title: Img2Art Search
emoji: 🐳
colorFrom: purple
colorTo: gray
sdk: docker
app_port: 7860

Image-to-Art Search 🔍

Demo Link hosted on Hugging Face

"Find real artwork that looks like your images"

This project fine-tunes a Vision Transformer (ViT) model, initialized from the "google/vit-base-patch32-224-in21k" pre-trained weights, on an image-to-artwork dataset to perform image-to-art search across 81k artworks made available by WikiArt.



Overview

This project leverages the Vision Transformer (ViT) architecture for the task of image-to-art search. By fine-tuning the pre-trained ViT model on a custom image-to-artwork dataset, the goal is a model that maps photos (and other kinds of images) to visually similar artworks, making any piece on WikiArt searchable by image.
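At inference time the idea reduces to nearest-neighbor search over embeddings. Here is a minimal sketch, assuming the fine-tuned model behaves like a standard Hugging Face ViT encoder (the repository's own wrapper classes may differ):

import torch
from PIL import Image
from transformers import ViTImageProcessor, ViTModel

# Load the processor and backbone named in this README; in practice the
# fine-tuned weights would be loaded instead of the raw pre-trained ones.
processor = ViTImageProcessor.from_pretrained("google/vit-base-patch32-224-in21k")
model = ViTModel.from_pretrained("google/vit-base-patch32-224-in21k")
model.eval()

def embed(image: Image.Image) -> torch.Tensor:
    # One embedding vector per image, taken from the [CLS] token.
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        return model(**inputs).last_hidden_state[:, 0]

# Rank an artwork against a query photo by cosine similarity.
query = embed(Image.open("query.jpg"))
artwork = embed(Image.open("artwork.jpg"))
score = torch.nn.functional.cosine_similarity(query, artwork)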

Installation

  1. Clone the repository:
git clone https://github.com/brunorosilva/img2art-search.git
cd img2art-search
  2. Install Poetry:
pip install poetry
  3. Install the dependencies with Poetry:
poetry install

How it works

Dataset Preparation

  1. Create a dataset matching images to artworks.
  2. Organize the images into training and validation directories (see the layout sketch after this list).
  3. Fine-tune the model.
  4. Create the gallery using WikiArt.
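A hypothetical directory layout (folder names here are illustrative, not requirements enforced by the repository):

data/
  train/
    images/     # query photos
    artworks/   # matching artwork for each photo
  val/
    images/
    artworks/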

Training

Fine-tune the ViT model:

make train

I'll eventually publish the model weights.

Inference via Gradio

Perform image-to-art search using the fine-tuned model:

make viz
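
Under the hood, make viz serves a Gradio app. A minimal sketch of such an interface, where search_gallery is a hypothetical stand-in for the project's actual retrieval code:

import gradio as gr

def search_gallery(image):
    # Hypothetical retrieval step: embed the query image and fetch the
    # nearest artworks from the gallery index (the project queries Pinecone).
    return []  # gr.Gallery accepts a list of images or (image, caption) pairs

demo = gr.Interface(
    fn=search_gallery,
    inputs=gr.Image(type="pil"),
    outputs=gr.Gallery(label="Closest artworks"),
)
demo.launch(server_port=7860)  # matches the app_port in the Space metadata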

Recreate the WikiArt gallery

make wikiart

Create a new gallery

If you want to index new images to search, use:

poetry run python main.py gallery --gallery_path <your_path>
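
For example, to index a local folder of photos (the path is illustrative):

poetry run python main.py gallery --gallery_path ./my_images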

Dataset

The fine-tuning dataset consists of 1k image-to-artwork example pairs, split into training, validation, and test sets.

WikiArt is indexed with the same pipeline, except that there is no expected match: each artwork is mapped to itself, the model is used as a feature extractor, and the resulting gallery embeddings are saved to Pinecone.
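
A hedged sketch of that indexing step, reusing the embed helper from the overview; the index name and the gallery mapping are assumptions, not the repository's actual configuration:

from pinecone import Pinecone
from PIL import Image

pc = Pinecone(api_key="YOUR_API_KEY")
index = pc.Index("img2art-gallery")  # hypothetical index name

# gallery maps an artwork id to its image file; both values are illustrative.
gallery = {"wikiart-0001": "artworks/starry_night.jpg"}

for artwork_id, path in gallery.items():
    vector = embed(Image.open(path))[0].tolist()
    index.upsert(vectors=[(artwork_id, vector)])

# Query time: embed the photo and fetch the nearest artworks.
matches = index.query(vector=embed(Image.open("query.jpg"))[0].tolist(), top_k=5)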

Training

The training script fine-tunes the ViT model on the prepared dataset. Key steps include:

  1. Loading the pre-trained "google/vit-base-patch32-224-in21k" weights.
  2. Preparing the dataset and data loaders.
  3. Fine-tuning the model using a custom training loop (sketched below).
  4. Saving the fine-tuned model to the models folder.
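
As a rough illustration of those steps, here is a minimal sketch assuming a DataLoader of preprocessed (photo, artwork) tensor pairs and a cosine-embedding objective; the repository's actual loss, schedule, and data classes may differ:

import torch
from transformers import ViTModel

model = ViTModel.from_pretrained("google/vit-base-patch32-224-in21k")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

def embed_batch(pixel_values):
    # [CLS]-token embeddings for a batch of preprocessed images.
    return model(pixel_values=pixel_values).last_hidden_state[:, 0]

for epoch in range(3):
    for photos, artworks in train_loader:  # assumed DataLoader of tensor pairs
        photo_emb = embed_batch(photos)
        art_emb = embed_batch(artworks)
        # Pull each photo embedding toward its matching artwork embedding.
        target = torch.ones(photo_emb.size(0))
        loss = torch.nn.functional.cosine_embedding_loss(photo_emb, art_emb, target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

torch.save(model.state_dict(), "models/vit-img2art.pt")  # hypothetical filename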

Interface

The recommended way to try the model is the demo hosted on Hugging Face, or you can self-host the Gradio interface by running make viz.

Examples

Search for contextual similarity (example image: field)

Search for shapes similarity (example image: basket)

Search for expression similarity (yep, that's me) (example image: serious face)

Search for pose similarity (example image: lawyer)

Search for an object (example image: horse)

Contributing

There are some topics I'd appreciate help with:

  1. Growing the gallery by embedding new painting datasets. The current gallery holds 81k artworks because I started from a ready-to-go dataset, but the complete WikiArt catalog alone has 250k+ artworks, so I want to push this number to at least 300k in the near future;
  2. Adding optional text search terms using CLIP;
  3. Video2art search?;
  4. Improving search performance;
  5. Opening issues on how this could be improved; new ideas will be considered.

License

The source code for this project is licensed under the MIT license, which you can find in the MIT-LICENSE.txt file.

All graphical assets are licensed under the Creative Commons Attribution 3.0 Unported License.
