Skip to content

Latest commit

 

History

History
41 lines (27 loc) · 2.37 KB

File metadata and controls

41 lines (27 loc) · 2.37 KB

Image Captioning with Autoencoders

In this project we aim to caption an image using a combination of autoencoders and SVM. For this purpose we use a subset of the captioned MS-COCO dataset for training. This is achieved in three board stages:

A. Feature Extraction with Autoencoder

In this stage we use a Convolutional Autoencoder to compress the images into a smaller feature space. The autoencoder minimizes the original image (200px x 200px RGB) into a smaller feature space. It also minimizes the loss by reconstucting an image from the smaller feature space and applying gradient descent to readjust weights.

B. Label Prediction with SVM trained on Autoencoder Features

Once the autoencoder has been trained, the new features are extracted for the training and test set. An SVM model is then trained with the training data features and label predicions are made to determine the category of the test images.

C. Caption Prediction using the Nearest Neighbors Algorithm

Lastly, the nearest neighbors algorithms is used to find the K=5 nearest images within the predicted class. The most appropriate caption is determined from the captions of the 5 nearest neighbors identified within the class. Its done using semantic similarity analysis among the captions.

Preview Notebook

  • AutoCaption.ipynb can we previewed but it will not contain all the visualizations (specifically the altair plots).
  • Use AutoCaption_Rendered.html to preview all the visualizations.

Environment Setup:

  1. Download the codebase.

  2. Download the Fast Text 300D embeddings, unzip it and place the wiki-news-300d-1M.vec file in the 'models' directory. Link: https://dl.fbaipublicfiles.com/fasttext/vectors-english/wiki-news-300d-1M.vec.zip

  3. Install all the python dependencies requirements using the requirements.txt

  4. Open the AutoCaption.ipynb notebook using Jupyter.

  5. Make sure that the 'download_coco' flag is set to True in the notebook and then execute it.

Example Results

1. Caption Recommendation

alt text

2. Autoencode Feature Reconstruction

alt text