Image Captioning with Autoencoders

In this project we aim to caption an image using a combination of autoencoders and an SVM. We train on a subset of the captioned MS-COCO dataset. The pipeline has three broad stages:

A. Feature Extraction with Autoencoder

In this stage we use a convolutional autoencoder to compress each original image (200px x 200px RGB) into a smaller feature space. The autoencoder is trained by reconstructing the image from that smaller feature space and applying gradient descent to adjust the weights so that the reconstruction loss is minimized.
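The training loop described above (compress, reconstruct, measure the loss, adjust the weights) can be sketched in a few lines. This is only an illustration under stated assumptions: it swaps the convolutional autoencoder for a plain linear one written in NumPy, uses tiny random 8x8 RGB arrays instead of the 200x200 MS-COCO images, and every name in it (`train_linear_autoencoder`, `code_dim`, `codes`) is hypothetical rather than taken from the notebook.

```python
import numpy as np

def train_linear_autoencoder(X, code_dim, lr=0.05, epochs=300, seed=0):
    """Fit encoder/decoder matrices by gradient descent on reconstruction error."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W_enc = rng.normal(scale=0.01, size=(d, code_dim))  # image -> compressed code
    W_dec = rng.normal(scale=0.01, size=(code_dim, d))  # compressed code -> image
    losses = []
    for _ in range(epochs):
        Z = X @ W_enc                 # compressed features
        X_hat = Z @ W_dec             # reconstruction from the features
        err = X_hat - X
        # Loss: squared reconstruction error, averaged over images.
        losses.append(float(np.sum(err ** 2) / n))
        # Gradients of that loss with respect to both weight matrices.
        grad_out = 2.0 * err / n
        grad_dec = Z.T @ grad_out
        grad_enc = X.T @ (grad_out @ W_dec.T)
        W_dec -= lr * grad_dec
        W_enc -= lr * grad_enc
    return W_enc, W_dec, losses

# Toy stand-in for the dataset (assumption): 32 random 8x8 RGB "images".
rng = np.random.default_rng(0)
images = rng.random((32, 8, 8, 3))
X = images.reshape(len(images), -1)   # flatten each image to a vector
X = X - X.mean(axis=0)                # centre each pixel across the dataset

W_enc, W_dec, losses = train_linear_autoencoder(X, code_dim=16)
codes = X @ W_enc                     # one 16-dimensional feature vector per image
```

The `codes` array plays the role of the extracted features that the next stage consumes; a convolutional model would replace the two matrix products with convolution and upsampling layers but keep the same loss-and-update structure.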

B. Label Prediction with SVM trained on Autoencoder Features

Once the autoencoder has been trained, the new features are extracted for the training and test sets. An SVM model is then trained on the training-set features, and label predictions are made to determine the category of each test image.
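A minimal sketch of this stage, assuming scikit-learn's `SVC` is the SVM implementation (the README does not name one). The feature arrays here are synthetic stand-ins for the autoencoder output, and the labels "cat" and "dog" as well as the kernel settings are illustrative assumptions, not the notebook's values.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-ins for autoencoder features: two well-separated clusters.
train_features = np.vstack([
    rng.normal(0.0, 0.5, size=(20, 8)),
    rng.normal(3.0, 0.5, size=(20, 8)),
])
train_labels = np.array(["cat"] * 20 + ["dog"] * 20)
test_features = np.vstack([
    rng.normal(0.0, 0.5, size=(5, 8)),
    rng.normal(3.0, 0.5, size=(5, 8)),
])

# Train on the training-set features, then predict a category per test image.
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(train_features, train_labels)
predicted_labels = clf.predict(test_features)
```

The predicted label for each test image is what restricts the neighbor search in the final stage to a single class.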

C. Caption Prediction using the Nearest Neighbors Algorithm

Lastly, the nearest neighbors algorithm is used to find the K=5 nearest images within the predicted class. The most appropriate caption is then chosen from the captions of these 5 nearest neighbors using semantic similarity analysis among the captions.
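One way to read this stage, sketched in NumPy. The README does not spell out the distance metric or the similarity measure, so this sketch assumes Euclidean distance between feature vectors, represents each caption as the mean of its word embeddings, and picks the caption with the highest average cosine similarity to the other candidates. All function names are hypothetical.

```python
import numpy as np

def nearest_in_class(query, features, labels, target_label, k=5):
    """Indices of the k training images closest to `query` within one class."""
    idx = np.flatnonzero(np.asarray(labels) == target_label)
    dists = np.linalg.norm(np.asarray(features)[idx] - query, axis=1)
    return idx[np.argsort(dists)[:k]]

def caption_vector(caption, embeddings, dim):
    """Mean word embedding of a caption; words without a vector are skipped."""
    vecs = [embeddings[w] for w in caption.lower().split() if w in embeddings]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

def most_central_caption(captions, embeddings, dim):
    """Caption whose vector is, on average, most similar to the others."""
    V = np.array([caption_vector(c, embeddings, dim) for c in captions])
    norms = np.linalg.norm(V, axis=1, keepdims=True)
    V = V / np.where(norms == 0, 1.0, norms)   # unit vectors, so V @ V.T is cosine
    sim = V @ V.T
    np.fill_diagonal(sim, 0.0)                 # ignore self-similarity
    scores = sim.sum(axis=1) / max(len(captions) - 1, 1)
    return captions[int(np.argmax(scores))]
```

In use, `nearest_in_class` would be called with the test image's features and its predicted label, and the captions attached to the returned indices would be passed to `most_central_caption` together with the fastText vectors.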

Preview Notebook

AutoCaption.ipynb can be previewed, but the preview will not contain all the visualizations (specifically the Altair plots).
Use AutoCaption_Rendered.html to preview all the visualizations.

Environment Setup:

Download the codebase.
Download the fastText 300D embeddings, unzip the archive, and place the wiki-news-300d-1M.vec file in the 'models' directory. Link: https://dl.fbaipublicfiles.com/fasttext/vectors-english/wiki-news-300d-1M.vec.zip
Install the Python dependencies listed in requirements.txt.
Open the AutoCaption.ipynb notebook using Jupyter.
Make sure that the 'download_coco' flag is set to True in the notebook and then execute it.
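To check that the embeddings file from the steps above is in place and readable, a small loader like the following can help. It is a sketch, not the notebook's own loading code: it relies only on the fastText `.vec` text layout (a header line with the word count and dimension, then one word and its values per line), and the `load_vec` name and `limit` parameter are hypothetical.

```python
from pathlib import Path

# Location given in the setup steps above.
VEC_PATH = Path("models") / "wiki-news-300d-1M.vec"

def load_vec(path, limit=None):
    """Read a fastText .vec file into a dict of word -> list of floats."""
    vectors = {}
    with open(path, encoding="utf-8", newline="\n", errors="ignore") as fh:
        # Header line: "<number of words> <vector dimension>"
        _count, dim = map(int, fh.readline().split())
        for i, line in enumerate(fh):
            if limit is not None and i >= limit:
                break
            parts = line.rstrip().split(" ")
            vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors, dim
```

Calling `load_vec(VEC_PATH, limit=50000)` reads only the first 50,000 entries, which keeps memory use modest compared with loading the whole file as Python lists.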

Example Results

1. Caption Recommendation

2. Autoencoder Feature Reconstruction
