
CSCE-633 Machine Learning Project

This project implements a deep learning model that automatically generates natural-language captions for images, following the Show and Tell encoder-decoder approach [1].
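A minimal sketch of such a merge-style CNN+LSTM captioner in Keras is shown below. The layer sizes, vocabulary size, and maximum caption length are illustrative assumptions, not the exact configuration in CNN+LSTM/caption.py; the 2048-d image features correspond to the InceptionV3 encodings shipped with the repo (encoded_images_inceptionV3.p).

    # Illustrative merge-style CNN+LSTM captioner; sizes are assumptions.
    from keras.layers import Input, Dense, Dropout, Embedding, LSTM, add
    from keras.models import Model

    VOCAB_SIZE = 8000   # assumption: vocabulary size after preprocessing
    MAX_LEN = 40        # assumption: longest tokenized caption

    # Image branch: one 2048-d InceptionV3 feature vector per image.
    img_in = Input(shape=(2048,))
    img_feat = Dense(256, activation='relu')(Dropout(0.5)(img_in))

    # Text branch: the partial caption generated so far.
    txt_in = Input(shape=(MAX_LEN,))
    txt_emb = Embedding(VOCAB_SIZE, 256, mask_zero=True)(txt_in)
    txt_feat = LSTM(256)(Dropout(0.5)(txt_emb))

    # Merge both branches and predict the next word of the caption.
    merged = Dense(256, activation='relu')(add([img_feat, txt_feat]))
    next_word = Dense(VOCAB_SIZE, activation='softmax')(merged)

    model = Model(inputs=[img_in, txt_in], outputs=next_word)
    model.compile(loss='categorical_crossentropy', optimizer='adam')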

Environment Dependencies for Training and Testing:

  • Python 3.6
  • Keras-gpu
  • Tensorflow-gpu

Library Dependencies for Training and Testing:

  • nltk
  • matplotlib
  • numpy
  • pickle
  • sys
  • tqdm
  • pandas
  • glob
  • pillow
  • h5py

Library and Environment Dependencies for Calculating the Scores:

  • java 1.8.0
  • python 2.7
  • click 6.3
  • nltk 3.1
  • numpy 1.11.0
  • scikit-learn 0.17
  • gensim 0.12.4
  • Theano 0.8.1
  • scipy 0.17.0
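For a quick sanity check outside that Python 2.7 toolkit, a metric such as BLEU can also be computed with nltk (already a dependency above); a minimal, illustrative example:

    # Illustrative BLEU check with nltk; the repo's own scoring pipeline
    # relies on the Python 2.7 stack listed above.
    from nltk.translate.bleu_score import corpus_bleu

    # One hypothesis caption scored against two reference captions.
    references = [[['a', 'dog', 'runs', 'on', 'grass'],
                   ['a', 'brown', 'dog', 'is', 'running']]]
    hypotheses = [['a', 'dog', 'is', 'running', 'on', 'the', 'grass']]

    print('BLEU-1:', corpus_bleu(references, hypotheses, weights=(1, 0, 0, 0)))
    print('BLEU-4:', corpus_bleu(references, hypotheses))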

Dataset

The model has been trained and tested on the Flickr8k dataset [2]; a sketch for loading its caption file follows the list below. Many other captioning datasets could be used instead, such as:

  • Flickr30k
  • MS COCO
  • SBU
  • Pascal
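Flickr8k provides five reference captions per image. A minimal loader for the caption file shipped in processed_files/, assuming the standard token-file format of one "<image>.jpg#<n>\t<caption>" entry per line:

    # Load Flickr8k captions, assuming the standard token-file format
    # "<image>.jpg#<n>\t<caption>" (five captions per image).
    from collections import defaultdict

    def load_captions(path='processed_files/Flickr8k.token.txt'):
        captions = defaultdict(list)
        with open(path) as f:
            for line in f:
                if not line.strip():
                    continue
                image_id, caption = line.rstrip('\n').split('\t')
                image_name = image_id.split('#')[0]  # drop the "#0".."#4" suffix
                captions[image_name].append(caption.lower())
        return captions

    captions = load_captions()
    print(len(captions), 'images loaded')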

Usage

Once the requirements are installed, going from training to testing is straightforward. The commands to run are:

  1. For the CNN+LSTM model (without attention), go to the folder CNN+LSTM and run:

    a. python train_model.py
    b. python test.py

  2. For the CNN+LSTM model with attention, go to the folder attention_model and run:

    a. python train.py
    b. python evaluate.py
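At test time, both models decode a caption one word at a time from the image features. The sketch below shows plain greedy decoding; the word2idx/idx2word mappings and the <start>/<end> tokens are illustrative assumptions, and the repo's beam search variant lives in attention_model/beamsearch.py.

    # Greedy decoding sketch (hypothetical word2idx/idx2word mappings and
    # <start>/<end> tokens; see attention_model/beamsearch.py for the
    # repo's beam search decoder).
    import numpy as np
    from keras.preprocessing.sequence import pad_sequences

    def greedy_caption(model, image_feat, word2idx, idx2word, max_len=40):
        seq = [word2idx['<start>']]
        for _ in range(max_len):
            padded = pad_sequences([seq], maxlen=max_len)
            probs = model.predict([image_feat, padded])[0]
            next_idx = int(np.argmax(probs))
            seq.append(next_idx)
            if idx2word[next_idx] == '<end>':
                break
        words = [idx2word[i] for i in seq[1:]]
        if words and words[-1] == '<end>':
            words = words[:-1]
        return ' '.join(words)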

Directory Tree (output of 'tree -L 3'):

.
├── attention_model
│   ├── beamsearch.py
│   ├── dec_map.pkl
│   ├── enc_map.pkl
│   ├── evaluate.py
│   ├── model.py
│   ├── pre_trained
│   │   └── glove.6B.100d.txt
│   ├── __pycache__
│   │   ├── beamsearch.cpython-35.pyc
│   │   ├── model.cpython-35.pyc
│   │   └── utils.cpython-35.pyc
│   ├── results
│   ├── train.py
│   ├── utils.py
│   └── weights
│       └── v1.0.0_6_39_1524863089.8904815.h5
├── CNN+LSTM
│   ├── caption.py
│   ├── caption.pyc
│   ├── __pycache__
│   │   └── caption.cpython-35.pyc
│   ├── test.py
│   ├── train_model.py
│   ├── unique.p
│   ├── weights
│   │   └── weights-improvement_epoch50_adam-70.hdf5
│   └── weights-improvement_epoch50_adam-70.hdf5
├── encoded_images_inceptionV3.p
├── encoded_images_test_inceptionV3.p
├── Flicker8k_Dataset
└── processed_files
    ├── Flickr_8k.devImages.txt
    ├── Flickr8k.lemma.token.txt
    ├── Flickr_8k.testImages.txt
    ├── Flickr8k.token.txt
    ├── Flickr_8k.trainImages.txt
    ├── flickr8k_training_dataset.txt
    └── unique.p


References

[1] Oriol Vinyals, Alexander Toshev, Samy Bengio, and Dumitru Erhan. Show and Tell: A Neural Image Caption Generator. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.

[2] Cyrus Rashtchian, Peter Young, Micah Hodosh, and Julia Hockenmaier. Collecting Image Annotations Using Amazon's Mechanical Turk. In Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk.
