CSCE-633 Machine Learning Project

Implemented a deep learning model that automatically generates image captions.

Environment Dependencies for train and test:

Python 3.6
Keras-gpu
Tensorflow-gpu

Library Dependencies for train and test:

nltk
matplotlib
numpy
pickle
sys
tqdm
pandas
glob
pillow
h5py
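
Before training, it can help to confirm that the libraries above are importable. The following is a minimal sketch, not part of the repository: the helper name `missing_modules` and the list `REQUIRED_MODULES` are assumptions made for illustration. Note that the pillow package is imported as `PIL`, and that pickle, sys and glob are part of the Python standard library, so they need no separate installation.

```python
import importlib.util

# Import names for the libraries listed above (hypothetical helper list).
# The pillow package is imported as "PIL"; pickle, sys and glob ship with Python.
REQUIRED_MODULES = [
    "nltk", "matplotlib", "numpy", "pickle", "sys",
    "tqdm", "pandas", "glob", "PIL", "h5py",
]


def missing_modules(names):
    """Return the names in `names` that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Report anything that still needs to be installed in this environment.
print("Missing:", missing_modules(REQUIRED_MODULES) or "none")
```

An empty result means every listed module was found in the current environment; it does not check version numbers.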

Library and Environment Dependencies for calculating the scores:

java 1.8.0
python 2.7
click 6.3
nltk 3.1
numpy 1.11.0
scikit-learn 0.17
gensim 0.12.4
Theano 0.8.1
scipy 0.17.0

Dataset

The model has been trained and tested on the Flickr8k dataset [2]. Other datasets that can be used as well include:

Flickr30k
MS COCO
SBU
Pascal
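
As an illustration of working with the caption data, the sketch below groups captions by image. It assumes that each line of a caption file such as Flickr8k.token.txt has the form `<image>.jpg#<n>`, a tab, then the caption text. The function name `parse_captions` and the sample lines are made up for this example and are not taken from the repository or the dataset.

```python
from collections import defaultdict


def parse_captions(lines):
    """Group captions by image file name.

    Assumes each line is "<image>.jpg#<n>", a tab character, then the caption.
    """
    captions = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        image_tag, _, caption = line.partition("\t")
        image_name = image_tag.split("#")[0]  # drop the "#<n>" caption index
        captions[image_name].append(caption.strip())
    return dict(captions)


# Made-up sample lines in the assumed format (not real dataset entries).
sample = [
    "example_1.jpg#0\tA dog runs across the grass .",
    "example_1.jpg#1\tA brown dog is running outside .",
    "example_2.jpg#0\tTwo children play on a beach .",
]
print(parse_captions(sample))
```

To use it on a real file, pass an open file handle, for example `parse_captions(open(path, encoding="utf-8"))`, where `path` points at the caption file.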

Usage

Once the requirements are installed, training and testing each take a single command. The commands to run:

For CNN+LSTM model (without attention), go to folder CNN+LSTM and then:

a. python train_model.py
b. python test.py

For the attention model, go to folder attention_model and then:

a. python train.py
b. python evaluate.py
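
The two-step train and test sequence can also be driven from a small script run at the repository root. This is a sketch under assumptions: the folder and script names are taken from the directory tree in this README, while the keys `cnn_lstm` and `attention` and the helpers `build_commands` and `run_pipeline` are hypothetical names introduced here.

```python
import subprocess
import sys

# Folder and script names come from the directory tree in this README;
# the dictionary keys are hypothetical labels chosen for this sketch.
PIPELINES = {
    "cnn_lstm": ("CNN+LSTM", ["train_model.py", "test.py"]),
    "attention": ("attention_model", ["train.py", "evaluate.py"]),
}


def build_commands(model):
    """Return (working_directory, argv) pairs for the given model key."""
    folder, scripts = PIPELINES[model]
    return [(folder, [sys.executable, script]) for script in scripts]


def run_pipeline(model):
    """Run training, then testing, stopping at the first failing step."""
    for folder, argv in build_commands(model):
        # check=True raises CalledProcessError if a script exits with an error.
        subprocess.run(argv, cwd=folder, check=True)
```

Calling `run_pipeline("attention")` would run train.py and then evaluate.py inside attention_model. Each script runs with its own folder as the working directory, which mirrors the "go to folder" step above.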

Directory Tree: (output of 'tree -L 3')

.
├── attention_model
│   ├── beamsearch.py
│   ├── dec_map.pkl
│   ├── enc_map.pkl
│   ├── evaluate.py
│   ├── model.py
│   ├── pre_trained
│   │   └── glove.6B.100d.txt
│   ├── __pycache__
│   │   ├── beamsearch.cpython-35.pyc
│   │   ├── model.cpython-35.pyc
│   │   └── utils.cpython-35.pyc
│   ├── results
│   ├── train.py
│   ├── utils.py
│   └── weights
│       └── v1.0.0_6_39_1524863089.8904815.h5
├── CNN+LSTM
│   ├── caption.py
│   ├── caption.pyc
│   ├── __pycache__
│   │   └── caption.cpython-35.pyc
│   ├── test.py
│   ├── train_model.py
│   ├── unique.p
│   ├── weights
│   │   └── weights-improvement_epoch50_adam-70.hdf5
│   └── weights-improvement_epoch50_adam-70.hdf5
├── encoded_images_inceptionV3.p
├── encoded_images_test_inceptionV3.p
├── Flicker8k_Dataset
└── processed_files
    ├── Flickr_8k.devImages.txt
    ├── Flickr8k.lemma.token.txt
    ├── Flickr_8k.testImages.txt
    ├── Flickr8k.token.txt
    ├── Flickr_8k.trainImages.txt
    ├── flickr8k_training_dataset.txt
    └── unique.p

Git references:

References

[1] Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan. Show and Tell: A Neural Image Caption Generator

[2] Cyrus Rashtchian, Peter Young, Micah Hodosh, and Julia Hockenmaier. Collecting Image Annotations Using Amazon's Mechanical Turk. In Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk.
