Implemented a deep learning model that automatically generates image captions.
- Python 3.6
- Keras-gpu
- Tensorflow-gpu
- nltk
- matplotlib
- numpy
- pickle
- sys
- tqdm
- pandas
- glob
- pillow
- h5py
- java 1.8.0
- python 2.7
- click 6.3
- nltk 3.1
- numpy 1.11.0
- scikit-learn 0.17
- gensim 0.12.4
- Theano 0.8.1
- scipy 0.17.0
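Before training, it can help to confirm that the third-party packages from the first part of the list are importable. The following is a minimal sketch, not part of the repository: the helper name `missing_modules` is hypothetical, and the import names are assumptions (for example, pillow is imported as `PIL`, and the GPU builds of Keras and TensorFlow are assumed to expose the usual `keras` and `tensorflow` modules).

```python
import importlib.util

# Import names for the third-party packages in the list above.
# Assumption: pillow is imported as "PIL"; Keras-gpu and Tensorflow-gpu
# are assumed to install the usual "keras" and "tensorflow" modules.
REQUIRED_MODULES = [
    "keras", "tensorflow", "nltk", "matplotlib", "numpy",
    "tqdm", "pandas", "PIL", "h5py",
]


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed modules were found.")
```

The check only looks modules up without importing them, so it does not verify versions or GPU support.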
The model has been trained and tested on the Flickr8k dataset [2]. Many other datasets can be used as well, such as:
- Flickr30k
- MS COCO
- SBU
- Pascal
Once the requirements are installed, training and testing take two commands per model:
- For the CNN+LSTM model (without attention), go to the CNN+LSTM folder and run:
  a. python train_model.py
  b. python test.py
- For the attention model, go to the attention_model folder and run:
  a. python train.py
  b. python evaluate.py
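The train-then-test sequence can also be scripted. The sketch below is an illustration rather than part of the repository: the keys `cnn_lstm` and `attention` and the helpers `build_commands` and `run_pipeline` are hypothetical names, while the folder and script names are taken from the repository tree (CNN+LSTM with train_model.py and test.py, attention_model with train.py and evaluate.py).

```python
import subprocess
import sys

# Folder and script names come from the repository tree;
# the dictionary keys are hypothetical labels chosen for this sketch.
PIPELINES = {
    "cnn_lstm": ("CNN+LSTM", ["train_model.py", "test.py"]),
    "attention": ("attention_model", ["train.py", "evaluate.py"]),
}


def build_commands(model, python=sys.executable):
    """Return (working_directory, argv) pairs in the order they should run."""
    folder, scripts = PIPELINES[model]
    return [(folder, [python, script]) for script in scripts]


def run_pipeline(model):
    """Run training, then testing, stopping at the first failing step."""
    for folder, argv in build_commands(model):
        # check=True raises CalledProcessError if a script exits non-zero.
        subprocess.run(argv, cwd=folder, check=True)
```

Calling `run_pipeline("attention")` from the repository root would run train.py and then evaluate.py inside attention_model, assuming those scripts take no required arguments.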
.
├── attention_model
│   ├── beamsearch.py
│   ├── dec_map.pkl
│   ├── enc_map.pkl
│   ├── evaluate.py
│   ├── model.py
│   ├── pre_trained
│   │   └── glove.6B.100d.txt
│   ├── __pycache__
│   │   ├── beamsearch.cpython-35.pyc
│   │   ├── model.cpython-35.pyc
│   │   └── utils.cpython-35.pyc
│   ├── results
│   ├── train.py
│   ├── utils.py
│   └── weights
│       └── v1.0.0_6_39_1524863089.8904815.h5
├── CNN+LSTM
│   ├── caption.py
│   ├── caption.pyc
│   ├── __pycache__
│   │   └── caption.cpython-35.pyc
│   ├── test.py
│   ├── train_model.py
│   ├── unique.p
│   ├── weights
│   │   └── weights-improvement_epoch50_adam-70.hdf5
│   └── weights-improvement_epoch50_adam-70.hdf5
├── encoded_images_inceptionV3.p
├── encoded_images_test_inceptionV3.p
├── Flicker8k_Dataset
└── processed_files
    ├── Flickr_8k.devImages.txt
    ├── Flickr8k.lemma.token.txt
    ├── Flickr_8k.testImages.txt
    ├── Flickr8k.token.txt
    ├── Flickr_8k.trainImages.txt
    ├── flickr8k_training_dataset.txt
    └── unique.p
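As an illustration of how the caption file in processed_files might be read, here is a minimal sketch. It assumes Flickr8k.token.txt keeps the layout of the standard Flickr8k text release, where each line is an image file name, a `#` and caption index, a tab, and the caption. The helper name `load_captions` and the sample lines are hypothetical and are not taken from the repository.

```python
from collections import defaultdict


def load_captions(lines):
    """Group captions by image file name.

    Assumed line format: '<image>.jpg#<index><TAB><caption>'.
    Lines without a tab are skipped.
    """
    captions = defaultdict(list)
    for line in lines:
        image_tag, sep, caption = line.strip().partition("\t")
        if not sep:
            continue  # blank or malformed line
        image_name = image_tag.split("#", 1)[0]
        captions[image_name].append(caption)
    return dict(captions)


# Made-up sample lines in the assumed format.
sample = [
    "img1.jpg#0\tA dog runs .",
    "img1.jpg#1\tA brown dog .",
    "img2.jpg#0\tTwo kids play .",
]
print(load_captions(sample))
```

Because the function accepts any iterable of lines, an open file handle for processed_files/Flickr8k.token.txt could be passed in directly.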
[1] Oriol Vinyals, Alexander Toshev, Samy Bengio, and Dumitru Erhan. Show and Tell: A Neural Image Caption Generator.
[2] Cyrus Rashtchian, Peter Young, Micah Hodosh, and Julia Hockenmaier. Collecting Image Annotations Using Amazon's Mechanical Turk. In Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk.