
ImageCaptions

A base model for image captioning

Config

  • python 2.7
  • tensorflow 1.8.0
  • python package
    • nltk
    • PIL
    • json
    • numpy

These are all common toolkits, so links are omitted.
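If you are setting up from scratch, an install along these lines should work (a sketch; note that Pillow is the package that provides PIL, and json ships with the standard library, so it needs no install):

    $ pip install tensorflow==1.8.0
    $ pip install nltk Pillow numpy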

DataDownload

  • coco image dataset (example download commands below)
    • download train2017.zip
    • unzip it to the directory 'data/train2017/'
  • coco image annotations
  • pretrained Inception model
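For reference, the COCO data and a TF-Slim Inception v3 checkpoint can be fetched roughly like this (a sketch; whether the preprocessing scripts expect the 2017 annotation files and this exact checkpoint is an assumption):

    $ wget http://images.cocodataset.org/zips/train2017.zip
    $ unzip train2017.zip -d data/
    $ wget http://images.cocodataset.org/annotations/annotations_trainval2017.zip
    $ unzip annotations_trainval2017.zip -d data/
    $ wget http://download.tensorflow.org/models/inception_v3_2016_08_28.tar.gz
    $ tar -xzf inception_v3_2016_08_28.tar.gz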

Train

First, run the preprocessing scripts

  • get 'data/captions.json' and 'data/captions_gt.json'
    $ cd preproccess
    $ python data_entry.py
  • get 'data/image_id_train.json', 'data/image_id_val.json', 'data/image_id_test.json'
    $ cd preproccess
    $ python image_id_split.py
  • get 'data/vocabulary.json' (a vocabulary-building sketch follows this list)
    $ cd preproccess
    $ python vocabulary.py
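The repo's vocabulary.py is not reproduced here, but a typical vocabulary builder over the preprocessed captions looks roughly like the sketch below. The JSON layout of 'data/captions.json' and the use of nltk's tokenizer are assumptions, not the repo's actual code.

    # hypothetical stand-in for preproccess/vocabulary.py
    import json
    from collections import Counter

    import nltk  # may require a one-time nltk.download('punkt')

    def build_vocabulary(captions_path, out_path):
        # assumption: captions.json maps image ids to lists of caption strings
        with open(captions_path) as f:
            captions = json.load(f)

        counter = Counter()
        for caps in captions.values():
            for cap in caps:
                counter.update(nltk.word_tokenize(cap.lower()))

        # keep every word; the README notes that no words are filtered out
        words = ['<start>', '<end>', '<unk>'] + sorted(counter)
        vocab = {w: i for i, w in enumerate(words)}

        with open(out_path, 'w') as f:
            json.dump(vocab, f)

    if __name__ == '__main__':
        build_vocabulary('../data/captions.json', '../data/vocabulary.json')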

Second, generate the TFRecord files

Because the dataset is large, converting it to TFRecord files improves throughput and CPU/GPU efficiency. The conversion takes about 30 minutes; the training data is split into 40 TFRecord files.

  • get 'data/tfrecord/train-00.tfrecord' - 'data/tfrecord/train-39.tfrecord'
    $ python datasets.py
  • then generate 'data/tfrecord_name_train.json', which lists the TFRecord filenames for the TensorFlow filename queue; this is an easy step (see the sketch after this list)
  • the val and test datasets are converted the same way
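datasets.py does the actual conversion; the sketch below only illustrates the surrounding TF 1.x pattern of dumping the shard names to a JSON file and feeding them through a filename queue. The feature names and file layout are assumptions, not the repo's actual schema.

    # sketch of the shard-list + filename-queue pattern (TF 1.8-era API)
    import glob
    import json

    import tensorflow as tf

    # write the shard names once so the training code can load them
    shards = sorted(glob.glob('data/tfrecord/train-*.tfrecord'))
    with open('data/tfrecord_name_train.json', 'w') as f:
        json.dump(shards, f)

    # read serialized examples back through a filename queue
    filename_queue = tf.train.string_input_producer(shards, shuffle=True)
    reader = tf.TFRecordReader()
    _, serialized = reader.read(filename_queue)
    features = tf.parse_single_example(
        serialized,
        features={
            'image': tf.FixedLenFeature([], tf.string),  # assumed feature names
            'caption': tf.VarLenFeature(tf.int64),
        })
    # at session time, tf.train.start_queue_runners() must be running for reads to proceed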

Third, let's train

    $ python main.py    

Experiments

Train/Val/Test split: 82783/5000/5000 images, vocabulary size = 14643, and no words are filtered out. Decoding uses greedy search rather than beam search (see the sketch below).
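Greedy search simply takes the argmax token at each step instead of keeping several beam hypotheses. A minimal sketch; the step function standing in for one RNN decoding step is hypothetical:

    import numpy as np

    def greedy_decode(step, start_id, end_id, max_len=20):
        # step(token_id, state) -> (logits over vocab, new state); hypothetical RNN step
        token, state, caption = start_id, None, []
        for _ in range(max_len):
            logits, state = step(token, state)
            token = int(np.argmax(logits))  # greedy: pick the best token now, keep no beam
            if token == end_id:
                break
            caption.append(token)
        return caption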

CNN+RNN

              BLEU_1  BLEU_2  BLEU_3  BLEU_4  METEOR  ROUGE   CIDEr
Train Dataset 0.7051  0.5322  0.3832  0.2682  0.2283  0.5128  0.7968
Val Dataset   0.6667  0.4866  0.3405  0.2337  0.2096  0.4831  0.7024
Test Dataset  0.6687  0.4879  0.3421  0.2364  0.2096  0.4838  0.6972
Paper         0.666   0.461   0.329   0.246   -       -       -

The 'Paper' row is from Show and Tell: A Neural Image Caption Generator, CVPR 2015.

CNN+RNN+Soft-Attention

              BLEU_1  BLEU_2  BLEU_3  BLEU_4  METEOR  ROUGE   CIDEr
Val Dataset   0.6467  0.4615  0.3180  0.2177  0.2014  0.4684  0.6310
Test Dataset  0.6482  0.4638  0.3210  0.2217  0.2013  0.4633  0.6245
Paper         0.707   0.492   0.344   0.243   0.2390  -       -

The 'Paper' row is from Show, Attend and Tell: Neural Image Caption Generation with Visual Attention, ICML 2015.
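For reference, the soft attention of Show, Attend and Tell computes a weighted average of the spatial CNN features at every decoding step, with the weights conditioned on the RNN state. A minimal numpy sketch; the shapes and projection names are illustrative, not the repo's code:

    import numpy as np

    def soft_attention(features, h, W_f, W_h, v):
        # features: (L, D) spatial CNN features; h: (H,) RNN hidden state
        # W_f: (D, A), W_h: (H, A), v: (A,) are learned projections (illustrative)
        scores = np.tanh(features.dot(W_f) + h.dot(W_h)).dot(v)  # (L,) alignment scores
        alpha = np.exp(scores - scores.max())
        alpha /= alpha.sum()                                     # softmax attention weights
        context = alpha.dot(features)                            # (D,) expected feature vector
        return context, alpha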

Example

[figure: example generated captions]

Summary

The model is deliberately very simple and the hyperparameters were never tuned, so there is room to improve the results by tuning them yourself.

References

  • Show and Tell: A Neural Image Caption Generator, Vinyals et al., CVPR 2015
  • Show, Attend and Tell: Neural Image Caption Generation with Visual Attention, Xu et al., ICML 2015
