Image captioning using vanilla RNN and Long-Term-Short-Memory (LSTM)

Exploring the power of Recurrent Neural Networks to deal with data that has temporal structure. Specifically, to generate captions for images. In order to achieve this, we will need an encoder- decoder architecture. Simply put, the encoder will take the image as input and encode it into a vector of feature values. The decoder will take this output from encoder as hidden state and starts to predict next words at each step. The following figure illustrates this:

Name		Name	Last commit message	Last commit date
Latest commit History 55 Commits
.gitignore		.gitignore
COCOdataset.pdf		COCOdataset.pdf
Embedding.py		Embedding.py
PA4description.pdf		PA4description.pdf
README.md		README.md
README.txt		README.txt
Run.ipynb		Run.ipynb
TestImageIds.csv		TestImageIds.csv
TrainImageIds.csv		TrainImageIds.csv
architecture.png		architecture.png
build_vocab.py		build_vocab.py
data_loader.py		data_loader.py
decoder.py		decoder.py
encoder.py		encoder.py
evaluate_captions.py		evaluate_captions.py
get_datasets.ipynb		get_datasets.ipynb
get_datasets.py		get_datasets.py
glove.ipynb		glove.ipynb
glove.py		glove.py
main.py		main.py
playground.ipynb		playground.ipynb
runner.py		runner.py
settings.py		settings.py
testModel.py		testModel.py
utils.py		utils.py
wordDict.py		wordDict.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Image captioning using vanilla RNN and Long-Term-Short-Memory (LSTM)

About

Releases

Packages

Languages

jonryf/deep-learning-image-captioning

Folders and files

Latest commit

History

Repository files navigation

Image captioning using vanilla RNN and Long-Term-Short-Memory (LSTM)

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages