This project addresses the task of generating captions from the content of images.
Given an image as input, the method outputs an English sentence describing the image's content. Automatic caption generation for images is a recent and growing research problem, and new methods are continually being introduced to improve results in this field. However, considerable work is still required to achieve results as good as a human's. This project aims to systematically survey different recent methods and models used for image captioning with deep learning. After reading and analysing the relevant research papers, we found that a CNN is typically used to understand image content and detect objects in an image, while an RNN or LSTM network is used for language generation. The most commonly used evaluation metric is BLEU (1 to 4), which appears in all of the surveyed papers. In this project, we present a generative model based on deep learning methods that combines recent advances in computer vision and machine translation and that can be used to generate natural sentences describing an image.
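To make the BLEU (1 to 4) metric mentioned above concrete, the following is a minimal, self-contained sketch of sentence-level BLEU-n: modified n-gram precision combined with a brevity penalty. This is an illustrative implementation without smoothing, not the exact scorer used in the surveyed papers; the function names and the example sentences are our own.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU-max_n for one candidate against one reference.

    Geometric mean of modified n-gram precisions (n = 1..max_n),
    scaled by a brevity penalty for short candidates. No smoothing:
    any zero n-gram precision makes the score 0.
    """
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(candidate, n))
        ref_counts = Counter(ngrams(reference, n))
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        precisions.append(overlap / total)
    if min(precisions) == 0:
        return 0.0
    log_avg = sum(math.log(p) for p in precisions) / max_n
    # Brevity penalty: penalise candidates shorter than the reference.
    bp = min(1.0, math.exp(1 - len(reference) / len(candidate)))
    return bp * math.exp(log_avg)

reference = "a dog runs on the grass".split()
candidate = "a dog runs in the grass".split()
print(bleu(candidate, reference, max_n=1))  # BLEU-1: 5 of 6 unigrams match
print(bleu(candidate, candidate))           # identical sentences score 1.0
```

Higher-order BLEU scores are stricter: here BLEU-4 is 0 because no 4-gram of the candidate appears in the reference, which is why papers commonly report BLEU-1 through BLEU-4 together.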