- Libraries and Frameworks: Uses TensorFlow and Keras to build the model, along with Python libraries such as NumPy, pickle, and tqdm for data handling and preprocessing.
- Model: The VGG16 model, pre-trained on ImageNet, serves as the image feature extractor. It is truncated to output features from the second-to-last fully connected layer (fc2); see the sketch after the next item.
- Image Features: The document explains how to extract features from images using the VGG16 model. These features are stored in a pickle file for later use.
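
  A minimal sketch of the extraction pipeline described in the two items above, assuming the images sit in a local `Images/` directory and that features are keyed by file name; the directory name and output path are placeholders, not details from the document:

  ```python
  import os
  import pickle
  import numpy as np
  from tqdm import tqdm
  from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
  from tensorflow.keras.preprocessing.image import load_img, img_to_array
  from tensorflow.keras.models import Model

  # Load VGG16 pre-trained on ImageNet and drop the final softmax layer,
  # keeping the 4096-dim output of the second-to-last fully connected
  # layer (fc2) as the image feature.
  base = VGG16(weights="imagenet")
  extractor = Model(inputs=base.inputs, outputs=base.layers[-2].output)

  features = {}
  image_dir = "Images"  # assumed location of the dataset images
  for name in tqdm(os.listdir(image_dir)):
      # VGG16 expects 224x224 RGB input
      img = load_img(os.path.join(image_dir, name), target_size=(224, 224))
      arr = preprocess_input(np.expand_dims(img_to_array(img), axis=0))
      features[os.path.splitext(name)[0]] = extractor.predict(arr, verbose=0)

  # Persist the features so later runs can skip re-extraction
  with open("features.pkl", "wb") as f:
      pickle.dump(features, f)
  ```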
- Captions Preprocessing: Captions are cleaned by lowercasing them, removing punctuation, and wrapping each one in start and end sequence tokens ("startseq"/"endseq").
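
  A sketch of this cleaning step; the `captions` dict shape (image id mapped to a list of raw strings) is an assumption about the data layout:

  ```python
  import string

  def clean_captions(captions):
      # captions: {image_id: [raw caption, ...]} (assumed layout)
      table = str.maketrans("", "", string.punctuation)
      for image_id, caps in captions.items():
          for i, cap in enumerate(caps):
              cap = cap.lower().translate(table)
              # Keep alphabetic tokens only, then wrap with sequence markers
              words = [w for w in cap.split() if w.isalpha()]
              caps[i] = "startseq " + " ".join(words) + " endseq"
      return captions
  ```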
- The architecture consists of two major components:
- A CNN (Convolutional Neural Network) acting as an encoder which processes the image and extracts features.
- An RNN (Recurrent Neural Network), specifically an LSTM (Long Short-Term Memory) network, which acts as a decoder. It takes the image features and previously generated words as input to produce the next word in the sequence.
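
  A hedged Keras sketch of this encoder-decoder wiring, using the common "merge" pattern in which the LSTM encodes the partial caption and the image features are combined with its output just before the prediction; that pattern, and the layer sizes (256 units, 0.5 dropout), are assumptions, while the 4096-dim input matches the VGG16 fc2 features extracted above:

  ```python
  from tensorflow.keras.layers import Input, Dense, Dropout, Embedding, LSTM, add
  from tensorflow.keras.models import Model

  def define_model(vocab_size, max_length):
      # Image branch: compress the 4096-dim VGG16 feature vector
      inputs1 = Input(shape=(4096,))
      fe1 = Dropout(0.5)(inputs1)
      fe2 = Dense(256, activation="relu")(fe1)

      # Text branch: embed the partial caption and run it through an LSTM
      inputs2 = Input(shape=(max_length,))
      se1 = Embedding(vocab_size, 256, mask_zero=True)(inputs2)
      se2 = Dropout(0.5)(se1)
      se3 = LSTM(256)(se2)

      # Merge both branches and predict the next word over the vocabulary
      decoder1 = add([fe2, se3])
      decoder2 = Dense(256, activation="relu")(decoder1)
      outputs = Dense(vocab_size, activation="softmax")(decoder2)
      return Model(inputs=[inputs1, inputs2], outputs=outputs)
  ```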
- Data Preparation: The document details how to prepare data for training, including encoding the text, splitting the dataset, and creating a data generator that feeds the model in batches.
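
  A sketch of these preparation steps, reusing the cleaned `captions` dict and extracted `features` from the earlier sketches; the 90/10 split ratio and one-image-per-batch granularity are illustrative assumptions:

  ```python
  import numpy as np
  from tensorflow.keras.preprocessing.text import Tokenizer
  from tensorflow.keras.preprocessing.sequence import pad_sequences
  from tensorflow.keras.utils import to_categorical

  # Fit a tokenizer on every cleaned caption to encode words as integers
  all_captions = [c for caps in captions.values() for c in caps]
  tokenizer = Tokenizer()
  tokenizer.fit_on_texts(all_captions)
  vocab_size = len(tokenizer.word_index) + 1
  max_length = max(len(c.split()) for c in all_captions)

  # Simple train/test split on image ids
  image_ids = list(captions.keys())
  split = int(len(image_ids) * 0.90)
  train_ids, test_ids = image_ids[:split], image_ids[split:]

  def data_generator(ids, captions, features, tokenizer, max_length, vocab_size):
      while True:  # Keras expects the generator to loop indefinitely
          for image_id in ids:
              X1, X2, y = [], [], []
              feature = features[image_id][0]
              for cap in captions[image_id]:
                  seq = tokenizer.texts_to_sequences([cap])[0]
                  # One caption becomes several (prefix -> next word) samples
                  for i in range(1, len(seq)):
                      X1.append(feature)
                      X2.append(pad_sequences([seq[:i]], maxlen=max_length)[0])
                      y.append(to_categorical([seq[i]], num_classes=vocab_size)[0])
              yield (np.array(X1), np.array(X2)), np.array(y)
  ```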
- Model Training: Explains the training process, including choosing the optimizer and loss function and fitting the model over multiple epochs.
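
  A minimal training sketch reusing the pieces defined above; 20 epochs, categorical cross-entropy, and the Adam optimizer are typical choices rather than values confirmed by the document:

  ```python
  model = define_model(vocab_size, max_length)
  model.compile(loss="categorical_crossentropy", optimizer="adam")

  epochs = 20
  steps = len(train_ids)  # one image's worth of samples per step (assumed)
  for _ in range(epochs):
      generator = data_generator(train_ids, captions, features,
                                 tokenizer, max_length, vocab_size)
      model.fit(generator, epochs=1, steps_per_epoch=steps, verbose=1)
  ```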
- Generating Captions: For new images, the model generates captions by starting with a "startseq" token and predicting the next word until an "endseq" token is generated or a maximum length is reached.
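
  A greedy decoding sketch matching that startseq/endseq convention; the `idx_to_word` helper is introduced here for illustration:

  ```python
  import numpy as np
  from tensorflow.keras.preprocessing.sequence import pad_sequences

  def generate_caption(model, tokenizer, feature, max_length):
      # Reverse lookup from integer index back to word (assumed helper)
      idx_to_word = {i: w for w, i in tokenizer.word_index.items()}
      text = "startseq"
      for _ in range(max_length):
          seq = tokenizer.texts_to_sequences([text])[0]
          seq = pad_sequences([seq], maxlen=max_length)
          # Pick the single most probable next word (greedy search)
          yhat = model.predict([feature, seq], verbose=0)
          word = idx_to_word.get(int(np.argmax(yhat)))
          if word is None:
              break
          text += " " + word
          if word == "endseq":
              break
      return text
  ```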
- Performance Evaluation: Uses the BLEU (Bilingual Evaluation Understudy) score to evaluate the model's performance by comparing the generated captions against actual captions.
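
  An evaluation sketch using NLTK's `corpus_bleu` on the held-out `test_ids` from the data-preparation sketch; the BLEU-1/BLEU-2 weights shown are a common reporting choice, not values taken from the document:

  ```python
  from nltk.translate.bleu_score import corpus_bleu

  actual, predicted = [], []
  for image_id in test_ids:
      yhat = generate_caption(model, tokenizer, features[image_id], max_length)
      actual.append([c.split() for c in captions[image_id]])  # reference captions
      predicted.append(yhat.split())

  print("BLEU-1: %.3f" % corpus_bleu(actual, predicted, weights=(1.0, 0, 0, 0)))
  print("BLEU-2: %.3f" % corpus_bleu(actual, predicted, weights=(0.5, 0.5, 0, 0)))
  ```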
- The document concludes with robustness tests based on image manipulation: blurred, thresholded, and negative versions of test images are captioned to observe how each manipulation affects the model's predictions.
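
  A sketch of these robustness checks using Pillow, reusing `extractor` and `generate_caption` from the earlier sketches; the specific parameters (blur radius 4, threshold 128) and the `example.jpg` path are illustrative placeholders:

  ```python
  import numpy as np
  from PIL import Image, ImageFilter, ImageOps
  from tensorflow.keras.applications.vgg16 import preprocess_input

  def caption_manipulated(path, manipulate, model, extractor, tokenizer, max_length):
      # Apply the manipulation, then run the normal feature->caption pipeline
      img = Image.open(path).convert("RGB").resize((224, 224))
      img = manipulate(img)
      arr = preprocess_input(np.expand_dims(np.array(img, dtype="float32"), 0))
      feature = extractor.predict(arr, verbose=0)
      return generate_caption(model, tokenizer, feature, max_length)

  manipulations = {
      "blurred":   lambda im: im.filter(ImageFilter.GaussianBlur(radius=4)),
      "threshold": lambda im: im.convert("L").point(lambda p: 255 if p > 128 else 0).convert("RGB"),
      "negative":  lambda im: ImageOps.invert(im),
  }

  for name, fn in manipulations.items():
      print(name, "->", caption_manipulated("example.jpg", fn, model,
                                            extractor, tokenizer, max_length))
  ```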
The image captioning system outlined here combines deep learning for image feature extraction with sequence generation for captioning, providing a detailed approach to building a model capable of generating descriptive captions for images.