Stable Lens is an image captioning model built on CLIP and StableLM. It generates descriptive captions by pairing visual features extracted by the CLIP encoder with StableLM's text generation capabilities.
- CLIP Encoding: Stable Lens begins by using the CLIP encoder to extract rich visual features from the input image. These features are represented as a single CLIP embedding that captures the essence of the image.
- Mapping Network (MLP): To bridge the gap between the CLIP embedding and StableLM's text generation, a Multi-Layer Perceptron (MLP) serves as a mapping network. It transforms the CLIP embedding into prefix embeddings in StableLM's input embedding space.
- StableLM for Captioning: The output of the mapping network is used as a prefix and fed into StableLM, which generates a coherent, contextually relevant caption for the image. A sketch of these three steps is shown below.

- The idea of using a prefix for image captioning is inspired by the paper ClipCap: CLIP Prefix for Image Captioning.
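
The snippet below is a minimal sketch of these three steps using the Hugging Face `transformers` library. The checkpoint names, prefix length, and MLP layout are illustrative assumptions rather than the repository's exact configuration, and the mapping network here is untrained; see the notebook for the real setup.

```python
import torch
import torch.nn as nn
from PIL import Image
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          CLIPModel, CLIPProcessor)

# Checkpoint names are assumptions for illustration.
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
clip_proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
lm = AutoModelForCausalLM.from_pretrained("stabilityai/stablelm-base-alpha-3b")
tokenizer = AutoTokenizer.from_pretrained("stabilityai/stablelm-base-alpha-3b")

PREFIX_LEN = 10                         # number of prefix embeddings (assumed)
clip_dim = clip.config.projection_dim   # 512 for ViT-B/32
lm_dim = lm.config.hidden_size          # StableLM embedding width


class MappingMLP(nn.Module):
    """Maps one CLIP embedding to PREFIX_LEN vectors in the LM embedding space."""

    def __init__(self, clip_dim, lm_dim, prefix_len):
        super().__init__()
        self.prefix_len, self.lm_dim = prefix_len, lm_dim
        hidden = (lm_dim * prefix_len) // 2
        self.net = nn.Sequential(
            nn.Linear(clip_dim, hidden),
            nn.Tanh(),
            nn.Linear(hidden, lm_dim * prefix_len),
        )

    def forward(self, clip_embed):                    # (B, clip_dim)
        out = self.net(clip_embed)                    # (B, prefix_len * lm_dim)
        return out.view(-1, self.prefix_len, self.lm_dim)


mapper = MappingMLP(clip_dim, lm_dim, PREFIX_LEN)     # untrained in this sketch


@torch.no_grad()
def caption(image_path, max_new_tokens=30):
    # 1) CLIP encoding: image -> CLIP embedding.
    image = Image.open(image_path).convert("RGB")
    pixels = clip_proc(images=image, return_tensors="pt").pixel_values
    clip_embed = clip.get_image_features(pixel_values=pixels)   # (1, clip_dim)

    # 2) Mapping network: CLIP embedding -> prefix in the LM embedding space.
    prefix = mapper(clip_embed)                                  # (1, PREFIX_LEN, lm_dim)

    # 3) StableLM: continue the prefix into a caption (requires a recent
    #    transformers version that accepts `inputs_embeds` in generate).
    out_ids = lm.generate(inputs_embeds=prefix, max_new_tokens=max_new_tokens,
                          pad_token_id=tokenizer.eos_token_id)
    return tokenizer.decode(out_ids[0], skip_special_tokens=True)
```

With trained mapping-network weights loaded in place of the random initialisation, `caption("example.jpg")` returns the generated caption for that image.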
To get started with Stable Lens and reproduce the results, open the included Jupyter notebook (Stable-Lens-Image-Captioning.ipynb).
Inside the notebook, you will find:
- Model Definition: The model architecture, including the CLIP encoder, the mapping network (MLP), and the StableLM configuration.
- Model Training: Scripts and code for training the Stable Lens model on your own dataset or on pre-existing data (a training-step sketch follows this list).
- Model Inference: Code for generating image captions with the trained model; you can supply your own images.
- Model Evaluation: Techniques for evaluating the model's performance, including BLEU scores to measure the quality of the generated captions (see the BLEU sketch after this list).
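
As a rough guide to what the training code does, the sketch below reuses the `mapper`, `lm`, and `tokenizer` objects from the pipeline sketch above and assumes batches of precomputed CLIP embeddings paired with reference captions; the loss masking follows the ClipCap recipe, but the hyperparameters and the choice to freeze StableLM are assumptions, not necessarily the notebook's exact settings.

```python
import torch

if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token        # enable batch padding


def training_step(mapper, lm, tokenizer, clip_embeds, captions):
    """One forward pass: [prefix | caption tokens], LM loss on caption tokens only."""
    tokens = tokenizer(list(captions), return_tensors="pt", padding=True)
    token_embeds = lm.get_input_embeddings()(tokens.input_ids)    # (B, T, D)
    prefix = mapper(clip_embeds)                                  # (B, P, D)
    inputs_embeds = torch.cat([prefix, token_embeds], dim=1)      # (B, P+T, D)

    # -100 masks the prefix and padding positions out of the loss.
    ignore = torch.full(prefix.shape[:2], -100, dtype=torch.long)
    labels = tokens.input_ids.masked_fill(tokens.attention_mask == 0, -100)
    labels = torch.cat([ignore, labels], dim=1)

    attn = torch.cat([torch.ones(prefix.shape[:2], dtype=torch.long),
                      tokens.attention_mask], dim=1)

    out = lm(inputs_embeds=inputs_embeds, attention_mask=attn, labels=labels)
    return out.loss


# ClipCap-style setup: freeze StableLM and train only the mapping network.
for p in lm.parameters():
    p.requires_grad_(False)
optimizer = torch.optim.AdamW(mapper.parameters(), lr=2e-4)

# Inside the training loop (clip_embeds: (B, clip_dim) tensor, captions: list of str):
# loss = training_step(mapper, lm, tokenizer, clip_embeds, captions)
# loss.backward(); optimizer.step(); optimizer.zero_grad()
```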
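
For the evaluation step, a minimal BLEU computation with NLTK might look like the following; the example references, hypotheses, and whitespace tokenization are placeholders, not the notebook's exact evaluation code.

```python
from nltk.translate.bleu_score import SmoothingFunction, corpus_bleu

# One list of reference captions per image, and one generated caption per image.
references = [["a dog runs across the grass"], ["two people ride bicycles down a street"]]
hypotheses = ["a dog running on the grass", "two people riding bikes on a street"]

refs_tok = [[ref.lower().split() for ref in refs] for refs in references]
hyps_tok = [hyp.lower().split() for hyp in hypotheses]

smooth = SmoothingFunction().method1   # avoids zero scores on short captions
bleu1 = corpus_bleu(refs_tok, hyps_tok, weights=(1, 0, 0, 0), smoothing_function=smooth)
bleu4 = corpus_bleu(refs_tok, hyps_tok, smoothing_function=smooth)
print(f"BLEU-1: {bleu1:.3f}  BLEU-4: {bleu4:.3f}")
```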
The pickle files included in this repository contain CLIP embeddings alongside captions and were generated using the script found here.
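
To inspect those pickle files before training, a generic loading snippet is shown below; the file name and record layout are assumptions, so check the generation script for the exact keys and structure.

```python
import pickle

with open("clip_embeddings.pkl", "rb") as f:   # hypothetical file name
    data = pickle.load(f)

print(type(data))                              # inspect the container type first
if isinstance(data, dict):
    print(list(data.keys())[:5])               # e.g. keys holding embeddings and captions
elif isinstance(data, (list, tuple)) and data:
    print(data[0])                             # a single embedding/caption record
```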