Visual Question Answering and Image Captioning using BLIP and OpenVINO

BLIP is a pre-training framework for unified vision-language understanding and generation that achieves state-of-the-art results on a wide range of vision-language tasks. This tutorial demonstrates how to use BLIP for visual question answering and image captioning.

The complete pipeline of this demo is shown below:

Image Captioning

The following image shows an example of the input image and generated caption:
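For orientation, a minimal captioning sketch using the Hugging Face transformers BLIP API is shown below; the checkpoint name and sample image URL are assumptions for illustration, and the notebook itself may load and preprocess the model differently.

```python
# Minimal BLIP image captioning sketch using Hugging Face transformers.
# The checkpoint name and sample image URL are assumptions for illustration.
import requests
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

checkpoint = "Salesforce/blip-image-captioning-base"
processor = BlipProcessor.from_pretrained(checkpoint)
model = BlipForConditionalGeneration.from_pretrained(checkpoint)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

# Preprocess the image and generate a caption token sequence.
inputs = processor(images=image, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=20)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```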

Visual Question Answering

The following image shows an example of the input image, a question, and the answer generated by the model:
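A comparable sketch of visual question answering with the transformers BLIP VQA head is shown below; the checkpoint name, image path, and question are placeholder assumptions.

```python
# Minimal BLIP visual question answering sketch using Hugging Face transformers.
# Checkpoint name, image path, and question are placeholder assumptions.
from PIL import Image
from transformers import BlipProcessor, BlipForQuestionAnswering

checkpoint = "Salesforce/blip-vqa-base"
processor = BlipProcessor.from_pretrained(checkpoint)
model = BlipForQuestionAnswering.from_pretrained(checkpoint)

image = Image.open("demo.jpg").convert("RGB")  # hypothetical local image
question = "How many dogs are in the picture?"

# Encode the image together with the question, then generate a short answer.
inputs = processor(images=image, text=question, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=10)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```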

Notebook Contents

The tutorial consists of the following parts:

  1. Instantiate a BLIP model.
  2. Convert the BLIP model to OpenVINO IR.
  3. Run visual question answering and image captioning with OpenVINO (a rough code sketch of parts 2 and 3 follows this list).
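As a rough, non-authoritative illustration of parts 2 and 3, the sketch below converts one BLIP submodule (the vision encoder) to OpenVINO IR with the openvino model conversion API and runs it on CPU. The checkpoint, input shape, and the choice to convert only the vision encoder are assumptions; the notebook also converts the text encoder and text decoder and wires them into the generation loop.

```python
# Rough sketch: convert the BLIP vision encoder to OpenVINO IR and run it.
# Assumes openvino>=2023.1 and the "Salesforce/blip-vqa-base" checkpoint;
# the full notebook also converts the text encoder and text decoder.
import torch
import openvino as ov
from transformers import BlipForQuestionAnswering

model = BlipForQuestionAnswering.from_pretrained("Salesforce/blip-vqa-base")
model.eval()

# Dummy input matching the vision encoder's expected shape (batch, 3, 384, 384).
pixel_values = torch.zeros((1, 3, 384, 384))

# Convert the PyTorch submodule to an OpenVINO model and save it as IR (.xml/.bin).
ov_vision_model = ov.convert_model(model.vision_model, example_input=pixel_values)
ov.save_model(ov_vision_model, "blip_vision_model.xml")

# Compile the IR for CPU and run inference to obtain image embeddings.
core = ov.Core()
compiled_vision = core.compile_model("blip_vision_model.xml", "CPU")
image_embeds = compiled_vision(pixel_values.numpy())[0]
print(image_embeds.shape)
```

In the end-to-end pipeline, these image embeddings would then be passed to the converted text encoder and decoder to produce captions or answers.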

Installation Instructions

If you have not installed all required dependencies, follow the Installation Guide.