This application consists of:
The image to text conversion is done using the blip-image-captioning-base
model made available by Salesforce on the Huggingface Hub.
A story less than 20 characters is generated using the OpenAI LLM.
The story is now coverter to audio using the fastspeech2-en-ljspeech model
from facebook available on the Huggingface Hub.