Skip to content

Multimodal Agentic GenAI Workflow – Seamlessly blends retrieval and generation for intelligent storytelling

License

Notifications You must be signed in to change notification settings

fork123aniket/Agentic-RAG-Story-Generation-with-Multimodal-GenAI

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

15 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

📖 Agentic RAG Story Generation with Multimodal AI

Transform Images into Engaging Narratives with Generative AI

🚀 Overview

This project implements an Agentic RAG (Retrieval-Augmented Generation) workflow using LangChain, LangGraph, and multimodal AI to generate meaningful and engaging stories from user-uploaded images. The system seamlessly integrates retrieval-based reasoning with generative AI capabilities, leveraging LLMs, vector search, and vision models to produce immersive narratives.


🎯 Why Agentic RAG?

This project demonstrates how LLMs, retrieval systems, and vision models can collaborate to generate dynamic and engaging AI-generated stories. The agentic workflow makes it possible to balance creativity and factuality, ensuring a seamless story generation experience.


✨ Key Features

🔹 Multimodal Input Handling: Accepts image uploads and processes them for story generation.
🔹 Retrieval-Augmented Generation (RAG): Enhances story coherence using retrieved knowledge.
🔹 Agentic AI Workflow: Dynamically selects retriever or generator based on available data.
🔹 LLM-Driven Story Generation: Generates rich, context-aware narratives using InternVL2-40B.
🔹 Chroma Vector Search: Stores and retrieves image-text embeddings for enhanced relevance.
🔹 Cloud Integration: Supports Cloudinary for image hosting and efficient handling.
🔹 Streamlit UI: Interactive interface for seamless user interaction.
🔹 Speech-Enabled Narratives: Turns AI-generated stories into spoken words for an interactive experience.


🛠 Technology Stack

This project leverages cutting-edge LLMs, multimodal AI, and retrieval-based techniques to ensure seamless and engaging story generation.

Category Technology Used
🤖 LLM InternVL2 - 40B
🖼 Vision Model CLIP (Contrastive Language-Image Pretraining)
🔍 Retriever ChromaDB Vector Search
📜 RAG Framework LangChain, LangGraph
🌐 Web Interface Streamlit
☁️ Cloud Storage Cloudinary API
🎤 Text-to-Speech pyttsx3

These technologies work together to process images, retrieve relevant context, and generate rich, meaningful narratives.


🎯 Agentic RAG Gen AI Workflow

This project follows an Agentic RAG (Retrieval-Augmented Generation) framework, meaning the system autonomously decides when to retrieve external knowledge and when to generate stories based on the input.

How It Works

1️⃣ Image Upload: Users provide an image via the Streamlit UI.
2️⃣ Embedding Generation: The image is processed using Hugging Face CLIP embeddings.
3️⃣ Retrieval Decision:

  • If relevant stories exist, the retriever fetches them for context.
  • If no prior data is found, the system generates a story from scratch.

4️⃣ LLM-Based Generation: Uses InternVL2-40B to craft a compelling story.
5️⃣ User Display: The generated story is displayed in the UI.
6️⃣ Audio Narration: Converts generated stories into speech.

This approach ensures optimal use of existing knowledge while allowing for creative story generation when necessary.


🛠 Workflow Architecture

The diagram below illustrates the modular AI workflow used in this project.

Visual Story Generation Workflow

Workflow Breakdown

  • Start → Initializes the workflow.
  • LLM Node → Processes input data and decides next steps.
  • Tools Node → Determines whether to retrieve past stories or generate new ones.
  • Retriever Node → If prior knowledge exists, retrieves related image-text pairs.
  • Generator Node → Uses the LLM to create an engaging story.
  • End → Displays the final generated narrative to the user.

This state-driven agentic AI approach ensures that each decision maximizes the relevance and creativity of the generated content.


📌 Installation & Usage

1️⃣ Clone the Repository

git clone https://github.com/fork123aniket/Agentic-RAG-Story-Generation-with-Multimodal-GenAI.git
cd Agentic-RAG-Story-Generation-with-Multimodal-GenAI

2️⃣ Install Dependencies

pip install -r requirements.txt

3️⃣ Set Up Cloudinary for Image Hosting

Replace the placeholders in Agentic_Workflow.py:

cloudinary.config(
    cloud_name='your_cloud_name',
    api_key='your_api_key',
    api_secret='your_secret_key'
)

4️⃣ Run the Streamlit App

streamlit run Agentic_Workflow.py

5️⃣ Upload an Image & Enjoy AI-Powered Story Generation!


📂 Project Structure

The project is structured as follows:

📦 Agentic_RAG_Story_Generation
├── 📜 Agentic_Workflow.py      # Core agentic workflow implementation
├── 📜 requirements.txt         # Dependencies list
├── 📜 README.md                # Project documentation