This project implements an Agentic RAG (Retrieval-Augmented Generation) workflow using LangChain, LangGraph, and multimodal AI to generate meaningful, engaging stories from user-uploaded images. The system integrates retrieval-based reasoning with generative AI, combining an LLM, vector search, and a vision model to produce immersive narratives.
The agentic workflow lets these components collaborate dynamically: retrieval grounds each story in prior knowledge when it exists, while the generator keeps the narrative creative when it does not.
🔹 Multimodal Input Handling: Accepts image uploads and processes them for story generation.
🔹 Retrieval-Augmented Generation (RAG): Enhances story coherence using retrieved knowledge.
🔹 Agentic AI Workflow: Dynamically selects retriever or generator based on available data.
🔹 LLM-Driven Story Generation: Generates rich, context-aware narratives using InternVL2-40B.
🔹 Chroma Vector Search: Stores and retrieves image-text embeddings for enhanced relevance.
🔹 Cloud Integration: Uses Cloudinary for image hosting and delivery.
🔹 Streamlit UI: Interactive web interface for uploading images and viewing generated stories.
🔹 Speech-Enabled Narratives: Turns AI-generated stories into spoken words for an interactive experience.
The project builds on the following LLM, vision, and retrieval technologies:
| Category | Technology Used |
|---|---|
| 🤖 LLM | InternVL2-40B |
| 🖼 Vision Model | CLIP (Contrastive Language-Image Pre-training) |
| 🔍 Retriever | ChromaDB Vector Search |
| 📜 RAG Framework | LangChain, LangGraph |
| 🌐 Web Interface | Streamlit |
| ☁️ Cloud Storage | Cloudinary API |
| 🎤 Text-to-Speech | pyttsx3 |
These technologies work together to process images, retrieve relevant context, and generate rich, meaningful narratives.
This project follows an Agentic RAG (Retrieval-Augmented Generation) framework, meaning the system autonomously decides when to retrieve external knowledge and when to generate stories based on the input.
1️⃣ Image Upload: Users provide an image via the Streamlit UI.
2️⃣ Embedding Generation: The image is processed using Hugging Face CLIP embeddings.
3️⃣ Retrieval Decision (see the sketch after this list):
- If relevant stories exist, the retriever fetches them for context.
- If no prior data is found, the system generates a story from scratch.
4️⃣ LLM-Based Generation: Uses InternVL2-40B to craft a compelling story.
5️⃣ User Display: The generated story is displayed in the UI.
6️⃣ Audio Narration: Converts the generated story into speech with pyttsx3 (see the sketch below).
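To make steps 2️⃣ and 3️⃣ concrete, here is a minimal sketch of the embed-then-decide logic. It assumes the `openai/clip-vit-base-patch32` checkpoint and a Chroma collection named `stories`; the actual checkpoint, collection name, and decision rule in `Agentic_Workflow.py` may differ.

```python
import chromadb
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Load CLIP once; it maps each image to a 512-dimensional embedding
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def embed_image(path: str) -> list[float]:
    """Encode an image into a CLIP embedding suitable for vector search."""
    inputs = processor(images=Image.open(path), return_tensors="pt")
    features = model.get_image_features(**inputs)
    return features[0].detach().tolist()

# Look for previously stored stories whose images are close to the new one
client = chromadb.Client()
collection = client.get_or_create_collection("stories")
hits = collection.query(query_embeddings=[embed_image("upload.jpg")], n_results=3)

if hits["documents"] and hits["documents"][0]:
    context = "\n".join(hits["documents"][0])  # retrieved stories enrich the prompt
else:
    context = ""  # nothing relevant stored yet, so the LLM generates from scratch
```

Storing a newly generated image-text pair for future retrieval is symmetric, e.g. `collection.add(ids=["img_001"], embeddings=[embed_image("img_001.jpg")], documents=["the generated story"])`.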
This approach ensures optimal use of existing knowledge while allowing for creative story generation when necessary.
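For the narration step (6️⃣), pyttsx3 provides offline text-to-speech. A minimal sketch:

```python
import pyttsx3

def narrate(story: str) -> None:
    """Read the generated story aloud using the local TTS engine."""
    engine = pyttsx3.init()
    engine.setProperty("rate", 160)  # speaking rate in words per minute
    engine.say(story)
    engine.runAndWait()  # blocks until the narration finishes
```

Because pyttsx3 runs locally, narration works without any API key or network access.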
The diagram below illustrates the modular AI workflow used in this project.
- Start → Initializes the workflow.
- LLM Node → Processes input data and decides next steps.
- Tools Node → Determines whether to retrieve past stories or generate new ones.
- Retriever Node → If prior knowledge exists, retrieves related image-text pairs.
- Generator Node → Uses the LLM to create an engaging story.
- End → Displays the final generated narrative to the user.
This state-driven agentic AI approach ensures that each decision maximizes the relevance and creativity of the generated content.
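The graph above can be expressed directly in LangGraph. Below is a hedged sketch with placeholder node bodies; the state fields, node names, and routing logic are assumptions, and the real graph in `Agentic_Workflow.py` may be wired differently.

```python
from typing import TypedDict

from langgraph.graph import StateGraph, START, END

class StoryState(TypedDict):
    image_url: str  # uploaded image (e.g. its Cloudinary URL)
    context: str    # retrieved stories; empty if none were found
    story: str      # final narrative shown to the user

def llm_node(state: StoryState) -> dict:
    # Inspect the input and prepare it for routing (placeholder)
    return {}

def retriever_node(state: StoryState) -> dict:
    # Fetch related image-text pairs from Chroma (placeholder)
    return {"context": "retrieved stories go here"}

def generator_node(state: StoryState) -> dict:
    # Call the LLM (e.g. InternVL2-40B) with image + context (placeholder)
    return {"story": "generated story goes here"}

def route(state: StoryState) -> str:
    # Tools decision: retrieve when prior knowledge exists, else generate
    return "retriever" if state["context"] else "generator"

graph = StateGraph(StoryState)
graph.add_node("llm", llm_node)
graph.add_node("retriever", retriever_node)
graph.add_node("generator", generator_node)
graph.add_edge(START, "llm")
graph.add_conditional_edges("llm", route, {"retriever": "retriever", "generator": "generator"})
graph.add_edge("retriever", "generator")  # retrieved context feeds generation
graph.add_edge("generator", END)

app = graph.compile()
result = app.invoke({"image_url": "upload.jpg", "context": "", "story": ""})
```

Compiling the graph yields a runnable app whose `invoke` call walks Start → LLM → (Retriever →) Generator → End, mirroring the diagram.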
```bash
git clone https://github.com/fork123aniket/Agentic-RAG-Story-Generation-with-Multimodal-GenAI.git
cd Agentic-RAG-Story-Generation-with-Multimodal-GenAI
pip install -r requirements.txt
```
Replace the Cloudinary placeholders in `Agentic_Workflow.py`:
```python
import cloudinary

# Cloudinary credentials (found in your Cloudinary dashboard)
cloudinary.config(
    cloud_name='your_cloud_name',
    api_key='your_api_key',
    api_secret='your_secret_key'
)
```
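Alternatively, the Cloudinary Python SDK can read credentials from the `CLOUDINARY_URL` environment variable, which keeps secrets out of the source file:

```bash
export CLOUDINARY_URL="cloudinary://your_api_key:your_secret_key@your_cloud_name"
```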
Then launch the Streamlit app:

```bash
streamlit run Agentic_Workflow.py
```
The project is structured as follows:
```
📦 Agentic_RAG_Story_Generation
├── 📜 Agentic_Workflow.py   # Core agentic workflow implementation
├── 📜 requirements.txt      # Dependencies list
└── 📜 README.md             # Project documentation
```