A simplified Contextual Video RAG implementation using Pinecone, AWS, and Claude
Ever wanted to ask questions over your video data, such as YouTube videos, Zoom webinars, or recorded meetings? This application creates a RAG chatbot over that content using contextual retrieval with Pinecone, AWS, and Claude.
This branch contains the Streamlit web app version of the implementation. It lets you run a local web app to interact with the RAG chatbot, and uses a Makefile to make the data preprocessing smoother. Please read the following section to ensure you have the appropriate prerequisites before proceeding.
If you'd rather work in a SageMaker notebook, use the webinar-notebook branch instead!
This repo presents the RAG solution in two ways: one uses scripts and a Makefile to create a Streamlit application, and the other uses a notebook intended for SageMaker.
You'll also need access to AWS Bedrock, Pinecone (via an API Key), and Claude specifically via Bedrock.
Finally, add the videos you'd like to process to a folder called data, inside a subfolder called videos. Leave them in .mp4 format. If you have access to your own YouTube channel, downloading videos from its console works well!
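Before running any preprocessing, it can help to confirm that the videos are where the pipeline expects them. The following is a minimal sketch, not part of this repo: the function name `find_videos` is hypothetical, and it only assumes the data/videos layout and .mp4 format described above. Whether the pipeline itself accepts upper-case extensions is not stated here, so the case-insensitive match is an assumption.

```python
from pathlib import Path


def find_videos(root="data/videos"):
    """Return the sorted .mp4 files found directly under `root`.

    Hypothetical helper: it mirrors the folder layout described in this
    README and is not a function provided by the repo.
    """
    folder = Path(root)
    if not folder.is_dir():
        raise FileNotFoundError(f"Expected a folder at {folder}")
    # Assumption: match the extension case-insensitively, so "talk.MP4" counts.
    return sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() == ".mp4"
    )
```

Calling `find_videos()` from the repo root and getting an empty list back suggests the videos are missing or in a different format.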
Before beginning, authenticate your session with AWS using your preferred method. You can save the access key, secret access key, and default region as environment variables, or use 'aws sso login' if you have that set up.
You'll still need access to AWS Bedrock and Claude via Bedrock, as well as a Pinecone API key.
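If you go the environment variable route, a quick check like the one below can catch a missing value before the pipeline starts. This is a sketch under assumptions: `missing_aws_vars` is a hypothetical helper, not something this repo ships, and it checks the standard AWS variable names `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, and `AWS_DEFAULT_REGION`. It does not apply if you authenticate with 'aws sso login', where these variables are typically not set.

```python
import os

# Standard AWS environment variable names for static credentials and region.
REQUIRED_AWS_VARS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_DEFAULT_REGION",
)


def missing_aws_vars(env=None):
    """Return the required AWS variable names that are unset or empty.

    Hypothetical helper; `env` defaults to the current process environment.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_AWS_VARS if not env.get(name)]
```

An empty list from `missing_aws_vars()` means all three values are present; it does not verify that the credentials are valid or that Bedrock access has been granted.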
To run the scripts locally, you can use the provided Makefile. Below are the available commands:
- Create the .env file:
  make create-env
  This command will create the .env file for new users and prompt you to add your API keys.
- Clean the data folder:
  make clean
  This command will clean the data folder, removing everything except the videos. Useful for resetting the environment.
- Create the Conda environment:
  make create-conda-env
  This command will create the Conda environment specified in the Makefile.
- Install dependencies:
  make install-deps
  This command will install the required dependencies within the Conda environment.
- Preprocess the videos:
  make preprocess
  This command will preprocess the videos using the specified script.
- Run the vector enrichment:
  make enrich
  This command will run the Claude contextual embedding step.
- Run the upsertion process:
  make upsert
  This command will run the upsertion process into Pinecone.
- Data setup process:
  make setup
  This command will clean the data folder, create the Conda environment, install dependencies, preprocess the videos, run the Claude contextual preprocessing step, and upsert the data into Pinecone.
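The order that `make setup` is described as following can be sketched in Python as below. This is only an illustration of the sequence and of stopping at the first failed step; the Makefile's setup target already does this for you, and its actual implementation may differ. The names `SETUP_TARGETS` and `run_targets` are hypothetical.

```python
import subprocess

# Target order taken from the description of `make setup` in this README.
SETUP_TARGETS = [
    "clean",
    "create-conda-env",
    "install-deps",
    "preprocess",
    "enrich",
    "upsert",
]


def run_targets(targets=SETUP_TARGETS, runner=subprocess.run):
    """Run each make target in order and return the ones that completed.

    Hypothetical helper. `runner` is injectable so the sequence can be
    exercised without actually invoking make.
    """
    completed = []
    for target in targets:
        # check=True raises CalledProcessError on a non-zero exit code,
        # so later steps never run on top of a failed earlier step.
        runner(["make", target], check=True)
        completed.append(target)
    return completed
```

Running the steps individually like this is mainly useful when one stage fails and you want to resume from it, for example `run_targets(["enrich", "upsert"])`.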
To launch the Streamlit app, use the following command:
make run-app
This command will run the Streamlit app defined in app.py.
For more information on available commands, you can use:
make help
It's easiest to run the whole pipeline (setup) and then run the Streamlit app.
From there, the Streamlit app should pop up locally and you can start querying!