pc-yt-rag

A simplified Contextual Video RAG implementation using Pinecone, AWS, and Claude

Ever wanted to ask questions about your video content, such as YouTube videos, Zoom webinars, or recorded meetings? This application builds a RAG (retrieval-augmented generation) chatbot over that content using contextual retrieval with Pinecone, AWS, and Claude.

This branch contains the Streamlit web app version of the implementation. It lets you run a local web app to interact with the RAG chatbot, and it uses a Makefile to streamline data preprocessing. Read the following section to make sure you have the prerequisites before proceeding.

If you'd rather work in a SageMaker notebook, use the webinar-notebook branch instead.

Before you Begin

This repo presents the RAG solution in two ways: as scripts and a Makefile that build a Streamlit application (this branch), and as a notebook intended for use on SageMaker (the webinar-notebook branch).

You'll also need access to AWS Bedrock, including Claude models through Bedrock, and a Pinecone API key.

Finally, place the videos you'd like to process in a folder called data with a subfolder called videos (data/videos), and keep them in .mp4 format. If you manage your own YouTube channel, the videos you can download from YouTube Studio work well.
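Before running the pipeline, you can sanity-check that layout with a short script. This is a minimal sketch and is not part of the repository. `list_videos` is a hypothetical helper that assumes only the data/videos structure described above, and it only picks up files with a lowercase .mp4 extension.

```python
from __future__ import annotations

from pathlib import Path


def list_videos(data_dir: str | Path = "data") -> list[Path]:
    """Return the .mp4 files directly under <data_dir>/videos, sorted by path.

    Hypothetical helper: it checks the folder layout this README asks for,
    not anything the pipeline itself does.
    """
    video_dir = Path(data_dir) / "videos"
    if not video_dir.is_dir():
        raise FileNotFoundError(f"Expected a videos folder at {video_dir}")
    # Only top-level files with a lowercase .mp4 suffix are matched.
    videos = sorted(p for p in video_dir.glob("*.mp4") if p.is_file())
    if not videos:
        raise FileNotFoundError(f"No .mp4 files found in {video_dir}")
    return videos


if __name__ == "__main__":
    for video in list_videos():
        print(video)
```

Run it from the repository root. An exception tells you the folder is missing or holds no .mp4 files.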

Running the Scripts Locally

Before beginning, authenticate your session with AWS using your preferred method. You can export your access key ID, secret access key, and default region as environment variables, or run 'aws sso login' if you have SSO set up.
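If you take the environment-variable route, the AWS CLI and SDKs (such as boto3) read the standard variables AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_DEFAULT_REGION. The hypothetical helper below, which is not part of the repo, reports which of those are unset. It doesn't validate the values. It will also flag all three if you authenticate with 'aws sso login' instead, because SSO credentials live in your AWS config and cache rather than in these variables.

```python
from __future__ import annotations

import os
from collections.abc import Mapping

# Standard variable names read by the AWS CLI and SDKs.
REQUIRED_AWS_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_DEFAULT_REGION")


def missing_aws_vars(env: Mapping[str, str] | None = None) -> list[str]:
    """Return the required AWS variables that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_AWS_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_aws_vars()
    if missing:
        print("Missing:", ", ".join(missing), "(expected if you use aws sso login)")
    else:
        print("AWS environment variables are set.")
```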

As noted above, you'll still need access to AWS Bedrock and Claude via Bedrock, as well as a Pinecone API key.

To run the scripts locally, you can use the provided Makefile. Below are the available commands:

Create the .env file:
```
make create-env
```
This command will create the .env file for new users and prompt you to add your API keys.
Clean the data folder:
```
make clean
```
This command will clean the data folder, removing everything except the videos. Useful for resetting the environment.
Create the Conda environment:
```
make create-conda-env
```
This command will create the Conda environment specified in the Makefile.
Install dependencies:
```
make install-deps
```
This command will install the required dependencies within the Conda environment.
Preprocess the videos:
```
make preprocess
```
This command preprocesses the videos in data/videos using the repository's preprocessing script.
Run the vector enrichment:
```
make enrich
```
This command runs the contextual embedding step, in which Claude adds context to the preprocessed content before it is embedded.
Run the upsertion process:
```
make upsert
```
This command upserts the enriched data into Pinecone.
Data setup process:
```
make setup
```
This command cleans the data folder, creates the Conda environment, installs dependencies, preprocesses the videos, runs the Claude contextual enrichment step, and upserts the data into Pinecone.
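To make the enrich and upsert steps more concrete, here is a minimal sketch of the general contextual-retrieval pattern. A model writes a short piece of context for each chunk, that context is prepended to the chunk before embedding, and the resulting records are sent to the index in batches. This is not the repository's code. `generate_context`, `embed`, and `upsert` are hypothetical callables standing in for Claude on Bedrock, an embedding model, and a Pinecone index's upsert call. The record shape (id, values, metadata), the id scheme, and the batch size of 100 are assumptions.

```python
from __future__ import annotations

from collections.abc import Callable, Iterable, Iterator
from itertools import islice


def contextualize(
    chunks: list[str],
    document: str,
    generate_context: Callable[[str, str], str],
) -> list[str]:
    """Prepend model-written context to each chunk (contextual retrieval)."""
    # generate_context(document, chunk) is a stand-in for a Claude call via Bedrock.
    return [f"{generate_context(document, chunk)}\n\n{chunk}" for chunk in chunks]


def build_records(
    video_id: str,
    texts: list[str],
    embed: Callable[[str], list[float]],
) -> list[dict]:
    """Turn enriched chunk texts into id/values/metadata records (assumed shape)."""
    return [
        {
            "id": f"{video_id}-{i}",  # hypothetical id scheme
            "values": embed(text),
            "metadata": {"video_id": video_id, "text": text},
        }
        for i, text in enumerate(texts)
    ]


def batched(items: Iterable[dict], size: int = 100) -> Iterator[list[dict]]:
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch


def upsert_all(
    records: list[dict],
    upsert: Callable[[list[dict]], None],
    size: int = 100,
) -> int:
    """Send records to `upsert` in batches and return the number of batches sent."""
    count = 0
    for batch in batched(records, size):
        upsert(batch)
        count += 1
    return count
```

With the Pinecone Python client, `upsert` could wrap something like `index.upsert(vectors=batch)`. Keeping the model and index calls as parameters lets the data flow be tested without network access.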

Launching the Streamlit App

To launch the Streamlit app, use the following command:

```
make run-app
```

This command will run the Streamlit app defined in app.py.

For more information on available commands, you can use:

```
make help
```

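Once your .env file is in place, the easiest path is to run the whole pipeline with make setup and then launch the app with make run-app.

Streamlit serves the app locally (by default at http://localhost:8501) and usually opens it in your browser, so you can start querying right away.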
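When you query the app, a RAG chatbot like this one follows a retrieve-then-generate loop. It embeds the question, fetches the most similar chunks, and passes them to Claude alongside the question. The sketch below illustrates that loop under stated assumptions and is not taken from app.py. An in-memory cosine-similarity search stands in for the Pinecone query. Records are assumed to be dicts with "values" (the embedding) and "metadata"["text"] (the chunk text). The prompt wording and the k default are illustrative.

```python
from __future__ import annotations

import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], records: list[dict], k: int = 3) -> list[dict]:
    """Return the k most similar records (in-memory stand-in for a Pinecone query)."""
    ranked = sorted(records, key=lambda r: cosine(query_vec, r["values"]), reverse=True)
    return ranked[:k]


def build_prompt(question: str, matches: list[dict]) -> str:
    """Combine retrieved chunk texts and the user's question into one LLM prompt."""
    context = "\n\n".join(m["metadata"]["text"] for m in matches)
    return (
        "Answer the question using only the video excerpts below.\n\n"
        f"<excerpts>\n{context}\n</excerpts>\n\n"
        f"Question: {question}"
    )
```

In the real app, the query vector would come from the same embedding model used at upsert time, and the prompt would be sent to Claude through Bedrock.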

Repository: pinecone-io/contextual-webinar-rag

Folders and files

Latest commit

History

Repository files navigation

pc-yt-rag

Before you Begin

Running the Scripts Locally

Launching the Streamlit App

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages