Retrieval-Augmented Generation (RAG) System for Cars Dataset

This project demonstrates how to build a Retrieval-Augmented Generation (RAG) system. It uses a generative model to create a synthetic car dataset and vector similarity search to retrieve the car entries most relevant to a query.

Project Overview

The system uses the following components:

Generated Data: Simulates a dataset of car details using OpenAI's GPT-3.5 Turbo model.
Vectorization: Converts car descriptions into vector representations using an OpenAI embedding model.
Vector Storage: Stores the vectors in Chroma, a vector database used here in memory, and retrieves them by similarity.

The generated dataset includes the following attributes:

Name: Car name
Price: Car price
Engine: Engine type
Description: Detailed car description
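
As a rough illustration of how one record with these four attributes could be represented in Python, here is a minimal sketch. The `CarRecord` class and its `metadata` method are hypothetical names chosen for this example and are not taken from the notebook. It assumes the description is the text that gets embedded, while the other fields are kept as metadata next to the vector.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CarRecord:
    """One generated car entry (hypothetical type, not from the notebook)."""

    name: str
    price: str
    engine: str
    description: str

    def metadata(self) -> dict:
        """Structured fields that could be stored alongside the vector."""
        return {"name": self.name, "price": self.price, "engine": self.engine}


record = CarRecord(
    name="Toyota Camry",
    price="$25,000",
    engine="Hybrid",
    description="A reliable and fuel-efficient sedan with advanced features.",
)

# Assumption: only the description is sent to the embedding model.
text_to_embed = record.description
print(text_to_embed)
print(record.metadata())
```

Keeping the description separate from the structured fields makes it possible to filter on price or engine type later without re-embedding anything.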

Installation

Prerequisites

Install Python 3.8 or later.
Ensure you have pip installed.

Install Requirements

Run the following command to install all necessary dependencies:

```
pip install -r requirements.txt
```

Required Environment Variables

Create a .env file in the project directory with the following content:

```
OPENAI_API_KEY=<your-openai-api-key>
```

Replace <your-openai-api-key> with your actual OpenAI API key.
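
The notebook needs to read this key at runtime. The sketch below shows one way to do that with only the standard library. The helper names `load_env_file` and `get_openai_api_key` are hypothetical and not part of this repository, and the parser only handles simple `KEY=VALUE` lines.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict:
    """Parse simple KEY=VALUE lines from a .env file (hypothetical helper)."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_openai_api_key(path: str = ".env") -> str:
    """Prefer the process environment, then fall back to the .env file."""
    key = os.environ.get("OPENAI_API_KEY") or load_env_file(path).get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; add it to .env or export it.")
    return key
```

If the `python-dotenv` package is available in your environment (it is not among the dependencies listed in this README), calling `load_dotenv()` from `dotenv` is a common alternative to a hand-written parser.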

How to Run the Notebook

Clone the Repository:

```
git clone https://github.com/thiago-grabe/rag-example.git
cd rag-example
```

Set Up the Environment: Install the required packages by running:
```
pip install -r requirements.txt
```
Run the Jupyter Notebook: Launch the notebook with:
```
jupyter notebook
```
Open the notebook file in the notebooks directory and follow the instructions within.

Key Features

Data Generation: Generates a synthetic car dataset programmatically using OpenAI's GPT model.
Vectorization: Embeds car descriptions into vector space for similarity search.
Retrieval: Uses Chroma as the backend to fetch the car entries whose vectors are closest to a query.
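
To make the retrieval step concrete, here is a minimal pure-Python sketch of similarity search over stored vectors. It stands in for what a vector database does conceptually and is not Chroma's API or the notebook's code. The three-dimensional vectors are made-up toy values rather than real embeddings, the car names other than Toyota Camry are illustrative, and `cosine_similarity` and `top_k` are hypothetical helper names.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vector, store, k=2):
    """Return the names of the k stored vectors most similar to the query."""
    scored = [(cosine_similarity(query_vector, vec), name) for name, vec in store.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:k]]


# Toy vectors standing in for real embeddings (assumption: 3 dimensions).
toy_store = {
    "Toyota Camry": [0.9, 0.1, 0.0],
    "Ford F-150": [0.1, 0.9, 0.2],
    "Tesla Model 3": [0.7, 0.0, 0.6],
}
query = [1.0, 0.0, 0.1]
print(top_k(query, toy_store, k=2))
```

A real vector database replaces the linear scan in `top_k` with an index so that lookups stay fast as the collection grows.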

Dependencies

The following dependencies are used in the project (also listed in requirements.txt):

langchain: For handling LLM-based workflows.
chromadb: Vector database for efficient retrieval.
transformers: Hugging Face library for models and tokenizers.
sentence-transformers: For embedding generation.
openai: For GPT-based generation and embedding creation.
numpy: Numerical computations.
ipywidgets and ipykernel: For interactive notebook features.

Example Outputs

An example entry in the dataset:

Name: Toyota Camry
Price: $25,000
Engine: Hybrid
Description: A reliable and fuel-efficient sedan with advanced features.
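
If entries arrive as plain text in the `Key: Value` layout shown above, they could be turned into structured data along the following lines. This is a sketch under that assumption. The notebook may produce a different format, and `parse_car_entry` and `price_to_int` are hypothetical helpers.

```python
def parse_car_entry(text: str) -> dict:
    """Parse 'Key: Value' lines into a dict with lower-cased keys."""
    entry = {}
    for line in text.strip().splitlines():
        # Split on the first colon only, so colons inside the value survive.
        key, sep, value = line.partition(":")
        if not sep:
            continue
        entry[key.strip().lower()] = value.strip()
    return entry


def price_to_int(price: str) -> int:
    """Convert a price string such as '$25,000' to an integer."""
    return int(price.replace("$", "").replace(",", ""))


sample = """
Name: Toyota Camry
Price: $25,000
Engine: Hybrid
Description: A reliable and fuel-efficient sedan with advanced features.
"""

entry = parse_car_entry(sample)
print(entry["name"], price_to_int(entry["price"]))
```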

Future Work

Enhance the dataset generation pipeline.
Implement advanced ranking algorithms for retrieved results.
Integrate with external APIs for real-world datasets.
