This project is a proof of concept for a RAG-based chatbot built with the OpenAI SDK, OllamaEmbeddings, and the LangChain Chroma vector store. FastAPI handles the interaction with the LLM for query-response orchestration.
```
.
├── app.py
├── pyproject.toml
├── rag.py
├── README.md
├── uv.lock
└── vectorStore.py
```
Make sure you have Python >= 3.10 and uv installed. If not, install uv with:

```sh
curl -LsSf https://astral.sh/uv/install.sh | sh
```
Create a virtual environment and install the dependencies:

```sh
uv venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
uv sync
```
Run the FastAPI application with:

```sh
fastapi dev app.py
```

This runs the application on `127.0.0.1:8000`. You can explore the endpoints via the OpenAPI docs at `127.0.0.1:8000/docs`.
I stored the business goals in two different ways and wanted to see how the RAG chatbot would respond in each case.
I used LangChain's document loader to load PDFs, LangChain's Chroma vector store persisted to a local directory, the OpenAI SDK for LLM orchestration, and Ollama embeddings for the vector store, using the nomic-embed-text model on Ollama.
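A minimal ingestion sketch under those assumptions; the PDF path and persist directory are placeholders, and the `langchain-community`, `langchain-ollama`, and `langchain-chroma` packages are assumed:

```python
from langchain_community.document_loaders import PyPDFLoader
from langchain_ollama import OllamaEmbeddings
from langchain_chroma import Chroma

# Load a PDF into page-level documents ("docs/goals.pdf" is a placeholder path).
docs = PyPDFLoader("docs/goals.pdf").load()

# Embed the pages with Ollama's nomic-embed-text model and persist them locally.
Chroma.from_documents(
    documents=docs,
    embedding=OllamaEmbeddings(model="nomic-embed-text"),
    persist_directory="./chroma_db",  # assumed persistent directory
)
```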
The two approaches were:
- Injecting the business goals into the system prompt
- Storing the business goals in a dedicated collection in the vector DB, so that when the user asks something similar, the app fetches the matching goals from the vector DB and responds (see the seeding sketch after this list)
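For the second approach, seeding the goals into their own collection could look like the sketch below; the collection name and goal strings are illustrative, not the project's actual values:

```python
from langchain_chroma import Chroma
from langchain_ollama import OllamaEmbeddings

# Hypothetical goal strings; in the project the goals come from the loaded documents.
goals = [
    "Grow annual recurring revenue by 20% year over year.",
    "Reduce monthly customer churn below 3%.",
]

# Embed each goal and store it in a dedicated, persisted collection.
Chroma.from_texts(
    texts=goals,
    embedding=OllamaEmbeddings(model="nomic-embed-text"),
    collection_name="business_goals",  # assumed collection name
    persist_directory="./chroma_db",
)
```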
I found that total token usage was almost the same in both cases, but to avoid bloating the system prompt, the second approach proved more practical. For the API endpoints:
- The `POST /query` endpoint uses the business goals in the system prompt.
- The `POST /query-v` endpoint stores them in the vector store.
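A hedged sketch of how the two endpoints might be wired up; the goals text, collection name, request model, and OpenAI model name are all assumptions, not the project's exact code:

```python
from fastapi import FastAPI
from pydantic import BaseModel
from openai import OpenAI
from langchain_chroma import Chroma
from langchain_ollama import OllamaEmbeddings

app = FastAPI()
client = OpenAI()

# Illustrative goals text; the real project loads goals from PDFs.
GOALS = "1. Grow ARR 20% year over year. 2. Reduce churn below 3%."

# Collection name and persist directory are assumptions.
goals_store = Chroma(
    collection_name="business_goals",
    embedding_function=OllamaEmbeddings(model="nomic-embed-text"),
    persist_directory="./chroma_db",
)

class Query(BaseModel):
    question: str

def ask_llm(system_prompt: str, question: str) -> str:
    # Model name is an assumption.
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content

@app.post("/query")
def query(q: Query):
    # Approach 1: goals injected straight into the system prompt.
    return {"answer": ask_llm(f"You are a business assistant.\nGoals:\n{GOALS}", q.question)}

@app.post("/query-v")
def query_v(q: Query):
    # Approach 2: fetch only the goals similar to the question, then answer.
    hits = goals_store.similarity_search(q.question, k=3)
    context = "\n".join(d.page_content for d in hits)
    return {"answer": ask_llm(f"You are a business assistant.\nRelevant goals:\n{context}", q.question)}
```

Either endpoint can then be exercised with, for example, `curl -X POST 127.0.0.1:8000/query-v -H "Content-Type: application/json" -d '{"question": "How are we tracking on churn?"}'`.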