This repo loads unstructured data from the web, splits it into chunks, and indexes those chunks in a vector database. It then embeds a question, retrieves the most semantically similar chunks from the database, and uses them to generate an answer.
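The "split" step breaks long documents into smaller pieces so that each piece can be embedded and retrieved on its own. Below is a minimal sketch of fixed-size character chunking with overlap. This is a simplified stand-in and not necessarily the splitter this repo uses. LangChain's RecursiveCharacterTextSplitter, for example, prefers to break on separators such as paragraphs and newlines. The function name `split_text` and its defaults are hypothetical.

```python
def split_text(text: str, chunk_size: int = 200, chunk_overlap: int = 20) -> list[str]:
    """Hypothetical helper: split text into fixed-size character chunks.

    Consecutive chunks share `chunk_overlap` characters, so a sentence cut at a
    chunk boundary still appears whole in at least one neighbouring chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each new chunk advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # last chunk reached the end of the text
            break
    return chunks


print(split_text("abcdefghij", chunk_size=4, chunk_overlap=1))
```

With `chunk_size=4` and `chunk_overlap=1`, the ten-character string yields three chunks, and each chunk repeats the last character of the previous one.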
- Neo4jVector: vector DB
- OpenAI: LLM for QA, vector embeddings, and RAG
- LangChain: framework for building apps with LLMs
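Conceptually, the vector database answers one question: which stored embeddings are closest to the query embedding? The sketch below does this by brute force with cosine similarity, one of the similarity functions Neo4j vector indexes support. It is an illustration of the idea only. A real vector index avoids scanning every vector. The toy 3-dimensional vectors stand in for real OpenAI embeddings, which have many more dimensions. `cosine_similarity` and `top_k` are hypothetical names.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], index: list[tuple[str, list[float]]], k: int = 2):
    """Return the k (text, score) pairs in `index` most similar to `query_vec`."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in index]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # highest similarity first
    return scored[:k]


# Toy "index": each chunk paired with a made-up embedding.
index = [
    ("Neo4j is a graph database.", [0.9, 0.1, 0.0]),
    ("OpenAI provides LLMs.", [0.1, 0.9, 0.0]),
    ("Paris is in France.", [0.0, 0.1, 0.9]),
]
result = top_k([1.0, 0.0, 0.0], index, k=2)
print(result)
```

In the actual pipeline the query vector would come from embedding the user's question with the same model used to embed the chunks. Otherwise the similarities are not meaningful.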
A run walks through these stages: [Question] → [Initial Prompt] → [Question into Embedding] → [Retrieved Similar Embeddings] → [Search Result] → [Final Prompt] → [Answer], then reports [Tokens] and [Time].
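The sections listed above include an initial prompt, the retrieved search results, and a final prompt, followed by token and time figures. A minimal sketch of how a final prompt could be assembled from a question and retrieved chunks, and how elapsed time could be measured, is shown below. The template wording, `build_final_prompt`, `rough_token_count`, and `timed` are all hypothetical and are not taken from run.py. The word count is only a crude proxy. OpenAI models count tokens with their own tokenizer, so real token numbers will differ.

```python
import time

# Hypothetical QA template; the repo's actual prompt wording may differ.
PROMPT_TEMPLATE = (
    "Use the following context to answer the question.\n"
    "Context:\n{context}\n\n"
    "Question: {question}\n"
    "Answer:"
)


def build_final_prompt(question: str, search_results: list[str]) -> str:
    """Insert the retrieved chunks and the question into the template."""
    context = "\n---\n".join(search_results)  # separate chunks visibly
    return PROMPT_TEMPLATE.format(context=context, question=question)


def rough_token_count(text: str) -> int:
    """Crude estimate: whitespace-separated words (not real model tokens)."""
    return len(text.split())


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


prompt, seconds = timed(
    build_final_prompt,
    "What is Neo4j?",
    ["Neo4j is a graph database.", "It supports vector indexes."],
)
print(prompt)
print("approx. words:", rough_token_count(prompt), "time (s):", seconds)
```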
- Sign up for Neo4j AuraDB
- Sign up for OpenAI
- Sign up for LangSmith
In the root of this repo, create a .env file containing the keys below, replacing each [your-value] with your own value:
OPENAI_API_KEY=[your-value]
LANGCHAIN_TRACING_V2=true
LANGCHAIN_ENDPOINT="https://api.smith.langchain.com"
LANGCHAIN_API_KEY=[your-value]
NEO4J_URI=[your-value]
NEO4J_USERNAME=[your-value]
NEO4J_PASSWORD=[your-value]
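Before running, it can help to confirm that every key is present and no placeholder was left behind. Python projects commonly load a .env file with python-dotenv's `load_dotenv()`. Whether run.py does exactly that is an assumption here. The stdlib-only sketch below parses simple `KEY=value` lines and reports missing keys. `parse_env_text` and `missing_keys` are hypothetical helpers and handle only the simple syntax used above, not the full .env format.

```python
import os

# Keys listed in this README's .env example.
REQUIRED_KEYS = [
    "OPENAI_API_KEY",
    "LANGCHAIN_TRACING_V2",
    "LANGCHAIN_ENDPOINT",
    "LANGCHAIN_API_KEY",
    "NEO4J_URI",
    "NEO4J_USERNAME",
    "NEO4J_PASSWORD",
]


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines; skip blanks and # comments, strip matching quotes."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]  # drop surrounding quotes, e.g. the LangSmith endpoint
        values[key.strip()] = value
    return values


def missing_keys(values: dict[str, str]) -> list[str]:
    """Required keys that are absent, empty, or still set to the placeholder."""
    return [k for k in REQUIRED_KEYS if values.get(k, "") in ("", "[your-value]")]


if __name__ == "__main__" and os.path.exists(".env"):
    with open(".env", encoding="utf-8") as f:
        print("Missing or placeholder keys:", missing_keys(parse_env_text(f.read())) or "none")
```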
In run.py, adjust the wikipedia_query and user_query variables to your preference.
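For illustration, the two variables might look like the following. The values are hypothetical, and the comments on what each variable controls are inferred from the names rather than confirmed from run.py.

```python
# Hypothetical values: replace with your own topic and question.
wikipedia_query = "Neo4j"  # assumed: topic used to fetch the Wikipedia page(s) to index
user_query = "What query language does Neo4j use?"  # question answered from the indexed text

print(f"Indexing: {wikipedia_query!r}; asking: {user_query!r}")
```

Keeping the two related matters: retrieval can only surface chunks from what was indexed, so a question unrelated to the fetched pages is unlikely to get a grounded answer.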
From the root of this repo, run:
python run.py