This is a LangChain + Streamlit web application for searching relevant offers from a SQLite database using an LLM and embeddings. Users can input a search query related to brand, category, or retailer, and the application will retrieve and display relevant offers from the database. The application uses the OpenAI API for natural language processing and embedding generation.
-
Objective: The aim of this approach is to extract pertinent offers from the
offer_retailer
table based on product category, brand, or retailer queries. Given that the required data is dispersed across multiple tables within thedata
directory, a language model (LLM) was employed to facilitate intelligent database querying. -
Database Preparation: Initially, a local SQLite database was constructed using the data stored in
.csv
files located in thedata
directory. This was accomplished using thesqlite3
andpandas
libraries. -
LLM Integration: The integration of the Language Model (LLM) into the process was achieved via the
langchain_experimental.sql.SQLDatabaseChain
. This step enabled the LLM to interact with the local SQLite database effectively. -
Prompt Engineering: A vital aspect of this approach involved formulating a suitable prompt for instructing the LLM on the optimal method to retrieve and format database entries. Several iterations and experimentation were carried out to fine-tune this prompt.
-
Similarity Scoring: To determine the relevance of retrieved results to the query, cosine similarity comparisons were performed. The
langchain.embeddings.OpenAIEmbeddings
facilitated the generation of embeddings for these comparisons, enabling results to be ranked accordingly. -
Streamlit Integration: The final step involved parsing the LLM's output and constructing a user-friendly Streamlit application around it, allowing users to interactively search for offers.
-
Scalability: It's important to note that this approach may face scalability challenges when dealing with larger datasets. In such cases, providing additional metadata to the LLM regarding the database schema may become necessary to maintain efficiency.
-
Security Concerns: As with any database interaction, there is a potential vulnerability to SQL injection attacks. Implementing robust security mechanisms is essential to mitigate this risk.
-
Advanced Prompting: While this approach achieves the desired functionality, more sophisticated prompts can be constructed through techniques like in-context learning or few-shot inference. Exploring these avenues may lead to even more accurate and versatile results.
You can access the application through Google Colab. Just create a copy of the notebook and follow the instructions within.
Before you begin, ensure you have met the following requirements:
- Python 3.6 or higher
- OpenAI API key
- SQLite database containing tables:
brand_category
,categories
, andoffer_retailer
-
Create and activate a Python virtual environment:
python -m venv llm_sql_env source llm_sql_env/bin/activate
-
Clone the repository:
git clone https://github.com/sridhama/llm-sql cd llm-sql
-
Install the required packages:
pip install -r requirements.txt
-
Obtain an OpenAI API key from OpenAI Platform and add it to the application when prompted.
-
Make sure your SQLite database is set up with the necessary tables (
brand_category
,categories
,offer_retailer
).
To run the application locally, navigate to the repository's directory and use the following command:
./run.sh
Once the application is running, open your browser and navigate to http://localhost:8501
to access the offer search interface.
- Enter your search query (brand, category, or retailer) in the text input field.
- Click the "Search" button to initiate the search.
- Relevant offers matching the query will be displayed in a table format.
- Make sure to provide the correct OpenAI API key for the application to function properly.
- Ensure the SQLite database contains the required tables and necessary data for accurate search results.