
Natural Language SQL Query Tool

Screenshot of Natural Language SQL Query Tool in action

Overview

This application lets you interact with your SQL databases using natural language: instead of writing complex SQL queries, you simply ask questions in plain English. Here's what you can do:

  • Ask in Plain English: Effortlessly query your database using natural language – no SQL expertise required!
  • Automatic SQL Generation: The application intelligently translates your questions into optimized SQL queries behind the scenes.
  • Instant Results: View query results instantly within a user-friendly chat interface, making data interaction feel like a conversation.
  • Download & Share: Easily download your data as Excel files (.xlsx) for further analysis, reporting, and sharing with colleagues.
  • Visualize Insights: Create compelling Plotly charts directly from your query results to uncover trends and patterns in your data.
  • Context-Aware AI: Upload documents to enrich the AI's understanding, leading to even more accurate and relevant answers to your questions.

This tool demystifies data access by using powerful AI models to seamlessly connect your natural language questions to your structured database. Now, anyone can access and analyze data, regardless of their SQL skills!

Project Architecture

The system is designed to provide an intelligent natural language interface for querying SQL databases, generating structured responses, and visualizing data insights. It consists of four core components:

  1. Vector Embedding & Question Enhancement
  2. SQL Query Generation Agent
  3. Response Generator
  4. Visualization Generator

Workflow Overview

  1. Initialization & Setup:

    • The user provides database connection details and integrates a Large Language Model (LLM).
    • The system supports document uploads or connects to an existing vector database for historical chat embeddings.
  2. Vector Retrieval & Question Enhancement:

    • When a user submits a query, the system retrieves the top five most relevant vector embeddings from the database (if available).
    • These embeddings, along with the user’s query, chat history, and the names of tables in the SQL database, are sent to the Question Enhancer model.
    • The Question Enhancer refines the user’s query and suggests relevant visualizations in the following structured format:
    {
      "enhanced_question": "How many employees are there in each role, and what are the specific counts for each?",
      "graph": {
        "Graph Type": "Bar chart",
        "Data Visualization": "Use a bar chart to display the number of employees for each role. The x-axis should represent the different roles, and the y-axis should represent the number of employees. This will allow for easy comparison of the number of employees in each role."
      },
      "useful_tables": [
        "employees"
      ]
    }
  3. SQL Query Generation & Execution:

    • The enhanced question, schema of the relevant tables, and a sample of five rows per table are provided as input to the SQL Query Generation Agent.
    • The SQL agent generates an appropriate query to retrieve the necessary data.
    • If the query encounters errors or returns an empty result, the error is appended to the input and reprocessed, with up to five iterations allowed to refine the query for complex questions.
    • The user can download the data returned by the SQL query by clicking the Download button.
    • The SQL agent is optimized not to directly answer the query but to retrieve the most relevant data to support an accurate response.
  4. Response Generation:

    • The executed SQL query, database output, and the enhanced question are passed to the Response Agent.
    • The agent generates a detailed, structured response to the user’s question based on the retrieved data.
  5. Visualization Generation (Optional):

    • If the user opts for visualization, the Visualization Generator takes the following as input:
      • The first five rows of the SQL output
      • Data types of the retrieved columns
      • The user’s original question
    • The model generates Python code using Plotly to visualize the data in the suggested format.
    • The visualization is then displayed in the user interface.

This architecture ensures an iterative, error-resilient approach to SQL querying while offering enhanced user interaction, structured responses, and intuitive data visualizations.
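
The top-five embedding retrieval in step 2 of the workflow is, conceptually, a nearest-neighbour search over stored vectors. The real application delegates this to ChromaDB; the tiny in-memory version below only illustrates the idea:

```python
from math import sqrt

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def top_k(query_embedding, store, k=5):
    """Return the texts of the k stored items most similar to the query.

    `store` is a list of {"text": ..., "embedding": ...} dicts."""
    ranked = sorted(
        store,
        key=lambda item: cosine_similarity(query_embedding, item["embedding"]),
        reverse=True,
    )
    return [item["text"] for item in ranked[:k]]
```

The retrieved texts are then bundled with the user's query, chat history, and table names before being sent to the Question Enhancer.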

Features

  • Natural Language to SQL: Ask questions about your database in plain English.
  • Database Connectivity: Supports various SQL database types including PostgreSQL, MySQL, SQLite, Oracle, and MSSQL.
  • Chat Interface: Interactive chat interface for seamless conversation with your database.
  • Data Download: Download query results as Excel (.xlsx) files.
  • Data Visualization: Generate Plotly charts directly from your query results.
  • Document Upload (ChromaDB Integration): Upload documents (text, PDF, Word) to provide context to the AI model, enhancing query understanding and accuracy.
  • Credential Encryption: Securely stores database connection credentials using encryption.
  • Clear Chat History: Option to clear the chat history for a fresh start.
  • Toggle Sidebar: Hide or show the database connection sidebar for a cleaner interface.

Getting Started

This guide outlines how to set up your development environment using a virtual environment for this project. Using a virtual environment is highly recommended to isolate project dependencies and avoid conflicts with other Python projects on your system.

Prerequisites

  • Python 3.8 or higher
  • pip (Python package installer)

Setup with Virtual Environment

  1. Clone the repository:

    git clone https://github.com/Rithvik119am/data-bridge.git
    cd data-bridge
  2. Create a virtual environment: Navigate to the data-bridge directory in your terminal and create a virtual environment. We'll name it venv (a common convention).

    python -m venv venv
  3. Activate the virtual environment: You need to activate the virtual environment to use it. The activation command depends on your operating system:

    • Linux/macOS:
      source venv/bin/activate
    • Windows:
      venv\Scripts\activate
      (You might need to use .\venv\Scripts\activate in PowerShell)

    (You will know the virtual environment is active when you see (venv) at the beginning of your terminal prompt.)

  4. Install required Python libraries within the virtual environment: Now that your virtual environment is active, use pip to install the project dependencies from the requirements.txt file.

    pip install -r requirements.txt
  5. Obtain a Google API key: You will need a Google API key to use the AI features; you will enter it in the application's sidebar (see Configuration below).

To deactivate the virtual environment when you are finished working on the project, simply run:

deactivate

Configuration

Screenshot of Natural Language SQL Query Tool in action

  1. API Key:

    • When you run the application for the first time, you will need to enter your Google API key in the sidebar.
    • This API key is used to authenticate with Google's AI services.
  2. Database Credentials:

    • In the sidebar, you need to provide the connection details for your SQL database:
      • Database Type: Select your database type (PostgreSQL, MySQL, SQLite, Oracle, MSSQL).
      • Host: Database host address (e.g., localhost, IP address).
      • Database Name: Name of your database.
      • User: Database username.
      • Password: Database password.
      • Port: Database port number (default ports are usually pre-filled).
  3. db_credentials.json and secret.key:

    • Upon successful connection to the database, the application will create two files:
      • db_credentials.json: Stores your encrypted database connection credentials.
      • secret.key: Encryption key used to secure db_credentials.json.
    • Do not share secret.key. Keep it secure.
    • You can load saved credentials later using the "Load Credentials" button in the sidebar.
  4. chroma/chroma.sqlite3:

    • If you upload documents, the application uses ChromaDB to store and index them for context retrieval.
    • The ChromaDB database file chroma.sqlite3 will be created in the chroma directory in your project folder.
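
The README does not state which cipher protects db_credentials.json, but a common pattern for a secret.key plus encrypted-JSON pair is symmetric encryption with Fernet from the cryptography package. The sketch below assumes that scheme and is illustrative, not the app's confirmed implementation:

```python
import json
from cryptography.fernet import Fernet  # assumption: Fernet-style symmetric encryption

def encrypt_credentials(creds: dict) -> tuple[bytes, bytes]:
    """Return (key, ciphertext): the key plays the role of secret.key,
    the ciphertext that of the encrypted db_credentials.json contents."""
    key = Fernet.generate_key()
    token = Fernet(key).encrypt(json.dumps(creds).encode("utf-8"))
    return key, token

def decrypt_credentials(key: bytes, token: bytes) -> dict:
    """Recover the credentials dict from the key and ciphertext."""
    return json.loads(Fernet(key).decrypt(token))
```

Anyone holding the key can decrypt the credentials, which is why the README warns you never to share secret.key.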

How to Use

  1. Run the application:

    python main.py

    Important Note: The first time you run this project, it may take significantly longer to start. This is because the application automatically downloads and installs an embedding model for the ChromaDB integration. This step is needed for document indexing and contextual query understanding, and it only occurs once. Subsequent runs will be much faster.

  2. Connect to your Database:

    • Enter your database credentials in the sidebar on the left.
    • Click the "Connect" button.
    • Check the status label below the "Connect" button for connection success or failure messages.
  3. Start Chatting:

    • Once connected, type your natural language questions in the text input field at the bottom of the chat window and press Enter.
    • The application will display your question and then the AI-generated SQL query and the response.
  4. Download Data:

    • For responses that return data, a "Download" button will appear next to the AI's message when you hover over the message area.
    • Click "Download" to save the data as an Excel file (.xlsx).
  5. Visualize Data:

    • For responses with data, a "Visualize" button will also appear upon hovering.
    • Click "Visualize" to generate a Plotly chart based on the data and your question.
  6. Upload Document (Context Enrichment):

    • Click the "Upload Document" button in the sidebar.
    • Select a document file (.txt, .pdf, .docx).
    • Documents containing questions and answers related to your database work best as context.
    • Uploaded documents are processed and indexed by ChromaDB. The content of these documents will be considered by the AI when answering your subsequent questions, providing richer context and potentially more accurate results.
  7. Clear Document Context (Delete DB Button):

    • If you have uploaded documents and want to clear the ChromaDB and its associated data, click the "Delete DB" button. This will remove the sql_data collection from ChromaDB.
  8. Clear Chat History:

    • Click the "Clear Chat" button in the sidebar to erase the current chat conversation.
  9. Toggle Sidebar:

    • Use the "Open"/"Close" button at the top left to toggle the visibility of the sidebar for a more focused chat experience.

Screenshot of Natural Language SQL Query Tool in action

Important Notes

  • API Key Security: Keep your Google API key confidential. Do not hardcode it directly into the script if you are sharing the code.
  • Error Handling: The application includes basic error handling, but you may encounter errors depending on the complexity of your queries, database schema, and API availability. Check the terminal output for error messages.
  • Performance: Response times will depend on the complexity of your queries, database size, and the load on the Google API.

License

This project is licensed under the MIT License.

Contact

Rithvik - [email protected]

Feel free to reach out if you have any questions, feedback, or suggestions!

Contributing

I welcome contributions to make this Natural Language SQL Query Tool even better!

If you'd like to contribute:

  1. Fork the repository: Start by forking the repository to your own GitHub account.
  2. Create a branch: Create a new branch for your feature or bug fix.
    git checkout -b feature/your-new-feature
  3. Make your changes: Implement your changes, ensuring your code adheres to the project's style and guidelines.
  4. Commit your changes: Commit your changes with clear and concise commit messages.
    git commit -m "Add your feature or fix: descriptive message"
  5. Push to your branch: Push your branch to your forked repository.
    git push origin feature/your-new-feature
  6. Submit a Pull Request (PR): Open a pull request from your branch to the main branch of the original repository. Clearly describe your changes and why they are valuable.

We will review your pull request and appreciate your contributions!
