Website: https://querymind-ai.streamlit.app/
An intelligent SQL agent that writes, executes, and self-corrects database queries in real-time. Built for data analysts, business users, and anyone who needs to query databases without writing SQL.
Unlike traditional SQL generators, QueryMind reflects on its own output and automatically fixes errors:
- Converts natural language → SQL using Groq's LLaMA 3.3 70B
- Executes the query against SQLite
- Reflects on the actual output data to detect:
  - Negative totals from refunds → wraps in `ABS()`
  - Empty results → checks whether the filtered values actually exist in the database
  - Missing schema fields → explains what's unavailable
  - Date filters outside the actual date range → flags with database statistics
  - Duplicate rows → identifies aggregation issues
  - Null-only columns → detects missing data
  - Incomplete coverage → warns about filtered regions
- Rewrites & re-executes the corrected query only when there's a real logic error
- Explains the fix in plain English, grounded in actual data
- 4-layer caching system (reflection + semantic + explanation + column values)
- Sub-100ms response time for cached queries
- 10x faster than traditional SQL generation tools
- Smart cache invalidation with TTL-based expiration
- Multi-stage validation (rule-based + LLM semantic checks with actual data)
- Data-aware reflection that validates against real database values and date ranges
- Conservative correction logic that only suggests changes for real errors
- Fallback logic for edge cases
- Comprehensive error handling
- Cache statistics dashboard for monitoring
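The TTL-based cache layers can be illustrated with a minimal sketch. This is a hypothetical, simplified version (the actual project combines in-memory caching with Streamlit's cache); the class and variable names are illustrative only:

```python
import time

class TTLCache:
    """Minimal TTL cache illustrating the idea behind the four cache
    layers (reflection, semantic, explanation, column values).
    Hypothetical sketch, not the project's actual implementation."""

    def __init__(self, ttl_seconds=300):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (value, expiry timestamp)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.time() > expires_at:
            del self._store[key]  # TTL-based invalidation on read
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, time.time() + self.ttl)

# One instance per layer, e.g.:
reflection_cache = TTLCache(ttl_seconds=600)
reflection_cache.set("q1", {"feedback": "ok"})
```

A cache hit skips the LLM call entirely, which is where the sub-100ms repeat-query latency comes from.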
```
User Question (Plain English)
        ↓
Generate SQL (Groq LLaMA 3.3 70B) [CACHED]
        ↓
Execute Query v1 [CACHED]
        ↓
Reflection Engine [CACHED]
├── Data Anomaly Detection (rule-based)
│   ├── Empty DataFrames
│   ├── Negative values
│   ├── Duplicates
│   ├── Null columns
│   └── Coverage gaps
├── Semantic Validation (LLM + Actual Data)
│   ├── Analyzes v1 SQL output (first 3 rows)
│   ├── Checks filtered values against database
│   ├── Validates date ranges against actual min/max
│   ├── Intent matching
│   └── Schema field verification
└── Auto-Correction Logic
    ├── ABS() wrapper for negatives
    ├── NULL response for unavailable data
    └── Data-grounded explanations
        ↓
Execute Query v2 (corrected) [CACHED]
        ↓
Plain English Explanation
```

All stages are intelligently cached for instant repeat queries!
Video Demo: https://youtu.be/pjX5gr85adc?si=bju8oNf4UZvLuqSn
```sql
SELECT product_name, SUM(revenue) AS total_revenue
FROM transactions
GROUP BY product_name
ORDER BY total_revenue DESC LIMIT 1;

-- Result: -$27,668.67 (AirPods Pro)
-- Problem: Negative revenue from refunds skews results
```

QueryMind detects the issue:
- Negative numeric values detected in actual output
Auto-corrected SQL:

```sql
SELECT product_name, SUM(ABS(revenue)) AS total_revenue
FROM transactions
GROUP BY product_name
ORDER BY total_revenue DESC LIMIT 1;
```

Result: $145,892.34 (iPhone 15 Pro)
Explanation:
"The reflection detected negative revenue values caused by refunds. Added ABS() to calculate absolute revenue for accurate product ranking."
You ask: "What were sales in New York?"
Generated SQL:

```sql
SELECT * FROM transactions WHERE region = 'NY';
```

QueryMind checks the actual database:
- Query returns empty
- Available regions: `['North', 'South', 'East', 'West']`

QueryMind Response:
No region 'NY' exists in the data. Available regions: North, South, East, West.
Why this matters: Prevents hallucinated SQL "fixes" when the real issue is data availability, not query syntax.
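Retrieving the values that actually exist in a filtered column can be sketched with the standard-library `sqlite3` module. This is a simplified, hypothetical sketch (table and column names are illustrative, and identifiers are assumed to come from the trusted schema, not user input):

```python
import sqlite3

def available_values(conn, table, column):
    """Fetch the distinct values actually present in a column, so an
    empty result can be explained with real data instead of a guessed
    SQL 'fix'. Identifiers are assumed trusted (from the schema)."""
    rows = conn.execute(
        f"SELECT DISTINCT {column} FROM {table} ORDER BY 1"
    ).fetchall()
    return [r[0] for r in rows]

# Tiny in-memory demo
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (region TEXT)")
conn.executemany(
    "INSERT INTO transactions VALUES (?)",
    [("North",), ("South",), ("East",), ("West",)],
)
regions = available_values(conn, "transactions", "region")
# 'NY' is not among the regions, so the agent reports what is
# available instead of rewriting the query.
```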
You ask: "Show sales from 2023"
QueryMind checks the database:
- Query returns empty
- Actual date range: 2025-09-01 to 2025-10-25

QueryMind Response:
No data from 2023. The database only contains data from Sept-Oct 2025.
Why this matters: Stops hallucinated "date format fixes" by validating against actual min/max dates in the database.
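The min/max validation can be sketched the same way: fetch the actual date bounds and compare them against the filter before suggesting any "fix". Again a hypothetical sketch with illustrative names:

```python
import sqlite3

def date_range(conn, table, date_column):
    """Return the actual (min, max) dates so out-of-range filters can
    be flagged with real statistics rather than a hallucinated
    date-format fix. Identifiers are assumed trusted."""
    return conn.execute(
        f"SELECT MIN({date_column}), MAX({date_column}) FROM {table}"
    ).fetchone()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (sale_date TEXT)")
conn.executemany("INSERT INTO transactions VALUES (?)",
                 [("2025-09-01",), ("2025-10-25",)])
lo, hi = date_range(conn, "transactions", "sale_date")

# A filter on 2023 falls outside the actual range, so the agent
# reports the range instead of rewriting the WHERE clause.
requested_year = "2023"
out_of_range = not (lo[:4] <= requested_year <= hi[:4])
```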
| Layer | Technology | Purpose |
|---|---|---|
| Frontend | Streamlit | Interactive UI |
| AI Model | Groq LLaMA 3.3 70B Versatile | Natural language → SQL |
| Database | SQLite | Local data storage |
| Caching | Multi-layer in-memory + Streamlit cache | Performance optimization |
| Language | Python 3.11 | Core logic |
```bash
# Clone the repository
git clone https://github.com/athulya-anil/QueryMind.git
cd QueryMind

# Create and activate a virtual environment
python -m venv venv
source venv/bin/activate  # On Windows use: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Run the Streamlit app
streamlit run Demo.py
```

Configure API Key:

```toml
# .streamlit/secrets.toml
GROQ_API_KEY = "your_key_here"
```

```python
def detect_output_anomalies(df: pd.DataFrame):
    # Checks for:
    # - Empty DataFrames → WHERE/JOIN errors
    # - Negative values → refunds or sign errors
    # - Duplicate rows → aggregation issues
    # - Null-only columns → missing data
    # - Coverage gaps → missing regions (< 4 unique)
    ...
```

```python
def semantic_reflection(question, sql_query, schema, sample_output):
    # LLM analyzes the actual SQL output (first 3 rows) to determine:
    # - Does the query match user intent?
    # - Are the referenced fields in the schema?
    # - For empty results:
    #     * Retrieves available values from the filtered columns
    #     * Validates date filters against actual min/max dates
    #     * Returns NULL with a data-grounded explanation if unavailable
    # Returns: {"feedback": "...", "refined_sql": "..."}
    ...
```

Fallback Logic: If the LLM call fails, regex-based field detection is used for terms like:
color, rating, brand, model, size, version
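That fallback can be sketched as a simple keyword scan. This is a hypothetical version (the function name, term list handling, and schema parameter are illustrative):

```python
import re

# Attribute-like terms the fallback scans for when the LLM
# reflection call fails (list taken from the section above).
FALLBACK_TERMS = ["color", "rating", "brand", "model", "size", "version"]
_pattern = re.compile(r"\b(" + "|".join(FALLBACK_TERMS) + r")\b",
                      re.IGNORECASE)

def detect_unavailable_fields(question, schema_columns):
    """Return terms in the question that look like fields but are
    missing from the schema. Hypothetical sketch."""
    mentioned = {m.lower() for m in _pattern.findall(question)}
    return sorted(mentioned - {c.lower() for c in schema_columns})

missing = detect_unavailable_fields(
    "What is the average rating by brand?",
    ["product_name", "category", "region", "revenue"],
)
```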
Anti-Hallucination Features:
- Lower temperature (0.3) for more accurate responses
- Validates SQL changes are meaningful, not cosmetic rewrites
- Includes actual output data in LLM prompt (as markdown)
- Retrieves available column values for empty results
- Fetches database date range statistics to validate time filters
Negative totals:

```python
# Detects: SUM(revenue) with negative values in actual output
# Fixes: SUM(ABS(revenue))
fixed_sql = re.sub(r"SUM\(([^)]+)\)", r"SUM(ABS(\1))", sql_query)
```

Invalid queries:

```python
# Returns: "NULL" + a helpful explanation with actual data
# Example: "Try using 'product_name', 'category', or 'region'"
# Example: "Available regions: North, South, East, West"
# Example: "Database contains data from Sept-Oct 2025 only"
```

```python
def generate_reflection_explanation(issues, feedback, old_sql, new_sql, sample_output):
    # LLM generates a 2-3 sentence explanation that:
    # - References actual output data
    # - Focuses on WHY the fix improves accuracy
    # - Cites real database values/dates for empty results
    # - Uses temperature=0.4 for accuracy
    ...
```

- Execute initial query → get a results DataFrame
- Scan for anomalies → rule-based checks on actual data
- If negatives detected → auto-apply the `ABS()` fix
- If no anomalies → run LLM semantic validation with actual output data
- For empty results → retrieve available values and date ranges from the database
- If invalid fields or missing data → return `NULL` + a data-grounded explanation
- Generate explanation → LLM summarizes the fix using actual data references
- Re-execute corrected query → show final results
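The execute → detect → fix → re-execute loop can be sketched end to end. This is a deliberately simplified, hypothetical sketch: it only handles the negative-aggregate case, and the real pipeline also runs LLM semantic validation and caches every stage:

```python
import re
import sqlite3

def run_with_reflection(conn, sql_query):
    """Execute a query, scan the output for negative values, auto-apply
    the ABS() fix to SUM(...), and re-execute. Simplified sketch of the
    loop described above; not the project's full implementation."""
    rows = conn.execute(sql_query).fetchall()
    has_negative = any(
        isinstance(v, (int, float)) and v < 0
        for row in rows for v in row
    )
    if has_negative and "SUM(" in sql_query.upper():
        fixed_sql = re.sub(r"SUM\(([^)]+)\)", r"SUM(ABS(\1))", sql_query)
        return conn.execute(fixed_sql).fetchall(), fixed_sql
    return rows, sql_query

# In-memory demo: a refund makes the total negative, triggering the fix
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (product_name TEXT, revenue REAL)")
conn.executemany("INSERT INTO transactions VALUES (?, ?)",
                 [("AirPods Pro", 100.0), ("AirPods Pro", -250.0)])
rows, final_sql = run_with_reflection(
    conn,
    "SELECT product_name, SUM(revenue) AS total_revenue "
    "FROM transactions GROUP BY product_name",
)
```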
MIT © 2025 Athulya Anil
