fix(sql_query): validate if the query is not malicious #1568

ArslanSaleem · 2025-01-30T18:16:59Z

Important

Add SQL query validation in SQLDatasetLoader to prevent execution of potentially malicious queries using is_sql_query_safe.

Security:
- Add is_sql_query_safe function in sql_sanitizer.py to validate SQL queries against a list of unsafe keywords and ensure the main query is SELECT.
- Update SQLDatasetLoader.execute_query in sql_loader.py to use is_sql_query_safe and raise MaliciousQueryError for unsafe queries.
Testing:
- Add tests in test_sql_loader.py for SQLDatasetLoader to verify behavior with safe and malicious queries.
- Add tests in test_sql_sanitizer.py to validate is_sql_query_safe function with various SQL queries.

^{This description was created by}^{for 023187f. It will automatically update as commits are pushed.}

codecov · 2025-01-30T18:18:57Z

Codecov Report

Attention: Patch coverage is 95.00000% with 1 line in your changes missing coverage. Please review.

Project coverage is 82.73%. Comparing base (847757a) to head (023187f).
Report is 1 commits behind head on main.

Files with missing lines	Patch %	Lines
pandasai/helpers/sql_sanitizer.py	93.75%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1568      +/-   ##
==========================================
+ Coverage   82.64%   82.73%   +0.09%     
==========================================
  Files          66       66              
  Lines        2391     2410      +19     
==========================================
+ Hits         1976     1994      +18     
- Misses        415      416       +1

Flag	Coverage Δ
unittests	`82.73% <95.00%> (+0.09%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

ellipsis-dev

❌ Changes requested. Reviewed everything up to 8d50ee3 in 1 minute and 54 seconds

More details

Looked at 193 lines of code in 3 files
Skipped 0 files when reviewing.
Skipped posting 4 drafted comments based on config settings.

1. tests/unit_tests/helpers/test_sql_sanitizer.py:55

Draft comment:
The test case test_safe_select_with_comment incorrectly expects the query to be unsafe. Comments in SQL are not inherently dangerous and should not cause a query to be marked as unsafe. Adjust the test expectation to assert is_sql_query_safe(query). This issue is also present in test_safe_select_with_inline_comment.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable:
The comment suggests changing the test expectations, claiming SQL comments are safe. However, the code and tests clearly show that blocking comments is an intentional security feature. SQL comments can be used to hide malicious code or for SQL injection attacks. The current implementation considers comments unsafe by design.
Maybe there's a valid use case for allowing certain types of comments in SQL queries in some contexts. The comment author might have experience with systems where SQL comments are permitted.
While comments might be safe in some contexts, the current implementation explicitly treats them as unsafe as a security measure. The tests correctly reflect the intended behavior.
Delete the comment because it contradicts the intentional security design of blocking SQL comments, which is a valid security measure against SQL injection attacks.

2. tests/unit_tests/helpers/test_sql_sanitizer.py:63

Draft comment:
The comment in test_unsafe_query_with_subquery is misleading. The query is actually safe, and the test should expect assert is_sql_query_safe(query) to be True. Adjust the comment to reflect that there are no dangerous keywords in the main or subquery.
Reason this comment was not posted:
Comment looked like it was already resolved.

3. pandasai/helpers/sql_sanitizer.py:65

Draft comment:
The logic here is incorrect. The function should return True for safe queries and False for unsafe ones. The condition should be if parsed.key != "SELECT": to ensure only SELECT queries are considered safe.
Reason this comment was not posted:
Marked as duplicate.

4. tests/unit_tests/helpers/test_sql_sanitizer.py:63

Draft comment:
The assertion here is incorrect. The comment suggests that the query should be unsafe, but the assertion checks for safety. Update the assertion to assert not is_sql_query_safe(query) to match the comment.
Reason this comment was not posted:
Marked as duplicate.

Workflow ID: wflow_XaZsF1TPw6H2t2kX

Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

pandasai/helpers/sql_sanitizer.py

pandasai/data_loader/loader.py

ellipsis-dev

👍 Looks good to me! Incremental review on 2d4cae7 in 45 seconds

More details

Looked at 15 lines of code in 1 files
Skipped 0 files when reviewing.
Skipped posting 2 drafted comments based on config settings.

1. pandasai/helpers/sql_sanitizer.py:65

Draft comment:
The check for the main query being "SELECT" is too restrictive. Consider allowing "WITH" clauses as they are often used with SELECT queries.

        if parsed.key.upper() not in ["SELECT", "WITH"]:

Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable:
The file is a security-focused SQL sanitizer. WITH clauses are common table expressions that are read-only and can't modify data. However, the code's intent seems to be maximum security by only allowing pure SELECT statements. The change from checking if key == "SELECT" to key.upper() != "SELECT" suggests this restriction was intentional. In a security context, being more restrictive is often the safer choice.
WITH clauses are indeed safe for read-only operations and blocking them might be overly restrictive for legitimate use cases. The comment raises a valid technical point.
However, this is a security-focused sanitizer where being overly restrictive is better than being too permissive. The change appears intentional to increase security.
Delete the comment. While technically correct, the restriction to SELECT-only appears to be an intentional security decision, and we shouldn't second-guess security restrictions without strong justification.

2. pandasai/helpers/sql_sanitizer.py:66

Draft comment:
The error message should be more descriptive. Consider using: "The main query must be a SELECT statement."
Reason this comment was not posted:
Comment was on unchanged code.

Workflow ID: wflow_vGj0Ohvhxjd4yLhu

You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

ellipsis-dev

❌ Changes requested. Incremental review on 9b50ea0 in 33 seconds

More details

Looked at 91 lines of code in 2 files
Skipped 0 files when reviewing.
Skipped posting 1 drafted comments based on config settings.

1. pandasai/data_loader/sql_loader.py:43

Draft comment:
Consider providing more details in the error message to help users understand why the query is considered unsafe. For example, include which part of the query triggered the error.
Reason this comment was not posted:
Confidence changes required: 50%
The PR introduces a security check for SQL queries, but the error message could be more informative.

Workflow ID: wflow_9ajIftmig4KnxG6f

Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

pandasai/data_loader/sql_loader.py

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>

ellipsis-dev

👍 Looks good to me! Incremental review on 023187f in 28 seconds

More details

Looked at 27 lines of code in 1 files
Skipped 0 files when reviewing.
Skipped posting 1 drafted comments based on config settings.

1. tests/unit_tests/data_loader/test_sql_loader.py:140

Draft comment:
The test description should reflect that it is testing the detection of malicious queries and raising an exception, not just handling queries correctly.
Reason this comment was not posted:
Confidence changes required: 50%
The test for malicious queries is correctly implemented, but the test description is misleading. It should reflect the purpose of the test, which is to ensure that malicious queries are detected and raise an exception.

Workflow ID: wflow_3Uimja5O6pLi8P5F

You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

fix(sql_query): validate if the query is not malicious

8d50ee3

ArslanSaleem requested a review from gventuri January 30, 2025 18:16

ellipsis-dev bot reviewed Jan 30, 2025

View reviewed changes

pandasai/helpers/sql_sanitizer.py Outdated Show resolved Hide resolved

gventuri reviewed Jan 30, 2025

View reviewed changes

pandasai/data_loader/loader.py Outdated Show resolved Hide resolved

fix(sql_sanitzer): fix condition of sql

2d4cae7

ellipsis-dev bot reviewed Jan 31, 2025

View reviewed changes

ArslanSaleem added 2 commits January 31, 2025 09:52

Merge branch 'main' into fix/sql_sanitize

554b320

feat(sql_sanitize): integrate sql_sanitize in new loader and test cases

9b50ea0

ellipsis-dev bot reviewed Jan 31, 2025

View reviewed changes

pandasai/data_loader/sql_loader.py Outdated Show resolved Hide resolved

gventuri and others added 3 commits January 31, 2025 11:54

chore: improve error message

d5f47fe

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>

Merge branch 'main' into fix/sql_sanitize

4e932ca

mock connector function

023187f

ellipsis-dev bot reviewed Jan 31, 2025

View reviewed changes

gventuri merged commit 585c07d into main Jan 31, 2025
15 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(sql_query): validate if the query is not malicious #1568

fix(sql_query): validate if the query is not malicious #1568

ArslanSaleem commented Jan 30, 2025 •

edited by ellipsis-dev bot

Loading

codecov bot commented Jan 30, 2025 •

edited

Loading

ellipsis-dev bot left a comment

ellipsis-dev bot left a comment

ellipsis-dev bot left a comment

ellipsis-dev bot left a comment

fix(sql_query): validate if the query is not malicious #1568

fix(sql_query): validate if the query is not malicious #1568

Conversation

ArslanSaleem commented Jan 30, 2025 • edited by ellipsis-dev bot Loading

codecov bot commented Jan 30, 2025 • edited Loading

Codecov Report

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

ArslanSaleem commented Jan 30, 2025 •

edited by ellipsis-dev bot

Loading

codecov bot commented Jan 30, 2025 •

edited

Loading