-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(sql_query): validate if the query is not malicious #1568
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1568 +/- ##
==========================================
+ Coverage 82.64% 82.73% +0.09%
==========================================
Files 66 66
Lines 2391 2410 +19
==========================================
+ Hits 1976 1994 +18
- Misses 415 416 +1
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❌ Changes requested. Reviewed everything up to 8d50ee3 in 1 minute and 54 seconds
More details
- Looked at
193
lines of code in3
files - Skipped
0
files when reviewing. - Skipped posting
4
drafted comments based on config settings.
1. tests/unit_tests/helpers/test_sql_sanitizer.py:55
- Draft comment:
The test casetest_safe_select_with_comment
incorrectly expects the query to be unsafe. Comments in SQL are not inherently dangerous and should not cause a query to be marked as unsafe. Adjust the test expectation toassert is_sql_query_safe(query)
. This issue is also present intest_safe_select_with_inline_comment
. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable:
The comment suggests changing the test expectations, claiming SQL comments are safe. However, the code and tests clearly show that blocking comments is an intentional security feature. SQL comments can be used to hide malicious code or for SQL injection attacks. The current implementation considers comments unsafe by design.
Maybe there's a valid use case for allowing certain types of comments in SQL queries in some contexts. The comment author might have experience with systems where SQL comments are permitted.
While comments might be safe in some contexts, the current implementation explicitly treats them as unsafe as a security measure. The tests correctly reflect the intended behavior.
Delete the comment because it contradicts the intentional security design of blocking SQL comments, which is a valid security measure against SQL injection attacks.
2. tests/unit_tests/helpers/test_sql_sanitizer.py:63
- Draft comment:
The comment intest_unsafe_query_with_subquery
is misleading. The query is actually safe, and the test should expectassert is_sql_query_safe(query)
to beTrue
. Adjust the comment to reflect that there are no dangerous keywords in the main or subquery. - Reason this comment was not posted:
Comment looked like it was already resolved.
3. pandasai/helpers/sql_sanitizer.py:65
- Draft comment:
The logic here is incorrect. The function should returnTrue
for safe queries andFalse
for unsafe ones. The condition should beif parsed.key != "SELECT":
to ensure only SELECT queries are considered safe. - Reason this comment was not posted:
Marked as duplicate.
4. tests/unit_tests/helpers/test_sql_sanitizer.py:63
- Draft comment:
The assertion here is incorrect. The comment suggests that the query should be unsafe, but the assertion checks for safety. Update the assertion toassert not is_sql_query_safe(query)
to match the comment. - Reason this comment was not posted:
Marked as duplicate.
Workflow ID: wflow_XaZsF1TPw6H2t2kX
Want Ellipsis to fix these issues? Tag @ellipsis-dev
in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet
mode, and more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Looks good to me! Incremental review on 2d4cae7 in 45 seconds
More details
- Looked at
15
lines of code in1
files - Skipped
0
files when reviewing. - Skipped posting
2
drafted comments based on config settings.
1. pandasai/helpers/sql_sanitizer.py:65
- Draft comment:
The check for the main query being "SELECT" is too restrictive. Consider allowing "WITH" clauses as they are often used with SELECT queries.
if parsed.key.upper() not in ["SELECT", "WITH"]:
- Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable:
The file is a security-focused SQL sanitizer. WITH clauses are common table expressions that are read-only and can't modify data. However, the code's intent seems to be maximum security by only allowing pure SELECT statements. The change from checking if key == "SELECT" to key.upper() != "SELECT" suggests this restriction was intentional. In a security context, being more restrictive is often the safer choice.
WITH clauses are indeed safe for read-only operations and blocking them might be overly restrictive for legitimate use cases. The comment raises a valid technical point.
However, this is a security-focused sanitizer where being overly restrictive is better than being too permissive. The change appears intentional to increase security.
Delete the comment. While technically correct, the restriction to SELECT-only appears to be an intentional security decision, and we shouldn't second-guess security restrictions without strong justification.
2. pandasai/helpers/sql_sanitizer.py:66
- Draft comment:
The error message should be more descriptive. Consider using:"The main query must be a SELECT statement."
- Reason this comment was not posted:
Comment was on unchanged code.
Workflow ID: wflow_vGj0Ohvhxjd4yLhu
You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet
mode, and more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❌ Changes requested. Incremental review on 9b50ea0 in 33 seconds
More details
- Looked at
91
lines of code in2
files - Skipped
0
files when reviewing. - Skipped posting
1
drafted comments based on config settings.
1. pandasai/data_loader/sql_loader.py:43
- Draft comment:
Consider providing more details in the error message to help users understand why the query is considered unsafe. For example, include which part of the query triggered the error. - Reason this comment was not posted:
Confidence changes required:50%
The PR introduces a security check for SQL queries, but the error message could be more informative.
Workflow ID: wflow_9ajIftmig4KnxG6f
Want Ellipsis to fix these issues? Tag @ellipsis-dev
in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet
mode, and more.
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Looks good to me! Incremental review on 023187f in 28 seconds
More details
- Looked at
27
lines of code in1
files - Skipped
0
files when reviewing. - Skipped posting
1
drafted comments based on config settings.
1. tests/unit_tests/data_loader/test_sql_loader.py:140
- Draft comment:
The test description should reflect that it is testing the detection of malicious queries and raising an exception, not just handling queries correctly. - Reason this comment was not posted:
Confidence changes required:50%
The test for malicious queries is correctly implemented, but the test description is misleading. It should reflect the purpose of the test, which is to ensure that malicious queries are detected and raise an exception.
Workflow ID: wflow_3Uimja5O6pLi8P5F
You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet
mode, and more.
Important
Add SQL query validation in
SQLDatasetLoader
to prevent execution of potentially malicious queries usingis_sql_query_safe
.is_sql_query_safe
function insql_sanitizer.py
to validate SQL queries against a list of unsafe keywords and ensure the main query isSELECT
.SQLDatasetLoader.execute_query
insql_loader.py
to useis_sql_query_safe
and raiseMaliciousQueryError
for unsafe queries.test_sql_loader.py
forSQLDatasetLoader
to verify behavior with safe and malicious queries.test_sql_sanitizer.py
to validateis_sql_query_safe
function with various SQL queries.This description was created by for 023187f. It will automatically update as commits are pushed.