Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python: promote nosql query #14070

Merged
merged 45 commits into from
Sep 29, 2023
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
Show all changes
45 commits
Select commit Hold shift + click to select a range
60dc1af
Python: prepare to promote NoSqlInjection
yoff Aug 15, 2023
55707d3
Python: Make things compile in their new location
yoff Aug 15, 2023
db04597
Python: rename file
yoff Aug 23, 2023
087961d
Python: Refactor to allow customizations
yoff Aug 23, 2023
19046ea
Python: more renames
yoff Aug 23, 2023
bf8bfd9
Python: Add inline query test
yoff Sep 7, 2023
114984b
Python: Added tests based on security analysis
yoff Aug 29, 2023
c0b3245
Python: Enrich the NoSql concept
yoff Aug 31, 2023
7edebbe
Python: Add QLDocs
yoff Sep 6, 2023
f253f97
Python: update test expectations
yoff Sep 6, 2023
970e881
Python: Follow naming convention
yoff Sep 7, 2023
b07d085
Python: make test PoC a proper package
yoff Sep 7, 2023
d91cd21
Python: rename file
yoff Sep 8, 2023
154a369
Python: Add test for function
yoff Sep 11, 2023
d9f63e1
Python: Split modelling of query operators
yoff Sep 11, 2023
a063d7d
Python: sinks -> decodings
yoff Sep 11, 2023
4614b1a
Python: add change note
yoff Sep 18, 2023
5611bda
Python: add test for `$accumulator`
yoff Sep 19, 2023
30c37ca
Python: model `§accumulator`
yoff Sep 19, 2023
7c085ec
Python: Add test for `map_reduce`
yoff Sep 20, 2023
4ec8b3f
Python: Model `map_reduce`
yoff Sep 20, 2023
12dab88
Python: rename concept
yoff Sep 20, 2023
8156fa9
Apply naming suggestions from code review
yoff Sep 28, 2023
37a4f35
Python: further rename
yoff Sep 28, 2023
3fb579e
Python: add test for type tracking
yoff Sep 28, 2023
d90630a
Python: fix query file
yoff Sep 28, 2023
c2b6383
Apply suggestions from code review
yoff Sep 28, 2023
9682c82
Python: rename file
yoff Sep 28, 2023
2a739b3
Python: rename module
yoff Sep 28, 2023
eb1be08
Python: split modelling
yoff Sep 28, 2023
2a7b593
Python: Fix QL alerts
yoff Sep 28, 2023
a8e0023
Python: forgot to list framework
yoff Sep 28, 2023
d5b64c5
Python: update test expectations
yoff Sep 28, 2023
3043633
Python: Some renaming of flow states
yoff Sep 28, 2023
2e028a4
Apply suggestions from code review
yoff Sep 29, 2023
74d6f37
Python: update meta query `TaintSinks`
yoff Sep 29, 2023
2d845e3
Python: nicer paths
yoff Sep 29, 2023
e170805
Python: fix QL alert
yoff Sep 29, 2023
f3a0161
Python: rename flow states
yoff Sep 29, 2023
9769668
Python: require dict sinks be dangerous.
yoff Sep 29, 2023
16e1a00
Python: NoSQLInjection -> NoSqlInjection
RasmusWL Sep 29, 2023
d7ad5a0
Python: List NoSQL injection sinks
RasmusWL Sep 29, 2023
3676262
Python: Clean trailing whitespace
RasmusWL Sep 29, 2023
d6d13f8
Python: -> NoSQL in QLDocs
RasmusWL Sep 29, 2023
9b73bbf
Python: Add keyword argument support
RasmusWL Sep 29, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
43 changes: 26 additions & 17 deletions python/ql/lib/semmle/python/frameworks/NoSQL.qll
Original file line number Diff line number Diff line change
Expand Up @@ -123,40 +123,49 @@ private module NoSql {
}

/** The `$where` query operator executes a string as JavaScript. */
private class WhereQueryOperator extends API::CallNode, NoSqlQuery::Range {
private class WhereQueryOperator extends DataFlow::Node, Decoding::Range {
API::Node dictionary;
Fixed Show fixed Hide fixed
DataFlow::Node query;

WhereQueryOperator() {
this = mongoCollection().getMember(mongoCollectionMethodName()).getACall() and
query = this.getParameter(0).getSubscript("$where").asSink()
dictionary =
mongoCollection().getMember(mongoCollectionMethodName()).getACall().getParameter(0) and
query = dictionary.getSubscript("$where").asSink() and
this = dictionary.asSink()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite know what to think about this...

I wasn't quite sure if either the old or the new these solutions could handle when the dictionary is not constructed inside the argument to the call, such as:

search = {'$where': 'this.author === "'+author+'"'}
post = posts.find_one(search) # $ result=BAD

But I tried it out locally, and it did in fact work. I guess they would since the .getParameter(0) takes us into "def node" domain, and we do back-tracking.

So basically you seem to be creating an additional taint-step specifically for NoSQL injection directly from the element of a dictionary with key $where to the argument of a mongo collection method, if API-graphs can track the dictionary flowing to that argument?

I think that works great for simple examples, but if there is any interprocedural steps, it will not be so nice. So I think we need to track these things with data-flow, so it can be part of the path shown to end-users. We might be able to achieve that with a string->dict taint-step directly when constructing a dictionary with a $where key?

You could apply the same logic to the $function and body stuff, so there would be string->dict taint-step when constructing dictionary with body key, and dict->dict taint-step when constructing dict with $function key. That would allow for all of the following examples to give alerts (I think only the first would today)

decoded = json.loads(user_controlled)

search = {
    "body": decoded,
    "args": [ "$author" ],
    "lang": "js"
}
posts.find_one({'$expr': {'$function': search}}) # $ result=BAD

posts.find_one({'$expr': {'$function': decoded}}) # $ result=BAD
posts.find_one({'$expr': decoded}) # $ result=BAD
posts.find_one(decoded) # $ result=BAD

Adding query specific taint-steps that mutate flow-state doesn't play very nicely with using Concepts, so I understand the temptation to just reuse the decoding Concepts, but I would argue that in fact no Decoding is taking place here. What I've done previously is to add query specific logic to the library modeling file, which is certainly something we could also do here (example from flask, originally added in #6989)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a string->dict taint-step directly when constructing a dictionary with a $where key

This is basically what we are doing. We just require that the dictionary also flows to a harmful place.
My intuition is that type tracking should basically be an over-approximation to data flow, so this extra requirements is more like a cheap filter.

Anyway, I added the above tests, and they all pass out of the box...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anyway, I added the above tests, and they all pass out of the box...

Nice 👍 I guess that comes from the fact that we say after a json.loads, it could be a dict, and our default taint-steps propagate taint through dictionary literals right now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think we could end up with a very long jumpstep due to the way it's written right now, which I don't think is a good solution.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, that is a reasonable objection (especially now that we got better path explanations in general). I made the step target the created dictionary instead of the argument that the dictionary flows to. Dataflow is able to link the two for us.

}

override DataFlow::Node getQuery() { result = query }
override DataFlow::Node getAnInput() { result = query }

override predicate interpretsDict() { none() }
override DataFlow::Node getOutput() { result = this }

override predicate vulnerableToStrings() { any() }
override string getFormat() { result = "NoSQL" }

override predicate mayExecuteInput() { any() }
yoff marked this conversation as resolved.
Show resolved Hide resolved
}

/** The `$function` query operator executes its `body` string as JavaScript. */
private class FunctionQueryOperator extends API::CallNode, NoSqlQuery::Range {
private class FunctionQueryOperator extends DataFlow::Node, Decoding::Range {
API::Node dictionary;
Fixed Show fixed Hide fixed
DataFlow::Node query;

FunctionQueryOperator() {
this = mongoCollection().getMember(mongoCollectionMethodName()).getACall() and
query =
this.getParameter(0)
.getASubscript*()
.getSubscript("$function")
.getSubscript("body")
.asSink()
dictionary =
mongoCollection()
.getMember(mongoCollectionMethodName())
.getACall()
.getParameter(0)
.getASubscript*() and
query = dictionary.getSubscript("$function").getSubscript("body").asSink() and
this = dictionary.asSink()
}

override DataFlow::Node getQuery() { result = query }
override DataFlow::Node getAnInput() { result = query }

override DataFlow::Node getOutput() { result = this }

override predicate interpretsDict() { none() }
override string getFormat() { result = "NoSQL" }

override predicate vulnerableToStrings() { any() }
override predicate mayExecuteInput() { any() }
}

/**
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -75,7 +75,7 @@ module NoSqlInjection {

/** A JSON decoding converts a string to a dictionary. */
class JsonDecoding extends Decoding, StringToDictConversion {
JsonDecoding() { this.getFormat() = "JSON" }
JsonDecoding() { this.getFormat() in ["JSON", "NoSQL"] }

override DataFlow::Node getAnInput() { result = Decoding.super.getAnInput() }

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -2,14 +2,19 @@ edges
| PoC/server.py:1:26:1:32 | ControlFlowNode for ImportMember | PoC/server.py:1:26:1:32 | GSSA Variable request |
| PoC/server.py:1:26:1:32 | GSSA Variable request | PoC/server.py:26:21:26:27 | ControlFlowNode for request |
| PoC/server.py:1:26:1:32 | GSSA Variable request | PoC/server.py:42:14:42:20 | ControlFlowNode for request |
| PoC/server.py:1:26:1:32 | GSSA Variable request | PoC/server.py:51:14:51:20 | ControlFlowNode for request |
| PoC/server.py:26:5:26:17 | SSA variable author_string | PoC/server.py:27:25:27:37 | ControlFlowNode for author_string |
| PoC/server.py:26:21:26:27 | ControlFlowNode for request | PoC/server.py:26:5:26:17 | SSA variable author_string |
| PoC/server.py:27:5:27:10 | SSA variable author | PoC/server.py:30:27:30:44 | ControlFlowNode for Dict |
| PoC/server.py:27:14:27:38 | ControlFlowNode for Attribute() | PoC/server.py:27:5:27:10 | SSA variable author |
| PoC/server.py:27:25:27:37 | ControlFlowNode for author_string | PoC/server.py:27:14:27:38 | ControlFlowNode for Attribute() |
| PoC/server.py:42:5:42:10 | SSA variable author | PoC/server.py:46:27:46:68 | ControlFlowNode for Dict |
| PoC/server.py:42:5:42:10 | SSA variable author | PoC/server.py:46:38:46:67 | ControlFlowNode for BinaryExpr |
| PoC/server.py:42:14:42:20 | ControlFlowNode for request | PoC/server.py:42:5:42:10 | SSA variable author |
| PoC/server.py:46:38:46:67 | ControlFlowNode for BinaryExpr | PoC/server.py:46:27:46:68 | ControlFlowNode for Dict |
| PoC/server.py:51:5:51:10 | SSA variable author | PoC/server.py:53:17:53:70 | ControlFlowNode for BinaryExpr |
| PoC/server.py:51:14:51:20 | ControlFlowNode for request | PoC/server.py:51:5:51:10 | SSA variable author |
| PoC/server.py:53:17:53:70 | ControlFlowNode for BinaryExpr | PoC/server.py:60:37:60:57 | ControlFlowNode for Dict |
| PoC/server.py:60:37:60:57 | ControlFlowNode for Dict | PoC/server.py:60:27:60:58 | ControlFlowNode for Dict |
| flask_mongoengine_bad.py:1:26:1:32 | ControlFlowNode for ImportMember | flask_mongoengine_bad.py:1:26:1:32 | GSSA Variable request |
| flask_mongoengine_bad.py:1:26:1:32 | GSSA Variable request | flask_mongoengine_bad.py:19:21:19:27 | ControlFlowNode for request |
| flask_mongoengine_bad.py:1:26:1:32 | GSSA Variable request | flask_mongoengine_bad.py:26:21:26:27 | ControlFlowNode for request |
Expand Down Expand Up @@ -76,16 +81,16 @@ edges
| pymongo_test.py:13:5:13:15 | SSA variable json_search | pymongo_test.py:15:42:15:62 | ControlFlowNode for Dict |
| pymongo_test.py:13:19:13:43 | ControlFlowNode for Attribute() | pymongo_test.py:13:5:13:15 | SSA variable json_search |
| pymongo_test.py:13:30:13:42 | ControlFlowNode for unsafe_search | pymongo_test.py:13:19:13:43 | ControlFlowNode for Attribute() |
| pymongo_test.py:29:5:29:12 | SSA variable event_id | pymongo_test.py:33:34:33:73 | ControlFlowNode for Dict |
| pymongo_test.py:29:5:29:12 | SSA variable event_id | pymongo_test.py:33:45:33:72 | ControlFlowNode for Fstring |
| pymongo_test.py:29:16:29:51 | ControlFlowNode for Attribute() | pymongo_test.py:29:5:29:12 | SSA variable event_id |
| pymongo_test.py:29:27:29:33 | ControlFlowNode for request | pymongo_test.py:29:27:29:50 | ControlFlowNode for Subscript |
| pymongo_test.py:29:27:29:50 | ControlFlowNode for Subscript | pymongo_test.py:29:16:29:51 | ControlFlowNode for Attribute() |
| pymongo_test.py:39:5:39:12 | SSA variable event_id | pymongo_test.py:43:34:43:73 | ControlFlowNode for Dict |
| pymongo_test.py:33:45:33:72 | ControlFlowNode for Fstring | pymongo_test.py:33:34:33:73 | ControlFlowNode for Dict |
| pymongo_test.py:39:5:39:12 | SSA variable event_id | pymongo_test.py:43:45:43:72 | ControlFlowNode for Fstring |
| pymongo_test.py:39:16:39:51 | ControlFlowNode for Attribute() | pymongo_test.py:39:5:39:12 | SSA variable event_id |
| pymongo_test.py:39:27:39:33 | ControlFlowNode for request | pymongo_test.py:39:27:39:50 | ControlFlowNode for Subscript |
| pymongo_test.py:39:27:39:50 | ControlFlowNode for Subscript | pymongo_test.py:39:16:39:51 | ControlFlowNode for Attribute() |
| pymongo_test.py:43:45:43:72 | ControlFlowNode for Fstring | pymongo_test.py:43:34:43:73 | ControlFlowNode for Dict |
nodes
| PoC/server.py:1:26:1:32 | ControlFlowNode for ImportMember | semmle.label | ControlFlowNode for ImportMember |
| PoC/server.py:1:26:1:32 | GSSA Variable request | semmle.label | GSSA Variable request |
Expand All @@ -99,6 +104,11 @@ nodes
| PoC/server.py:42:14:42:20 | ControlFlowNode for request | semmle.label | ControlFlowNode for request |
| PoC/server.py:46:27:46:68 | ControlFlowNode for Dict | semmle.label | ControlFlowNode for Dict |
| PoC/server.py:46:38:46:67 | ControlFlowNode for BinaryExpr | semmle.label | ControlFlowNode for BinaryExpr |
| PoC/server.py:51:5:51:10 | SSA variable author | semmle.label | SSA variable author |
| PoC/server.py:51:14:51:20 | ControlFlowNode for request | semmle.label | ControlFlowNode for request |
| PoC/server.py:53:17:53:70 | ControlFlowNode for BinaryExpr | semmle.label | ControlFlowNode for BinaryExpr |
| PoC/server.py:60:27:60:58 | ControlFlowNode for Dict | semmle.label | ControlFlowNode for Dict |
| PoC/server.py:60:37:60:57 | ControlFlowNode for Dict | semmle.label | ControlFlowNode for Dict |
| flask_mongoengine_bad.py:1:26:1:32 | ControlFlowNode for ImportMember | semmle.label | ControlFlowNode for ImportMember |
| flask_mongoengine_bad.py:1:26:1:32 | GSSA Variable request | semmle.label | GSSA Variable request |
| flask_mongoengine_bad.py:19:5:19:17 | SSA variable unsafe_search | semmle.label | SSA variable unsafe_search |
Expand Down Expand Up @@ -183,7 +193,7 @@ subpaths
#select
| PoC/server.py:30:27:30:44 | ControlFlowNode for Dict | PoC/server.py:1:26:1:32 | ControlFlowNode for ImportMember | PoC/server.py:30:27:30:44 | ControlFlowNode for Dict | This NoSQL query contains an unsanitized $@. | PoC/server.py:1:26:1:32 | ControlFlowNode for ImportMember | user-provided value |
| PoC/server.py:46:27:46:68 | ControlFlowNode for Dict | PoC/server.py:1:26:1:32 | ControlFlowNode for ImportMember | PoC/server.py:46:27:46:68 | ControlFlowNode for Dict | This NoSQL query contains an unsanitized $@. | PoC/server.py:1:26:1:32 | ControlFlowNode for ImportMember | user-provided value |
| PoC/server.py:46:38:46:67 | ControlFlowNode for BinaryExpr | PoC/server.py:1:26:1:32 | ControlFlowNode for ImportMember | PoC/server.py:46:38:46:67 | ControlFlowNode for BinaryExpr | This NoSQL query contains an unsanitized $@. | PoC/server.py:1:26:1:32 | ControlFlowNode for ImportMember | user-provided value |
| PoC/server.py:60:27:60:58 | ControlFlowNode for Dict | PoC/server.py:1:26:1:32 | ControlFlowNode for ImportMember | PoC/server.py:60:27:60:58 | ControlFlowNode for Dict | This NoSQL query contains an unsanitized $@. | PoC/server.py:1:26:1:32 | ControlFlowNode for ImportMember | user-provided value |
| flask_mongoengine_bad.py:22:34:22:44 | ControlFlowNode for json_search | flask_mongoengine_bad.py:1:26:1:32 | ControlFlowNode for ImportMember | flask_mongoengine_bad.py:22:34:22:44 | ControlFlowNode for json_search | This NoSQL query contains an unsanitized $@. | flask_mongoengine_bad.py:1:26:1:32 | ControlFlowNode for ImportMember | user-provided value |
| flask_mongoengine_bad.py:30:39:30:59 | ControlFlowNode for Dict | flask_mongoengine_bad.py:1:26:1:32 | ControlFlowNode for ImportMember | flask_mongoengine_bad.py:30:39:30:59 | ControlFlowNode for Dict | This NoSQL query contains an unsanitized $@. | flask_mongoengine_bad.py:1:26:1:32 | ControlFlowNode for ImportMember | user-provided value |
| flask_pymongo_bad.py:14:31:14:51 | ControlFlowNode for Dict | flask_pymongo_bad.py:1:26:1:32 | ControlFlowNode for ImportMember | flask_pymongo_bad.py:14:31:14:51 | ControlFlowNode for Dict | This NoSQL query contains an unsanitized $@. | flask_pymongo_bad.py:1:26:1:32 | ControlFlowNode for ImportMember | user-provided value |
Expand All @@ -195,6 +205,4 @@ subpaths
| mongoengine_bad.py:61:29:61:49 | ControlFlowNode for Dict | mongoengine_bad.py:1:26:1:32 | ControlFlowNode for ImportMember | mongoengine_bad.py:61:29:61:49 | ControlFlowNode for Dict | This NoSQL query contains an unsanitized $@. | mongoengine_bad.py:1:26:1:32 | ControlFlowNode for ImportMember | user-provided value |
| pymongo_test.py:15:42:15:62 | ControlFlowNode for Dict | pymongo_test.py:1:26:1:32 | ControlFlowNode for ImportMember | pymongo_test.py:15:42:15:62 | ControlFlowNode for Dict | This NoSQL query contains an unsanitized $@. | pymongo_test.py:1:26:1:32 | ControlFlowNode for ImportMember | user-provided value |
| pymongo_test.py:33:34:33:73 | ControlFlowNode for Dict | pymongo_test.py:1:26:1:32 | ControlFlowNode for ImportMember | pymongo_test.py:33:34:33:73 | ControlFlowNode for Dict | This NoSQL query contains an unsanitized $@. | pymongo_test.py:1:26:1:32 | ControlFlowNode for ImportMember | user-provided value |
| pymongo_test.py:33:45:33:72 | ControlFlowNode for Fstring | pymongo_test.py:1:26:1:32 | ControlFlowNode for ImportMember | pymongo_test.py:33:45:33:72 | ControlFlowNode for Fstring | This NoSQL query contains an unsanitized $@. | pymongo_test.py:1:26:1:32 | ControlFlowNode for ImportMember | user-provided value |
| pymongo_test.py:43:34:43:73 | ControlFlowNode for Dict | pymongo_test.py:1:26:1:32 | ControlFlowNode for ImportMember | pymongo_test.py:43:34:43:73 | ControlFlowNode for Dict | This NoSQL query contains an unsanitized $@. | pymongo_test.py:1:26:1:32 | ControlFlowNode for ImportMember | user-provided value |
| pymongo_test.py:43:45:43:72 | ControlFlowNode for Fstring | pymongo_test.py:1:26:1:32 | ControlFlowNode for ImportMember | pymongo_test.py:43:45:43:72 | ControlFlowNode for Fstring | This NoSQL query contains an unsanitized $@. | pymongo_test.py:1:26:1:32 | ControlFlowNode for ImportMember | user-provided value |
Original file line number Diff line number Diff line change
Expand Up @@ -50,7 +50,7 @@ def by_where():
def by_function():
author = request.args['author']
search = {
"body": 'function(author) { return(author === "'+author+'") }', # $ result=BAD
"body": 'function(author) { return(author === "'+author+'") }',
"args": [ "$author" ],
"lang": "js"
}
Expand All @@ -64,11 +64,11 @@ def by_function():
def by_function_arg():
author = request.args['author']
search = {
"body": 'function(author, target) { return(author === target) }', # $ result=OK
"body": 'function(author, target) { return(author === target) }',
"args": [ "$author", author ],
"lang": "js"
}
post = posts.find_one({'$expr': {'$function': search}}) # $ SPURIOUS: result=BAD
post = posts.find_one({'$expr': {'$function': search}}) # $ result=OK
return show_post(post, author)

@app.route('/', methods=['GET'])
Expand Down