-
Notifications
You must be signed in to change notification settings - Fork 502
Escape user-provided text passed to regex #571
base: master
Are you sure you want to change the base?
Conversation
kj21
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Works beautifully
837c2f7 to
422a785
Compare
|
Took the liberty to fix the pycodestyle errors shown in travis. It was just line length. That whole matching section gives me shivers, though. Every if seems written in a different style and I don't get why we have to |
|
We still have test failures with these patches. Going to take a look. |
|
|
diff --git a/src/documents/models.py b/src/documents/models.py
index 7d12f91..f368f31 100644
--- a/src/documents/models.py
+++ b/src/documents/models.py
@@ -98,15 +98,14 @@ class MatchingModel(models.Model):
if self.matching_algorithm == self.MATCH_ALL:
for word in self._split_match():
search_result = re.search(
- r"\b{}\b".format(re.escape(word)), text, **search_kwargs)
+ r"\b{}\b".format(word), text, **search_kwargs)
if not search_result:
return False
return True
if self.matching_algorithm == self.MATCH_ANY:
for word in self._split_match():
- if re.search(r"\b{}\b".format(re.escape(word)), text,
- **search_kwargs):
+ if re.search(r"\b{}\b".format(word), text, **search_kwargs):
return True
return False
@@ -142,7 +141,7 @@ class MatchingModel(models.Model):
findterms = re.compile(r'"([^"]+)"|(\S+)').findall
normspace = re.compile(r"\s+").sub
return [
- normspace(" ", (t[0] or t[1]).strip()).replace(" ", r"\s+")
+ re.escape(normspace(" ", (t[0] or t[1]).strip())).replace(r"\ ", r"\s+")
for t in findterms(self.match)
]
This makes it work but is pretty much unreadable. |
|
I'd prefer something like this. Opinions? findterms = re.compile(r'"([^"]+)"|(\S+)').findall
normspace = re.compile(r"\s+").sub
# find all terms and replace multiple spaces with a single one
terms = [normspace(" ", (t[0] or t[1]).strip())
for t in findterms(self.match)]
# escape each term so we don't have regexes where we don't want them.
# This escapes spaces, too.
terms = [re.escape(t).replace(r"\ ", "\s+")
for t in terms]
return terms |
3e6d177 to
989af90
Compare
Rather than using the user/document-provided values directly, we instead escape them to use them verbatim. This fixes issue #568.
989af90 to
359e236
Compare
|
@MasterofJOKers sorry I didn't react after your review, thank you very much for analyzing this. I have applied pretty much exactly your solution to the problem, it seems like the simplest fix while still being correct! 👍 |
MasterofJOKers
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really liked the comments in my version though ;)
Rather than using the user/document-provided values directly, we instead escape them to use them verbatim.
This fixes issue 568.
I only checked the
models.pyfile for occurrences of unescaped regex-"injections". There might be more across the project.