-
Notifications
You must be signed in to change notification settings - Fork 115
Description
I'd be interested in writing checks for this, if it would be acceptable.
Please just tell me / close this issue if these rules would not be welcome.
I often see re usage which would be made better by (1) using re.compile (rather than the free functions which take a string) or (2) moving re.compile from function scope out to module scope.1
Of course, for these changes to be correct, the regex string needs to be constant, but that's the common case.
The regex flags also need to be considered.
Example 1, encourage moving away from the free functions when a pattern can be pre-compiled:
# okay
def foo(x):
return re.search(r"\w+", x)
# better
_WORD_PAT = re.compile(r"\w+")
def foo(x):
return _WORD_PAT.search(x)Example 2, there's no real reason to do it this way:
# bad, why write this? compile out in module scope
def foo(x):
pat = re.compile(r"\w+")
return pat.search(x)(fix is the same as Example 1)
Example 3, can we handle flags? What if they're ORed together?
def foo(x):
pat = re.compile(r"[abc\n]+", flags=re.I | re.M)
return pat.search(x)I therefore suggest two rules, one on by default and one in the opinionated category:
- Opinionated: You should use
re.compile()instead of{re.search, re.match, re.fullmatch, re.split, re.findall, re.finditer, re.sub, re.subn}on a string - Non-opinionated: You should not call
re.compile()on a constant in function scope -- do it in class or module scope
I'd suggest starting with turning these rules on only where flags is omitted from the call, so the 1-argument form of re.compile and the 2-argument forms of the various functions. flags handling can be a later enhancement.
Implementation should be simple. Detecting these names would be similar to B017 -- just look for a node named "re" and don't worry about the name being rebound in funny ways.
Footnotes
-
Yes, it can be okay if the
recache covers your needs. But I've seen this in library code, where you cannot make any assumptions about how many regexes are being compiled and put in the cache. ↩