-
Notifications
You must be signed in to change notification settings - Fork 1
Implement rfind() for unicode based on Cpython #7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
numba/unicode.py
Outdated
@@ -382,6 +382,15 @@ def _find(substr, s): | |||
return -1 | |||
|
|||
|
|||
@njit(_nrt=False) | |||
def _rfind(substr, s): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May be it is better to put this code into unicode_rfind
directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree it make sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please format all rows to < 80 characters
Need to add tests for start/end/substring types and start/end Optional Type
Examples are: https://github.com/numba/numba/blob/master/numba/tests/test_unicode.py#L447-L492
And https://github.com/numba/numba/blob/master/numba/tests/test_unicode.py#L843-L905
numba/tests/test_unicode.py
Outdated
@@ -6,7 +6,7 @@ | |||
from __future__ import print_function | |||
|
|||
import sys | |||
from itertools import permutations | |||
from itertools import permutations, product |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest to use another line for importing product, before this line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why don't you like this importing approach?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is already imported in way that I described in the latest merge to master:
https://github.com/numba/numba/blob/master/numba/tests/test_unicode.py#L9-L10
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I will make the change, but usually importing module parts is doing with comma in one line.
return x.rfind(y, start, end) | ||
|
||
|
||
def rfind_with_start_only_usecase(x, y, start): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def rfind_with_start_only_usecase(x, y, start): | |
def rfind_with_start_usecase(x, y, start): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does it make sense?
|
||
|
||
def rfind_with_start_only_usecase(x, y, start): | ||
return x.rfind(y, start) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please reorder functions like rfind_usecase -> rfind_with_start_usecase -> rfind_with_start_end_usecase
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I could.
numba/tests/test_unicode.py
Outdated
('_a' + '\U00100304' * 100, ['_a']), | ||
('_\u0102' + '\U00100304' * 100, ['_\u0102']), | ||
] | ||
for s, subs in subs + cpython_subs: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure that subs does not conflict with subs from 356 line?
Suggest to rename in something else (sub or sub_list for example)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with the comment and will fix it.
for start, end in product(range(-20, 20), range(-20, 20)): | ||
msg = 'Results of interpreted and compiled "{}".rfind("{}", {}, {}) should be equal' | ||
self.assertEqual(pyfunc(s, sub_str, start, end), cfunc(s, sub_str, start, end), | ||
msg=msg.format(s, sub_str, start, end)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missed testing with None indexes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with the comment and will add that.
numba/unicode.py
Outdated
@overload_method(types.UnicodeType, 'rfind') | ||
def unicode_rfind(s, substr): | ||
if not isinstance(substr, types.UnicodeType): | ||
return None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should not it raise TypeError if the substring is not unicodeType?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, it should.
numba/unicode.py
Outdated
@@ -531,6 +531,20 @@ def find_impl(a, b): | |||
return find_impl | |||
|
|||
|
|||
@overload_method(types.UnicodeType, 'rfind') | |||
def unicode_rfind(s, substr): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missed optional start and end args
Also need to check type of these args, like is done for count: https://github.com/numba/numba/blob/master/numba/unicode.py#L903-L909
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will do it.
numba/unicode.py
Outdated
return None | ||
|
||
def rfind_impl(s, substr): | ||
# Naive, slow string matching for now |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove this comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok.
numba/unicode.py
Outdated
if not isinstance(substr, types.UnicodeType): | ||
return None | ||
|
||
def rfind_impl(s, substr): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missed optional start and end args
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will add the arguments.
numba/tests/test_unicode.py
Outdated
|
||
for s in UNICODE_EXAMPLES: | ||
subs = ['', 'xx', s[:-2], s[3:], s] | ||
for sub_str in subs: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably, use ['', 'xx', s[:-2], s[3:], s] in loop definition?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok.
|
The PR is continuation of numba#4611.