unexpected perf results and questions #15
Comments
Working with unicode adds overhead. If your use case lets you work with bytes, that is faster, and apparently bytes are what the performance script benchmarks (I didn't write it). To make the script work across Python 2 and 3 with the best performance, you should probably use bytes. I don't know whether this explains the unexpected results; let me know if you discover more. I don't have time to look into this myself, but if I did, I would start by profiling. I also don't really understand what you are benchmarking: what exactly are "google-re2" and "py-re2"? Why would the performance of re from Python's standard library differ between the two? I don't know whether that difference is meaningful, since that is supposed to be the baseline.
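One way to check whether the input type matters is to time the same pattern against a unicode string and against its utf-8 encoded bytes. The following is a minimal sketch, not the project's performance.py: the helper name `time_findall` and the sample text are made up for illustration, and the standard library `re` is used as a stand-in so the snippet runs anywhere. Swapping in another regex module in place of `re` is an assumption you would need to verify for that module.

```python
import re
import timeit


def time_findall(pattern, haystack, repeat=3, number=20):
    """Return the best wall-clock time (seconds) for `number` findall calls."""
    compiled = re.compile(pattern)  # compile once, outside the timed loop
    timings = timeit.repeat(
        lambda: compiled.findall(haystack), repeat=repeat, number=number
    )
    return min(timings)  # the minimum is the least noisy estimate


# Hypothetical sample data; a real run would use the benchmark corpus.
text = "phone 555-1234, zip 90210, Zürich 8001 " * 1000
data = text.encode("utf-8")  # encode once, before timing

str_seconds = time_findall(r"\d+", text)     # unicode pattern on unicode input
bytes_seconds = time_findall(rb"\d+", data)  # bytes pattern on bytes input

print("str input:   %.4f s" % str_seconds)
print("bytes input: %.4f s" % bytes_seconds)
```

Running both variants under the same Python version and the same module keeps the comparison fair; the absolute numbers will depend on the machine and the corpus.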
Sorry if that wasn't clear;
I see. I didn't know about the google-re2 Python bindings. Perhaps the README could explain the differences. If performance is critical, you should work with utf-8 encoded byte strings, which is what RE2 uses internally. If you work with Python unicode strings, there will be encoding and decoding on every pyre2 call. RE2 fully supports unicode even when you pass utf-8 encoded byte strings. If your fork contains any useful improvements, you're welcome to submit a pull request.
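To illustrate the "encode once, decode only what you need" idea, here is a small sketch. It uses the standard library `re` as a stand-in so that it runs without extra packages; that a re2 binding can be substituted for `re` here is an assumption, not something this snippet checks. The sample text is made up.

```python
import re  # stand-in; substituting a re2 binding here is an assumption

text = "Zürich 2021, São Paulo 1999"
data = text.encode("utf-8")  # encode once, up front

# A bytes pattern applied to bytes input: no per-call str <-> bytes conversion.
year_pattern = re.compile(rb"\d+")
matches = year_pattern.findall(data)  # list of bytes objects

# Decode only the (small) results that need to be shown as text.
years = [m.decode("utf-8") for m in matches]
print(years)
```

The design point is that the conversion cost is paid once for the whole input rather than on every search call.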
Since I need a good re2 Python interface (and there appear to be many, largely unmaintained), I ended up testing this one and the Google one using your performance.py script, and the results are somewhat unexpected compared to the performance table in the README https://github.com/andreasvc/pyre2#performance
I made a small change to make it run with newer Python:
return _wikidata.decode('utf8')
which is maybe why the results look odd; can you verify whether this is correct or not?
re2-perf-data.txt