`<regex>`: Use `std::search()` in skip heuristic #5586
Open
Towards #5468. This replaces the weird usage of `_Cmp_chrange()` by a straightforward call to `std::search()` in `_Matcher2::_Skip()`. I fix a copy-paste mistake in the comment describing `_Cmp_icase_translateleft` as well.

I also made an attempt to replace `_Cmp_chrange()`'s implementation by a straightforward call to `std::mismatch()`, but that seems to be a pessimization in practice of about 10% (probably because the strings tend to be quite short).

There will still be one follow-up to make an obvious improvement to the skip heuristic for `regex` and `wregex` in `collate` mode. But otherwise, I think this is basically it for simple improvements to the skip heuristic. There are still a few opportunities that could lead to some improvement (handling several branches, avoiding walking the NFA for each `_Skip()` call, or avoiding comparing the NFA nodes matched by `_Skip()` in `_Match_pat()` again), but they are not straightforward to implement.

Benchmark
Note that this means that all improvements since #5509 have sped up searching for the regex `(bibe)+` by a factor of about 450 and for `(?:bibe)+` by a factor of about 1000 in this benchmark.
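
For readers unfamiliar with the idea, here is a minimal sketch of a skip heuristic built on `std::search()`: when a pattern must begin with a known literal prefix, the searcher can jump straight to the next occurrence of that prefix instead of trying a full match at every position. The helper name `skip_to_candidate` and its signature are hypothetical and for illustration only; this is not the code in `_Matcher2::_Skip()`, which walks the NFA and handles case-insensitive and `collate` modes.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical helper (not part of the STL sources): returns the first
// position in [first, last) at which a match of a pattern that starts with
// the literal `prefix` could begin, or `last` if no such position exists.
template <class It>
It skip_to_candidate(It first, It last, std::string_view prefix) {
    if (prefix.empty()) {
        return first; // nothing to skip over: every position is a candidate
    }
    // std::search finds the first occurrence of the prefix as a subsequence
    // of consecutive elements, comparing with operator== by default.
    return std::search(first, last, prefix.begin(), prefix.end());
}
```

For example, with `std::string text = "xxbibxbibebibe";`, calling `skip_to_candidate(text.begin(), text.end(), "bibe")` returns an iterator at offset 6, so a full match attempt for something like `(bibe)+` would only need to start there rather than at offsets 0 through 5.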