
<regex>: Use std::search() in skip heuristic #5586


Open: wants to merge 1 commit into base `main`


muellerj2 (Contributor)

Towards #5468. This replaces the weird usage of _Cmp_chrange() with a straightforward call to std::search() in _Matcher2::_Skip().
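To illustrate the idea behind using std::search() for skipping, here is a minimal C++ sketch. It is not the STL's _Matcher2::_Skip(), which walks the NFA; the function skip_to_literal() is hypothetical and assumes we already know a literal string (such as "bibe") that every match must begin with. Under that assumption, every position before the first occurrence of the literal can be skipped without attempting a full match there.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper, NOT the STL's _Matcher2::_Skip(): given a haystack and a
// literal that every match is assumed to start with, return the first position
// at which a full match attempt could possibly succeed.
// Returns haystack.size() if the literal does not occur at all.
std::size_t skip_to_literal(std::string_view haystack, std::string_view literal) {
    // std::search finds the first occurrence of [literal.begin(), literal.end())
    // in [haystack.begin(), haystack.end()). It returns haystack.begin() for an
    // empty literal and haystack.end() if there is no occurrence.
    const auto it = std::search(haystack.begin(), haystack.end(), literal.begin(), literal.end());
    return static_cast<std::size_t>(it - haystack.begin());
}
```

For example, `skip_to_literal("lorem ipsum bibendum", "bibe")` returns 12, so the expensive matcher would only need to start at that offset. Delegating to std::search() also lets the standard library pick its own search strategy instead of hand-rolling a comparison loop.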

I also fix a copy-paste mistake in the comment describing _Cmp_icase_translateleft.
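The PR only corrects a comment, and the actual comparator is not shown here. Going purely by its name, _Cmp_icase_translateleft plausibly performs a case-insensitive comparison that translates only the left operand. The C++ sketch below is a hypothetical functor, icase_translate_left, built on that assumption: it translates the haystack character with std::regex_traits<char>::translate_nocase() and compares it against a pattern character that is assumed to be already translated. std::search() passes elements of the first range as the predicate's left argument, which fits this design.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <regex>
#include <string_view>

// Hypothetical predicate modeled only on the NAME _Cmp_icase_translateleft:
// the left (haystack) character goes through translate_nocase(); the right
// (pattern) character is assumed to have been translated ahead of time.
struct icase_translate_left {
    const std::regex_traits<char>* traits;

    bool operator()(char left, char right) const {
        return traits->translate_nocase(left) == right;
    }
};

// Hypothetical case-insensitive search; translated_literal is assumed to be
// lowercase already. Returns haystack.size() if there is no occurrence.
std::size_t find_icase(std::string_view haystack, std::string_view translated_literal) {
    std::regex_traits<char> traits; // uses the global locale
    const auto it = std::search(haystack.begin(), haystack.end(),
        translated_literal.begin(), translated_literal.end(),
        icase_translate_left{&traits});
    return static_cast<std::size_t>(it - haystack.begin());
}
```

Translating the pattern once up front means only the haystack side has to be translated during the search.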

I also tried replacing _Cmp_chrange()'s implementation with a straightforward call to std::mismatch(), but in practice that seems to be a pessimization of about 10% (probably because the strings tend to be quite short).
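Because _Cmp_chrange() itself is not shown in this PR, the C++ sketch below uses two hypothetical functions, starts_with_loop() and starts_with_mismatch(), to show the two shapes of implementation being compared: an explicit loop versus a single std::mismatch() call. Both check whether `prefix` matches the beginning of `text`. The sketch does not reproduce the roughly 10% difference reported for the real code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical stand-ins for _Cmp_chrange(); the real function's signature and
// return value are not shown in this PR.

// Hand-written loop with an early exit on the first mismatch.
bool starts_with_loop(std::string_view text, std::string_view prefix) {
    if (prefix.size() > text.size()) {
        return false;
    }
    for (std::size_t i = 0; i < prefix.size(); ++i) {
        if (text[i] != prefix[i]) {
            return false;
        }
    }
    return true;
}

// Same check via the four-iterator std::mismatch overload (C++14), which stops
// at the first differing position or at the end of the shorter range.
bool starts_with_mismatch(std::string_view text, std::string_view prefix) {
    const auto result = std::mismatch(text.begin(), text.end(), prefix.begin(), prefix.end());
    return result.second == prefix.end();
}
```

Both versions do the same amount of comparison work in principle; for very short ranges, any fixed overhead in a generic algorithm call can matter more than the comparison loop itself.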

There will still be one follow-up that makes an obvious improvement to the skip heuristic for regex and wregex in collate mode. Otherwise, I think this is basically it for simple improvements to the skip heuristic. A few opportunities for further gains remain (handling several branches, not walking the NFA on every _Skip() call, or not comparing the NFA nodes already matched by _Skip() again in _Match_pat()), but they are not straightforward to implement.

Benchmark

| Benchmark | Before | After | Speedup |
|---|--:|--:|--:|
| `bm_lorem_search/"^bibe"/2` | 28.2506 | 28.8783 | 0.98 |
| `bm_lorem_search/"^bibe"/3` | 27.6228 | 29.82 | 0.93 |
| `bm_lorem_search/"^bibe"/4` | 28.8783 | 32.4707 | 0.89 |
| `bm_lorem_search/"bibe"/2` | 43492.7 | 2622.77 | 16.58 |
| `bm_lorem_search/"bibe"/3` | 90680.8 | 5000 | 18.14 |
| `bm_lorem_search/"bibe"/4` | 172631 | 9626.07 | 17.93 |
| `bm_lorem_search/"(bibe)"/2` | 47538.5 | 4296.88 | 11.06 |
| `bm_lorem_search/"(bibe)"/3` | 92071.8 | 8370.5 | 11.00 |
| `bm_lorem_search/"(bibe)"/4` | 181370 | 15485.6 | 11.71 |
| `bm_lorem_search/"(bibe)+"/2` | 64062.5 | 10253.9 | 6.25 |
| `bm_lorem_search/"(bibe)+"/3` | 153460 | 20856.3 | 7.36 |
| `bm_lorem_search/"(bibe)+"/4` | 249062 | 40108.8 | 6.21 |
| `bm_lorem_search/"(?:bibe)+"/2` | 49178 | 4603.8 | 10.68 |
| `bm_lorem_search/"(?:bibe)+"/3` | 94164.3 | 8998.29 | 10.46 |
| `bm_lorem_search/"(?:bibe)+"/4` | 188354 | 17578.3 | 10.72 |
| `bm_lorem_search/R"(\bbibe)"/2` | 96256.9 | 89979.2 | 1.07 |
| `bm_lorem_search/R"(\bbibe)"/3` | 194972 | 188354 | 1.04 |
| `bm_lorem_search/R"(\bbibe)"/4` | 374930 | 368369 | 1.02 |
| `bm_lorem_search/R"(\Bibe)"/2` | 235395 | 222178 | 1.06 |
| `bm_lorem_search/R"(\Bibe)"/3` | 404531 | 461498 | 0.88 |
| `bm_lorem_search/R"(\Bibe)"/4` | 941265 | 983099 | 0.96 |
| `bm_lorem_search/R"((?=....)bibe)"/2` | 48131.7 | 3138.95 | 15.33 |
| `bm_lorem_search/R"((?=....)bibe)"/3` | 96256.9 | 6277.9 | 15.33 |
| `bm_lorem_search/R"((?=....)bibe)"/4` | 179983 | 12207 | 14.74 |
| `bm_lorem_search/R"((?=bibe)....)"/2` | 44327.8 | 2915.74 | 15.20 |
| `bm_lorem_search/R"((?=bibe)....)"/3` | 87886.7 | 5580.36 | 15.75 |
| `bm_lorem_search/R"((?=bibe)....)"/4` | 179983 | 10986.3 | 16.38 |
| `bm_lorem_search/R"((?!lorem)bibe)"/2` | 45515.6 | 2999.44 | 15.17 |
| `bm_lorem_search/R"((?!lorem)bibe)"/3` | 92071.8 | 5859.38 | 15.71 |
| `bm_lorem_search/R"((?!lorem)bibe)"/4` | 188354 | 11160.7 | 16.88 |
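Reading the benchmark results: the speedup column is the before timing divided by the after timing, so values above 1 mean the new code is faster and values below 1 (such as the anchored ^bibe rows) mean it is slower. For bm_lorem_search/"bibe"/2, for example:

$$
\text{speedup} = \frac{t_{\text{before}}}{t_{\text{after}}} = \frac{43492.7}{2622.77} \approx 16.58
$$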

Taken together, all improvements since #5509 have sped up searching for the regex (bibe)+ by a factor of about 450, and for (?:bibe)+ by a factor of about 1000, in this benchmark.

@muellerj2 requested a review from a team as a code owner on June 14, 2025, 14:01
github-project-automation (bot) moved this to Initial Review in STL Code Reviews on Jun 14, 2025
@StephanTLavavej self-assigned this on Jun 14, 2025
@StephanTLavavej added the labels performance (Must go faster) and regex (meow is a substring of homeowner) on Jun 14, 2025
Labels: performance (Must go faster), regex (meow is a substring of homeowner)
Project status: Initial Review

2 participants