Skip to content

Commit 7ddc008

Browse files
committed
regex: eval/cut: fix premature local undo and $^R
GH #16197 The main purpose of this commit is to stop premature scope unwinding within eval code in regexes. Aside from backtracking on failure, the scopes of every eval, such as (?{ local $x }), are supposed to accumulate, and are only unwound en-masse at the very end of the pattern match. However, a combination of a sub-pattern call, such as (??{...}) or (?&...), combined with a cut, (?>...), can trigger earlier savestack popping. The direct fix for this, as explained below, is to remove this single line from the EVAL_postponed_A/B: case: REGCP_UNWIND(ST.cp); /* LEAVE in disguise */ However, that line is entwined with code which attempts to preserve the final value of $^R during scope unwinding. Since that code was kind of working around the misplaced REGCP_UNWIND(), it needs ripping out and re-implementing. This has to be done at the same time, so the bulk of this commit is actually concerned with $^R, even though it isn't the subject of this ticket. So this commit doesn't change the behaviour of $^R, but just changes its implementation somewhat. The $^R issue is that every /(?{...})/ causes the equivalent of local $^R = ...; to be executed. During final exit, the savestack gets unwound, and all those local's get undone, leaving $^R with the value it had before the match started. But we promise that after the match, $^R will hold the value of the most recent (?{...}). The code which this commit rips out restored that value in one way; the new code in this commit does it a different way. Basically, almost the last thing S_regmatch() does is a LEAVE_SCOPE(orig_savestack_ix); This commit makes it so that the current value of $^R is copied just before the LEAVE_SCOPE(), and that value is copied back to $^R just after the LEAVE_SCOPE(). For efficiency, we only do the copy if we've actually set $^R. A mechanism is also needed to ensure that the temporary copy doesn't leak if we die during the savestack unwind. This is achieved by holding a pointer to the copy in the aux_eval struct, which gets processed even if we die. Now back to the main purpose of this commit, the premature stack unwind in the presence of a cut with a sub-pattern. This bug has been there since these features were added. It is instructive to look at a somewhat idealised overview of the S_regmatch() function from around 5.6.0 (with some bug fixes from later releases added). This was while the function was still recursive. It looks approximately like: pp_match(...) { I32 ix = PL_savestack_ix; ... do a match ... LEAVE_SCOPE(ix); } S_regmatch(...) { while (scan) { switch (OP(scan)) { case FOO: if (! there's a FOO) return 0; I32 ix = PL_savestack_ix; if (regmatch(...)) /* recursively match rest of pattern */ return 1; LEAVE_SCOPE(ix); return 0; case END: if (doing a (??{...}) ) { I32 ix = PL_savestack_ix; if (regmatch(...) { /* recursively match rest of pattern */ LEAVE_SCOPE(ix); return 1; } LEAVE_SCOPE(ix); return 0; } return 1; case EVAL: ... run the code, then, if its a (?{...}) ... I32 ix = PL_savestack_ix; if (regmatch(...) { /* recursively run subpattern */ LEAVE_SCOPE(ix) return 1; } LEAVE_SCOPE(ix); return 0; } } Here, the FOO: case represents all the various ops which recurse. In general, they match the next item and then recurse to match the rest of the pattern. Note that they all do a LEAVE_SCOPE() only in the *failure* branch. At the end of a successful match, there is potentially much recursion, and much stuff on the savestack. When the END op is reached, the series of 'return 1's causes all the recursion to unwind, while leaving the savestack untouched. Finally, the caller - such as pp_match() - clears the savestack. In more recent perls the recursion has been removed and the final LEAVE_SCOPE() is done within S_regmatch() itself, but the principle remains the same: no stack freeing is done *during* matching, and instead there's a single big clean up at the end. Once (??{...}) enters the picture, that changes a bit. When the END op associated with the '...' sub-pattern is reached, regmatch() is called recursively to process any pattern after the (??{..}); then on success, while working its way back through the nested regmatch() calls, both the END and the EVAL code do a LEAVE_SCOPE() in the *success* branch. This is anomalous, and those two LEAVE_SCOPE()'s are what this commit removes (although in the current non-recursive regex engine, they are shared by the same piece of code, so only one had to be removed). By removing them, this regularises the behaviour of sub-patterns. I can't think why those LEAVE_SCOPE()s were originally added, and assume it was a thinko. Normally it makes no difference whether the savestack is popped near the end, interleaved with popping all recursive regmatch calls (or equivalently on non-recursive engines popping the regmatch_state stack), or whether the savestack is popped only after all the recursion is exited from. However, it makes a difference in the presence of a cut, (?>...). Here, the final op of the sub-pattern '...' is SUCCEED, which rather than recursing to match anything following the cut block, just returns. The recursion pops back to the SUSPEND op which started the cut, which then continues with the op loop as normal. Thus when about to match the first op following a (?>...), the recursion *within* the cut has been blown away as if it never happened, but the accumulated savestack entries (e.g. from evals within the cut block) are preserved and continue to accumulate. Now, if there is a (??{...}) sub-pattern, or similarly a (?&FOO), within the cut, then at the end of the cut, the recursion is unwound, which includes the stacked EVAL and END recursions, which at this time call LEAVE_SCOPE(), which frees part of the savestack, even though the pattern match hasn't ended yet. That's the bug which this commit fixes. The tests added to pat_re_eval.t check that the scope bug has been fixed. The test added to svleak.t checks that the new $^R copying code doesn't leak.
1 parent 55c5b50 commit 7ddc008

File tree

4 files changed

+114
-33
lines changed

4 files changed

+114
-33
lines changed

regexec.c

Lines changed: 36 additions & 31 deletions
Original file line numberDiff line numberDiff line change
@@ -320,7 +320,12 @@ S_regcppush(pTHX_ const regexp *rex, I32 parenfloor, U32 maxopenparen comma_pDEP
320320

321321
/* REGCP_UNWIND(cp): unwind savestack back to marked index,
322322
* but without restoring regcppush()ed data (leave_scope() treats
323-
* SAVEt_REGCONTEXT as a NOOP)
323+
* SAVEt_REGCONTEXT as a NOOP).
324+
*
325+
* Note that the stack is normally only unwound on failure and
326+
* backtracking. Successful matches involve accumulating many savestack
327+
* entries which are all freed in *one go* during the final exit from
328+
* S_regmatch().
324329
*/
325330

326331
#define REGCP_UNWIND(cp) \
@@ -6683,7 +6688,6 @@ S_regmatch(pTHX_ regmatch_info *reginfo, char *startpos, regnode *prog)
66836688
during a successful match */
66846689
U32 lastopen = 0; /* last open we saw */
66856690
bool has_cutgroup = RXp_HAS_CUTGROUP(rex) ? 1 : 0;
6686-
SV* const oreplsv = GvSVn(PL_replgv);
66876691
/* these three flags are set by various ops to signal information to
66886692
* the very next op. They have a useful lifetime of exactly one loop
66896693
* iteration, and are not preserved or restored by state pushes/pops
@@ -6726,9 +6730,6 @@ S_regmatch(pTHX_ regmatch_info *reginfo, char *startpos, regnode *prog)
67266730
DECLARE_AND_GET_RE_DEBUG_FLAGS;
67276731
#endif
67286732

6729-
/* protect against undef(*^R) */
6730-
SAVEFREESV(SvREFCNT_inc_simple_NN(oreplsv));
6731-
67326733
/* shut up 'may be used uninitialized' compiler warnings for dMULTICALL */
67336734
multicall_oldcatch = 0;
67346735
PERL_UNUSED_VAR(multicall_cop);
@@ -8754,18 +8755,6 @@ S_regmatch(pTHX_ regmatch_info *reginfo, char *startpos, regnode *prog)
87548755
S_set_reg_curpm(aTHX_ rex_sv, reginfo);
87558756
rex = ReANY(rex_sv);
87568757
rexi = RXi_GET(rex);
8757-
{
8758-
/* preserve $^R across LEAVE's. See Bug 121070. */
8759-
SV *save_sv= GvSV(PL_replgv);
8760-
SV *replsv;
8761-
SvREFCNT_inc(save_sv);
8762-
REGCP_UNWIND(ST.cp); /* LEAVE in disguise */
8763-
/* don't move this initialization up */
8764-
replsv = GvSV(PL_replgv);
8765-
sv_setsv(replsv, save_sv);
8766-
SvSETMAGIC(replsv);
8767-
SvREFCNT_dec(save_sv);
8768-
}
87698758
cur_eval = ST.prev_eval;
87708759
cur_curlyx = ST.prev_curlyx;
87718760

@@ -10325,20 +10314,21 @@ NULL
1032510314
DEBUG_EXECUTE_r(Perl_re_printf( aTHX_ "%sMatch successful!%s\n",
1032610315
PL_colors[4], PL_colors[5]));
1032710316

10328-
if (reginfo->info_aux_eval) {
10329-
/* each successfully executed (?{...}) block does the equivalent of
10330-
* local $^R = do {...}
10331-
* When popping the save stack, all these locals would be undone;
10332-
* bypass this by setting the outermost saved $^R to the latest
10333-
* value */
10334-
/* I don't know if this is needed or works properly now.
10335-
* see code related to PL_replgv elsewhere in this file.
10336-
* Yves
10317+
if (reginfo->info_aux_eval && orig_savestack_ix < PL_savestack_ix) {
10318+
/* After exiting a match, $^R should still hold the value of the
10319+
* latest (?{...}). E.g.
10320+
* /(?{42})/ and print $^R;
10321+
* will print 42. However, each successfully executed (?{...})
10322+
* block does the equivalent of 'local $^R = ...'. In addition,
10323+
* there may be an explicit 'local $^R' or similar code. When
10324+
* popping the savestack at the end, all these locals would be
10325+
* undone. Avoid this issue by making a copy of the final value
10326+
* just *before* the final savestack unwind. After the unwind,
10327+
* we set $^R to that value.
10328+
* This temporary copy is stored in the aux_eval struct so that
10329+
* it will get freed even if we die during savestack unwind.
1033710330
*/
10338-
if (oreplsv != GvSV(PL_replgv)) {
10339-
sv_setsv(oreplsv, GvSV(PL_replgv));
10340-
SvSETMAGIC(oreplsv);
10341-
}
10331+
reginfo->info_aux_eval->final_replsv = newSVsv(GvSV(PL_replgv));
1034210332
}
1034310333
result = 1;
1034410334
goto final_exit;
@@ -10403,13 +10393,25 @@ NULL
1040310393

1040410394
if (last_pushed_cv) {
1040510395
dSP;
10406-
/* see "Some notes about MULTICALL" above */
10396+
/* see "Some notes about MULTICALL" above, especially how
10397+
* the POP_MULTICALL does the equivalent of the LEAVE_SCOPE
10398+
* for us, so no need to do it explicitly. */
1040710399
POP_MULTICALL;
1040810400
PERL_UNUSED_VAR(SP);
1040910401
}
1041010402
else
1041110403
LEAVE_SCOPE(orig_savestack_ix);
1041210404

10405+
if ( reginfo->info_aux_eval
10406+
&& reginfo->info_aux_eval->final_replsv)
10407+
{
10408+
/* '/(?{42})/; print $^R' should print 42; now that the
10409+
* savestack has been popped, set the final value */
10410+
SV *replsv = GvSV(PL_replgv);
10411+
sv_setsv(replsv, reginfo->info_aux_eval->final_replsv);
10412+
SvSETMAGIC(replsv);
10413+
}
10414+
1041310415
assert(!result || locinput - reginfo->strbeg >= 0);
1041410416
return result ? locinput - reginfo->strbeg : -1;
1041510417
}
@@ -11503,6 +11505,8 @@ S_setup_eval_state(pTHX_ regmatch_info *const reginfo)
1150311505
RXp_SUBOFFSET(rex) = 0;
1150411506
RXp_SUBCOFFSET(rex) = 0;
1150511507
RXp_SUBLEN(rex) = reginfo->strend - reginfo->strbeg;
11508+
11509+
eval_state->final_replsv = NULL; /* the final value of $^R. */
1150611510
}
1150711511

1150811512

@@ -11553,6 +11557,7 @@ S_cleanup_regmatch_info_aux(pTHX_ void *arg)
1155311557
PM_SETRE(eval_state->old_op, eval_state->old_op_val);
1155411558
SvREFCNT_dec(old_rx);
1155511559
}
11560+
SvREFCNT_dec(eval_state->final_replsv);
1155611561
}
1155711562

1155811563
PL_regmatch_state = aux->old_regmatch_state;

regexp.h

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -791,6 +791,7 @@ typedef struct {
791791
SV *sv; /* $_ during (?{}) */
792792
MAGIC *pos_magic; /* pos() magic attached to $_ */
793793
SSize_t pos; /* the original value of pos() in pos_magic */
794+
SV *final_replsv; /* the final value of $^R. */
794795
U8 pos_flags; /* flags to be restored; currently only MGf_BYTES*/
795796
} regmatch_info_aux_eval;
796797

t/op/svleak.t

Lines changed: 29 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -15,7 +15,7 @@ BEGIN {
1515

1616
use Config;
1717

18-
plan tests => 157;
18+
plan tests => 158;
1919

2020
# run some code N times. If the number of SVs at the end of loop N is
2121
# greater than (N-1)*delta at the end of loop 1, we've got a leak
@@ -710,3 +710,31 @@ package myconcat {
710710
'overloaded pattern with code block'
711711
);
712712
}
713+
714+
# When making a final copy of $^R before unwinding the savestack,
715+
# make sure that the copy doesn't leak if we die during that unwinding.
716+
# Dying in STORE triggers that.
717+
718+
package GH16197 {
719+
720+
sub TIESCALAR { bless [ 0 ]; }
721+
sub FETCH { $_[0][0] }
722+
sub STORE { my $v = $_[1];
723+
# die when undoing the 'local'; the previous val was real,
724+
# new value is undef.
725+
if ($_[0][-1] and !$v) {
726+
@{$_[0]} = ();
727+
die;
728+
}
729+
push @{$_[0]}, $v;
730+
}
731+
732+
our $x99;
733+
local $x99;
734+
tie $x99, 'GH16197';
735+
736+
::leak(5, 0,
737+
sub { eval { "" =~ /(?{ local $x99; $x99 = 9 })/; }; },
738+
"no leak in \$^R copy during stack unwind",
739+
);
740+
}

t/re/pat_re_eval.t

Lines changed: 48 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -25,7 +25,7 @@ BEGIN {
2525
our @global;
2626

2727

28-
plan tests => 528; # Update this when adding/deleting tests.
28+
plan tests => 552; # Update this when adding/deleting tests.
2929

3030
run_tests() unless caller;
3131

@@ -1446,6 +1446,53 @@ sub run_tests {
14461446
is($got, "[A-[X-,X-,XY,XY],A-,AB,AB]", "GH22869");
14471447
}
14481448

1449+
# GH #16197
1450+
# A subpattern, i.e. (??{...}) or (?&...), when combined with
1451+
# a cut, (?>...), caused the savestack to be prematurely unwound,
1452+
# resulting in localisations within (?{...}) being undone before
1453+
# the end of the match.
1454+
1455+
{
1456+
my ($match, @vals);
1457+
1458+
# The first six permutations do no backtracking, but are: with or
1459+
# without a cut; and match a digit via either a simple \d,
1460+
# or via a sub-pattern - (??{} or (>&).
1461+
# The next six permutations add some backtracking.
1462+
#
1463+
# Previously the combination of a cut and a subpattern returned
1464+
# "101 101 101 101 101".
1465+
1466+
for my $bt ('', '[a-z]*') { # trigger a backtrack?
1467+
for my $cut ('', '?>') {
1468+
for my $subpat (q{\d}, q{(??{ '\d' })}, q{(?&DIGIT)} ) {
1469+
1470+
my $desc = "bt=$bt cut=$cut subp=$subpat";
1471+
@vals = ();
1472+
local our $x99 = 100;
1473+
1474+
use re 'eval';
1475+
$match =
1476+
"a1b2c3d4e5" =~
1477+
/^ (
1478+
[a-z]
1479+
($cut
1480+
$subpat
1481+
(?{ local $x99 = $x99 + 1; push @vals, $x99 })
1482+
)
1483+
$bt
1484+
){5}
1485+
1486+
(?(DEFINE) (?<DIGIT> \d ) )
1487+
/x;
1488+
ok($match, "GH 16197: match; $desc");
1489+
is("@vals", "101 102 103 104 105",
1490+
"GH 16197: local vals; $desc");
1491+
}
1492+
}
1493+
}
1494+
}
1495+
14491496
} # End of sub run_tests
14501497

14511498
1;

0 commit comments

Comments
 (0)