Commit 403cb97

dginev authored and brucemiller committed
MathGrammar coverage tests (brucemiller#1107)
* first stab at grammar coverage test
* add a couple of array cases
* exhaust basic factor cases
* modifier and poly- expressions
* grammar coverage is under parse
* some more bigop examples; start grammar terminal-based section
* example fuzzing tool, enumerating grammatical productions of MathGrammar
* example fuzzing tool, enumerating grammatical productions of MathGrammar
* attempt to add a rule-log utility for debugging MathParser parses
* grammar maybeRHS rule test
* grammar formulae rule tests
* grammar term rule test
* grammar SignedTerm test
* adjust rule log print for less redundancy
* grammar AnythingAny test
* grammar Factor test
* more factor tests
* add grammar trigBareArg test
* add grammar bareArg test
* add grammar expression(s) test
* add test grammar extendFormula
* add grammar test moreRHS
* add grammar test maybeColRHS
* add grammar test metarelop Formula
* add grammar test maybeEvalAt
* TODO remaining grammar tests
* add test grammar subscript+superscript
* add grammar scriptfactoropen test
* add grammar punctExpr test
* add grammar preScripted test
* add grammar moreTerms2 test
* update to new positional numbering in script xml
* return to regular grammar state
* add grammar tests to manifest
* update to latest master
* add coverage test for MathGrammar
* reorganize to topical tests, make meta-test for grammar coverage optional
* update with remaining grammar_coverage leftovers
* also consider consumed rules for grammar coverage meta-test
* also consider consumed rules for grammar coverage meta-test
* also consider consumed rules for grammar coverage meta-test
* down to 72 missing rules
* updates
* better parsing of recdescent trace
* ketExpression is subcategorized via Formulae, which matches both individual AddOp and MulOp
* down to 39 failing parses
* working session with Bruce
* patch scanning of recdescent trace
* update tests
* update core parse test targets to latest updates
* better test harness logging
* ignore .xdv files
* all grammar coverage tests passing
1 parent a11b274 commit 403cb97

73 files changed, +4129 -132 lines changed

.gitignore

Lines changed: 2 additions & 1 deletion
@@ -15,12 +15,13 @@ Makefile.old
 # And of course the whole compiled lib
 pm_to_blib
 blib/
-# Other random programming artifacts
+# Other random programming artifacts
 TAGS
 save/
 # Random TeX junk
 *.aux
 *.log
+*.xdv
 # These are generated by creating manual.pdf
 doc/manual/manual.aux
 doc/manual/manual.idx

MANIFEST

Lines changed: 46 additions & 0 deletions
@@ -719,6 +719,7 @@ t/94_runtimes.t
 t/95_complex_config.t
 t/96_fatal.t
 t/97_manifest.t
+t/170_grammar_coverage.t
 t/alignment/algx.pdf
 t/alignment/algx.tex
 t/alignment/algx.xml
@@ -1347,24 +1348,57 @@ t/namespace/ns5.latexml
 t/namespace/ns5.pdf
 t/namespace/ns5.tex
 t/namespace/ns5.xml
+t/parse/algebraic_terms.pdf
+t/parse/algebraic_terms.tex
+t/parse/algebraic_terms.xml
+t/parse/array_math.pdf
+t/parse/array_math.tex
+t/parse/array_math.xml
+t/parse/artefacts.pdf
+t/parse/artefacts.tex
+t/parse/artefacts.xml
+t/parse/calculus.pdf
+t/parse/calculus.tex
+t/parse/calculus.xml
 t/parse/compose.pdf
 t/parse/compose.tex
 t/parse/compose.xml
+t/parse/fences.pdf
+t/parse/fences.tex
+t/parse/fences.xml
 t/parse/functions.pdf
 t/parse/functions.tex
 t/parse/functions.xml
+t/parse/function_argument_syntax.pdf
+t/parse/function_argument_syntax.tex
+t/parse/function_argument_syntax.xml
 t/parse/kludge.pdf
 t/parse/kludge.tex
 t/parse/kludge.xml
+t/parse/metarelation_elision.pdf
+t/parse/metarelation_elision.tex
+t/parse/metarelation_elision.xml
 t/parse/mixedfrac.pdf
 t/parse/mixedfrac.tex
 t/parse/mixedfrac.xml
+t/parse/multirelations.pdf
+t/parse/multirelations.tex
+t/parse/multirelations.xml
+t/parse/nested_application.pdf
+t/parse/nested_application.tex
+t/parse/nested_application.xml
 t/parse/operators.pdf
 t/parse/operators.tex
 t/parse/operators.xml
 t/parse/parens.pdf
 t/parse/parens.tex
 t/parse/parens.xml
+t/parse/parser_speculate.pdf
+t/parse/parser_speculate.tex
+t/parse/parser_speculate.xml
+t/parse/prescripted.pdf
+t/parse/prescripted.tex
+t/parse/prescripted.xml
 t/parse/qm.pdf
 t/parse/qm.tex
 t/parse/qm.xml
@@ -1374,12 +1408,24 @@ t/parse/relations.xml
 t/parse/scripts.pdf
 t/parse/scripts.tex
 t/parse/scripts.xml
+t/parse/sequences_and_lists.pdf
+t/parse/sequences_and_lists.tex
+t/parse/sequences_and_lists.xml
 t/parse/sets.pdf
 t/parse/sets.tex
 t/parse/sets.xml
 t/parse/spacing.pdf
 t/parse/spacing.tex
 t/parse/spacing.xml
+t/parse/standalone_equations.pdf
+t/parse/standalone_equations.tex
+t/parse/standalone_equations.xml
+t/parse/standalone_modifiers.pdf
+t/parse/standalone_modifiers.tex
+t/parse/standalone_modifiers.xml
+t/parse/subordinate_lists.pdf
+t/parse/subordinate_lists.tex
+t/parse/subordinate_lists.xml
 t/parse/terms.pdf
 t/parse/terms.tex
 t/parse/terms.xml

lib/LaTeXML/MathGrammar

Lines changed: 28 additions & 30 deletions
@@ -15,10 +15,10 @@
 # perl -MParse::RecDescent - MathGrammar LaTeXML::MathGrammar
 # ================================================================================
 # Startup actions: import the constructors
-{ BEGIN{ use LaTeXML::MathParser qw(:constructors);
+{ BEGIN{ use LaTeXML::MathParser qw(:constructors);
 #### $::RD_TRACE=1;
 }}
-
+
 # Rules section
 # ========================================
 # Naming Conventions:
@@ -27,7 +27,7 @@
 # Initial lowercase : internal rules.
 # ========================================
 # For internal rules
-# moreFoos[$foo] : Looks for more Foo's w/appropriate punctuation or operators,
+# moreFoos[$foo] : Looks for more Foo's w/appropriate punctuation or operators,
 # whatever is appropriate, and combines it with whatever was passed in
 # as pattern arg. Typically, the last clause would be simply
 # | { $arg[0]; }
@@ -65,7 +65,7 @@ Anything : AnythingAny /^\Z/ { $item[1]; }
 
 #======================================================================
 AnythingAny :
-Formulae
+Formulae
 | OPEN Formulae CLOSE { Fence($item[1],$item[2],$item[3]); }
 | modifierFormulae
 | OPEN modifierFormula CLOSE { Fence($item[1],$item[2],$item[3]); }
@@ -79,7 +79,7 @@ AnythingAny :
 | FLOATSUPERSCRIPT { NewScript(Absent(),$item[1]); }
 | FLOATSUBSCRIPT { NewScript(Absent(),$item[1]); }
 | AnyOp Expression { Apply($item[1],Absent(),$item[2]);}
-
+
 # a top level rule for sub and superscripts that can accept all sorts of junk.
 Subscript : <rulevar: local $MaxAbsDepth = $LaTeXML::MathParser::MAX_ABS_DEPTH>
 Subscript :
@@ -98,7 +98,7 @@ aSubscript :
 
 aSuperscript :
 supops
-| Formulae
+| Formulae
 | AnyOp Expression { Apply($item[1],Absent(),$item[2]);}
 | AnyOp
 
@@ -109,7 +109,7 @@ aSuperscript :
 # Expression(s) separated by punctuation, relational operators or metarelational
 # operators [Think of $a=b=c$ vs $a=b, c=d$ vs. $a=b,c,d$ .. ]
 # and group them into Formulae (collections of relations), including relations
-# which have punctuated collections of Expression(s) on either the LHS or RHS,
+# which have punctuated collections of Expression(s) on either the LHS or RHS,
 # as well as `multirelation' like a = b = c, or simply punctuated collections of
 # Expression(s)
 
@@ -129,7 +129,7 @@ endPunct : PUNCT | PERIOD
 
 Formula : Expression extendFormula[$item[1]]
 
-# extendFormula[$expression] ; expression might be followed by punct Expression...
+# extendFormula[$expression] ; expression might be followed by punct Expression...
 # or relop Expression... or arrow Expression or nothing.
 extendFormula :
 /^\Z/ { $arg[0];} # short circuit!
@@ -138,7 +138,7 @@ extendFormula :
 | relop /^\Z/ { NewFormula($arg[0],$item[1], Absent()); }
 | { $arg[0]; }
 
-# maybeRHS[$expr,(punct,$expr)*];
+# maybeRHS[$expr,(punct,$expr)*];
 # Could have RELOP Expression (which means the (collected LHS) relation RHS)
 # or done (just collection)
 maybeRHS :
@@ -270,7 +270,7 @@ ExpressionsNoBars : Expressions
 # Abstractly, things combined by operators binding tighter than multiplication
 #%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 
-Factor :
+Factor :
 # These 2nd two are Iffy; hopefully the 1st rule will protect from backtrack?
 OPEN ARRAY CLOSE addScripts[Fence($item[1],$item[2],$item[3])]
 # perhaps only when OPEN or CLOSED is { or } ??
@@ -317,7 +317,7 @@ ATOM_OR_ID : ATOM | ID | ARRAY
 # Note f g h => f*g*h, but f g h x => f(g(h(x))) Seems like what people mean...
 # Should there be a special case for trigs?
 barearg : aBarearg moreBareargs[$item[1]]
-aBarearg :
+aBarearg :
 preScripted['FUNCTION'] addArgs[$item[1]]
 | preScripted['OPFUNCTION'] addOpFunArgs[$item[1]]
 | preScripted['TRIGFUNCTION'] addTrigFunArgs[$item[1]]
@@ -335,7 +335,7 @@ moreBareargs :
 
 # A variation that does not allow a bare trig function
 trigBarearg : aTrigBarearg moreTrigBareargs[$item[1]]
-aTrigBarearg :
+aTrigBarearg :
 preScripted['FUNCTION'] addArgs[$item[1]]
 | preScripted['OPFUNCTION'] addOpFunArgs[$item[1]]
 | preScripted['ATOM_OR_ID'] maybeArgs[$item[1]]
@@ -348,7 +348,7 @@ moreTrigBareargs :
 /^\Z/ { $arg[0];} # short circuit!
 | MulOp aTrigBarearg
 moreTrigBareargs[ApplyNary($item[1],$arg[0],$item[2])]
-| aTrigBarearg
+| aTrigBarearg
 moreTrigBareargs[ApplyNary(InvisibleTimes(),$arg[0],$item[1])]
 | { $arg[0]; }
 
@@ -397,7 +397,7 @@ maybeBraket :
 ketExpression : <rulevar: local $forbidVertBar = 1>
 ketExpression : <rulevar: local $forbidLRAngle = 1>
 ketExpression : Formulae
-| METARELOP | ARROW | AddOp | MulOp | MODIFIEROP
+| METARELOP | MODIFIEROP
 
 #======================================================================
 # absExpression; need to be careful about misinterpreting the next |
@@ -446,19 +446,19 @@ inpreScripted :
 factorOpen :
 AddOp balancedClose[$arg[0]] addScripts[Fence($arg[0],$item[1],$item[2])] # For (-)
 # Parenthesized Operator possibly w/scripts
-| preScripted['bigop'] balancedClose[$arg[0]]
+| preScripted['bigop'] balancedClose[$arg[0]]
 addScripts[Fence($arg[0],$item[1],$item[2])] Factor
 { Apply($item[3],$item[4]); }
 # Parenthesized Operator including a pre-factor
-| Factor preScripted['bigop'] balancedClose[$arg[0]]
+| Factor preScripted['bigop'] balancedClose[$arg[0]]
 addScripts[Fence($arg[0],
 Apply(InvisibleTimes(),$item[1],$item[2]),$item[3])] Factor
 { Apply($item[4],$item[5]); }
 # read expression too? match subcases.
 | Expression factorOpenExpr[$arg[0],$item[1]]
 # Empty OPEN CLOSE ?
 | balancedClose[$arg[0]] addScripts[Fence($arg[0],$item[1])]
-# Sequence starting with an operator ?
+# Sequence starting with an operator ?
 | AnyOp factorOpenExpr[$arg[0],$item[1]]
 
 # factorOpenExpr[$open,$expr]; Try to recognize various things that start
@@ -491,13 +491,13 @@ FormulaNOBar : Formula
 
 # The "such that" that can appear in a sets like {a "such that" predicate(a)}
 # accept vertical bars, and colon
-suchThatOp : MIDDLE | VERTBAR
+suchThatOp : MIDDLE | VERTBAR
 | /METARELOP:colon:\d+/ { Lookup($item[1]); }
 # ================================================================================
 # Function args, etc.
 
 # maybeArgs[$function] ; Add arguments to an identifier, but only if made explict.
-maybeArgs :
+maybeArgs :
 /^\Z/ { $arg[0];} # short circuit!
 | APPLYOP requireArgs[$arg[0]]
 | { $arg[0]; }
@@ -512,7 +512,7 @@ doubtArgs :
 | { IsNotationAllowed('MaybeFunctions'); } OPEN forbidArgs[$arg[0],$item[2]]
 | { $arg[0]; }
 
-# forbidArgs[$unknown,$open]; Got a suspicious pattern: an unknown and open.
+# forbidArgs[$unknown,$open]; Got a suspicious pattern: an unknown and open.
 # If the following seems to be an argument list, warn.
 forbidArgs :
 Argument (argPunct Argument)(s) balancedClose[$arg[1]]
@@ -528,7 +528,7 @@ requireArgs :
 balancedClose[$item[1]]
 { ApplyDelimited($arg[0],$item[1],$item[2],
 map(@$_,@{$item[3]}),$item[4]); }
-# Hmm, should only be applicable to _some_ functions ???
+# Hmm, should only be applicable to _some_ functions ???
 | barearg { Apply($arg[0],$item[1]); }
 
 # addArgs[$function]; We've got a function; Add following arguments to a
@@ -602,7 +602,7 @@ addOpArgs :
 | APPLYOP(?) Factor moreOpArgFactors[$item[2]] { Apply($arg[0],$item[3]);}
 | { $arg[0]; }
 
-# moreOpArgFactors[$factor1] : Similar to moreFactors,
+# moreOpArgFactors[$factor1] : Similar to moreFactors,
 # but w/o evalAtOp since that most likely belongs to the operator, not
 # the factors.
 moreOpArgFactors :
@@ -623,7 +623,7 @@ addIntOpArgs :
 | APPLYOP(?) IntFactor moreIntOpArgFactors[$item[2]] { Apply($arg[0],$item[3]);}
 | { $arg[0]; }
 
-# moreIntOpArgFactors[$factor1] : Similar to moreOpArgFactors,
+# moreIntOpArgFactors[$factor1] : Similar to moreOpArgFactors,
 # but recognizing d as diff
 moreIntOpArgFactors :
 /^\Z/ { $arg[0];} # short circuit!
@@ -659,7 +659,7 @@ nestOperators :
 | FUNCTION addScripts[$item[1]] { recApply(@arg,$item[2]); }
 | OPFUNCTION addScripts[$item[1]] { recApply(@arg,$item[2]); }
 | TRIGFUNCTION addScripts[$item[1]] { recApply(@arg,$item[2]); }
-| OPEN Expression balancedClose[$item[1]]
+| OPEN Expression balancedClose[$item[1]]
 { recApply(@arg[0..$#arg-1],
 ApplyDelimited($arg[$#arg],$item[1],$item[2],$item[3])); }
 | { recApply(@arg); }
@@ -692,20 +692,19 @@ addOpDecoration :
 | { $arg[0]; }
 
 #%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-# Pseudo-Terminals.
+# Pseudo-Terminals.
 # Useful combinations or subsets of terminals.
 #%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 # A generalized relational operator or arrow
 # Note we disallow < or > if we're parsing the contents of a bra or ket!
-relop :
+relop :
 { ($forbidLRAngle ? 1 : undef); } /RELOP:(less|greater)-than:\d+/ <commit> <reject>
 | RELOP addOpDecoration[$item[1]]
 | ARROW addOpDecoration[$item[1]]
 
 # Check out whether diffop should be treated as bigop or operator
-# It depends on the binding
+# It depends on the binding
 bigop : BIGOP | SUMOP | INTOP | LIMITOP | DIFFOP
-operator: OPERATOR
 
 # SUPOP is really only \prime(s) (?)
 supops : SUPOP(s) { New(undef,
@@ -729,7 +728,7 @@ evalAtOp : VERTBAR
 # These correspond to the TeX tokens.
 # The Lexer strings are of the form TYPE:NAME:NUMBER where
 # TYPE is the grammatical role, or part of speech,
-# NAME is the specific name (semantic or presentation) of the token
+# NAME is the specific name (semantic or presentation) of the token
 # NUMBER is the position of the specific token in the current token sequence.
 #
 # NOTE: RecDescent doesn't clearly distinguish lexing from parsing
@@ -793,7 +792,6 @@ INTOP : /INTOP:\S*:\d+/ { Lookup($item[1]); }
 LIMITOP : /LIMITOP:\S*:\d+/ { Lookup($item[1]); }
 DIFFOP : /DIFFOP:\S*:\d+/ { Lookup($item[1]); }
 OPERATOR : /OPERATOR:\S*:\d+/ { Lookup($item[1]); }
-##DIFF : /DIFF:\S*:\d+/ { Lookup($item[1]); }
 POSTSUBSCRIPT : /POSTSUBSCRIPT:\S*:\d+/ { Lookup($item[1]); }
 POSTSUPERSCRIPT : /POSTSUPERSCRIPT:\S*:\d+/ { Lookup($item[1]); }
 FLOATSUPERSCRIPT : /FLOATSUPERSCRIPT:\S*:\d+/ { Lookup($item[1]); }

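The terminal rules in lib/LaTeXML/MathGrammar match lexer strings of the form TYPE:NAME:NUMBER, as the grammar's own comment near the evalAtOp hunk describes. A minimal Perl sketch of how such a string decomposes, and how a terminal pattern shaped like /INTOP:\S*:\d+/ selects it, might look like the following. The sample lexemes and the helper split_lexeme are hypothetical and not part of LaTeXML; the real grammar resolves matches through Lookup(), which is not reproduced here.

```perl
use strict;
use warnings;

# Hypothetical helper: split a lexeme string of the form TYPE:NAME:NUMBER.
# In list context it returns (TYPE, NAME, NUMBER), or an empty list when
# the string does not have that shape.
sub split_lexeme {
  my ($lexeme) = @_;
  return $lexeme =~ /^([A-Z_]+):(\S*):(\d+)$/; }

# Sample lexemes, invented for illustration only.
my @lexemes = ('ID:x:1', 'METARELOP:colon:2', 'INTOP:integral:3');

for my $lexeme (@lexemes) {
  my ($type, $name, $position) = split_lexeme($lexeme);
  # Same pattern shape as the INTOP terminal in the grammar.
  my $is_intop = ($lexeme =~ /^INTOP:\S*:\d+/) ? 'yes' : 'no';
  print "$lexeme => type=$type name=$name position=$position intop=$is_intop\n"; }
```

Because the part of speech is the leading field, a rule such as suchThatOp can also match one specific token by spelling out the name, as /METARELOP:colon:\d+/ does.
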
lib/LaTeXML/MathParser.pm

Lines changed: 1 addition & 4 deletions
@@ -191,7 +191,7 @@ sub printNode {
 my ($tag, $attr, @children) = @$node;
 my @keys = sort keys %$attr;
 return "<$tag"
-. (@keys ? ' ' . join(' ', map { "$_='$$attr{$_}'" } @keys) : '')
+. (@keys ? ' ' . join(' ', map { "$_='" . ($$attr{$_} || '') . "'" } @keys) : '')
 . (@children
 ? ">\n" . join('', map { printNode($_) } @children) . "</$tag>"
 : '/>')
@@ -673,7 +673,6 @@ sub parse_single {
 # Now do the actual parse.
 ($result, $unparsed) = $self->parse_internal($rule, @nodes);
 }
-
 # Failure? No result or uparsed lexemes remain.
 # NOTE: Should do script hack??
 if ((!defined $result) || $unparsed) {
@@ -689,8 +688,6 @@ sub parse_single {
 print STDERR "\n=>" . printNode($result) . "\n" . ('=' x 60) . "\n"; }
 return $result; } }
 
-use Data::Dumper;
-
 sub node_to_lexeme {
 my ($self, $node) = @_;
 my $lexeme = getTokenMeaning($node);

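The printNode change in lib/LaTeXML/MathParser.pm replaces direct interpolation of $$attr{$_} with ($$attr{$_} || ''), so an attribute whose value is undef prints as an empty string instead of raising an "uninitialized value" warning when warnings are enabled. A minimal standalone sketch of that expression follows; format_attrs and the sample attribute hash are hypothetical and not part of LaTeXML.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the attribute formatting inside printNode.
sub format_attrs {
  my ($attr) = @_;
  my @keys = sort keys %$attr;
  # An undef value falls back to '' rather than being interpolated directly.
  return (@keys ? ' ' . join(' ', map { "$_='" . ($$attr{$_} || '') . "'" } @keys) : ''); }

# Sample attributes, invented for illustration; 'meaning' is deliberately undef.
my %attr = (role => 'ADDOP', meaning => undef);
print format_attrs(\%attr), "\n";
```

One design point worth noting: || also maps other false values such as 0 to the empty string, whereas the defined-or operator // (available since Perl 5.10) would preserve them. Whether that distinction matters for these debugging prints is not stated in the commit.
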