Commit 27dee06

Allow angle brackets around EBNF symbols.
Fixes #15
1 parent c8f4095 commit 27dee06

17 files changed: +102 -47 lines changed

Gemfile

Lines changed: 1 addition & 0 deletions
@@ -13,6 +13,7 @@ group :development do
 gem "redcarpet", platforms: :mri
 gem "rocco", platforms: :mri
 gem "pygmentize", platforms: :mri
+gem 'getoptlong'
 end

 group :development, :test do

README.md

Lines changed: 7 additions & 6 deletions
@@ -75,7 +75,7 @@ Generate formatted grammar using HTML (requires [Haml][Haml] gem):

 ### Parsing an ISO/IEC 14977 Grammar

-The EBNF gem can also parse [ISO/EIC 14977] Grammars (ISOEBNF) to [S-Expressions][S-Expression].
+The EBNF gem can also parse [ISO/IEC 14977][] Grammars (ISOEBNF) to [S-Expressions][S-Expression].

 grammar = EBNF.parse(File.open('./etc/iso-ebnf.isoebnf'), format: :isoebnf)

@@ -104,7 +104,7 @@ The [EBNF][] variant used here is based on [W3C](https://w3.org/) [EBNF][]
 as defined in the
 [XML 1.0 recommendation](https://www.w3.org/TR/REC-xml/), with minor extensions:

-Note that the grammar includes an optional `[identifer]` in front of rule names, which can be in conflict with the `RANGE` terminal. It is typically not a problem, but if it comes up, try parsing with the `native` parser, add comments or sequences to disambiguate. EBNF does not have beginning of line checks as all whitespace is treated the same, so the common practice of identifying each rule inherently leads to such ambiguity.
+Note that the grammar includes an optional `[number]` in front of rule names, which can be in conflict with the `RANGE` terminal. It is typically not a problem, but if it comes up, try parsing with the `native` parser, add comments or sequences to disambiguate. EBNF does not have beginning of line checks as all whitespace is treated the same, so the common practice of identifying each rule inherently leads to such ambiguity.

 The character set for EBNF is UTF-8.

@@ -116,7 +116,7 @@ which can also be proceeded by an optional number enclosed in square brackets to

 [1] symbol ::= expression

-(Note, this can introduce an ambiguity if the previous rule ends in a range or enum and the current rule has no identifier. In this case, enclosing `expression` within parentheses, or adding intervening comments can resolve the ambiguity.)
+(Note, this can introduce an ambiguity if the previous rule ends in a range or enum and the current rule has no number. In this case, enclosing `expression` within parentheses, or adding intervening comments can resolve the ambiguity.)

 Symbols are written in CAPITAL CASE if they are the start symbol of a regular language (terminals), otherwise with they are treated as non-terminal rules. Literal strings are quoted.

@@ -134,7 +134,7 @@ Within the expression on the right-hand side of a rule, the following expression
 <tr><td><code>[^abc], [^#xN#xN#xN]</code></td>
 <td>matches any UTF-8 R\_CHAR or HEX with a value not among the characters given. The last component may be '-'. Enumerations and ranges of excluded values may be mixed in one set of brackets.</td></tr>
 <tr><td><code>"string"</code></td>
-<td>matches a literal string matching that given inside the double quotes.</td></tr>
+<td>matches a literal string matching that given inside the double quotes case insensitively.</td></tr>
 <tr><td><code>'string'</code></td>
 <td>matches a literal string matching that given inside the single quotes.</td></tr>
 <tr><td><code>A (B | C)</code></td>
@@ -158,7 +158,8 @@ Within the expression on the right-hand side of a rule, the following expression
 </table>

 * Comments include `//` and `#` through end of line (other than hex character) and `/* ... */ (* ... *) which may cross lines`
-* All rules **MAY** start with an identifier, contained within square brackets. For example `[1] rule`, where the value within the brackets is a symbol `([a-z] | [A-Z] | [0-9] | "_" | ".")+`
+* All rules **MAY** start with an number, contained within square brackets. For example `[1] rule`, where the value within the brackets is a symbol `([a-z] | [A-Z] | [0-9] | "_" | ".")+`, which is not retained after parsing
+* Symbols **MAY** be enclosed in angle brackets `'<'` and `>`, which are dropped when parsing.
 * `@terminals` causes following rules to be treated as terminals. Any terminal which is all upper-case (eg`TERMINAL`), or any rules with expressions that match characters (`#xN`, `[a-z]`, `[^a-z]`, `[abc]`, `[^abc]`, `"string"`, `'string'`, or `A - B`), are also treated as terminals.
 * `@pass` defines the expression used to detect whitespace, which is removed in processing.
 * No support for `wfc` (well-formedness constraint) or `vc` (validity constraint).
@@ -177,7 +178,7 @@ Intermediate representations of the grammar may be serialized to Lisp-like [S-Ex

 is serialized as

-(rule ebnf "1" (star (alt declaration rule)))
+(rule ebnf (star (alt declaration rule)))

 Different components of an EBNF rule expression are transformed into their own operator:
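The README change above states that a double-quoted literal matches case insensitively, while a single-quoted literal matches exactly as written. The following is a minimal Ruby sketch of that distinction, not the gem's implementation: `literal_match?` is a hypothetical helper introduced only for illustration, and it compares whole strings rather than matching at a parser position.

```ruby
# Hypothetical helper (not part of the ebnf gem).
# text  - the literal as written in the grammar, without its quotes
# quote - the quote character used in the grammar: '"' or "'"
# input - the candidate string to compare against the literal
def literal_match?(text, quote, input)
  if quote == '"'
    # Double-quoted literal: compare ignoring case.
    input.casecmp?(text) == true
  else
    # Single-quoted literal: compare exactly.
    input == text
  end
end
```

Under this reading, a grammar literal `"select"` would accept `SELECT`, whereas `'select'` would not.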

etc/ebnf.ebnf

Lines changed: 3 additions & 1 deletion
@@ -34,7 +34,9 @@

 [11] LHS ::= ('[' SYMBOL ']' ' '+)? SYMBOL ' '* '::='

-[12] SYMBOL ::= ([a-z] | [A-Z] | [0-9] | '_' | '.')+
+[12] SYMBOL ::= '<' O_SYMBOL '>' | O_SYMBOL
+
+[12a] O_SYMBOL ::= ([a-z] | [A-Z] | [0-9] | '_' | '.')+

 [13] HEX ::= '#x' ([a-f] | [A-F] | [0-9])+
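The revised `SYMBOL` terminal can be approximated with a regular expression. The following is a minimal Ruby sketch, not code from the gem: `O_SYMBOL_RE`, `SYMBOL_RE`, and `bare_symbol` are hypothetical names introduced here, and dropping the brackets follows the README note added in this commit.

```ruby
# Illustrative approximation only; the gem builds its parser from the grammar.
# O_SYMBOL ::= ([a-z] | [A-Z] | [0-9] | '_' | '.')+
O_SYMBOL_RE = /[a-zA-Z0-9_.]+/

# SYMBOL ::= '<' O_SYMBOL '>' | O_SYMBOL
# Group 1 captures the bracketed form, group 2 the bare form.
SYMBOL_RE = /\A(?:<(#{O_SYMBOL_RE.source})>|(#{O_SYMBOL_RE.source}))\z/

# Hypothetical helper: returns the symbol name with any surrounding angle
# brackets dropped, or nil when the text is not a valid SYMBOL.
def bare_symbol(text)
  match = SYMBOL_RE.match(text)
  match && (match[1] || match[2])
end
```

With this sketch, `bare_symbol("<expression>")` and `bare_symbol("expression")` both yield `"expression"`, while an unbalanced form such as `"<expression"` yields `nil`.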

etc/ebnf.html

Lines changed: 7 additions & 1 deletion
@@ -1,4 +1,4 @@
-<!-- Generated with ebnf version 2.4.0. See https://github.com/dryruby/ebnf. -->
+<!-- Generated with ebnf version 2.5.0. See https://github.com/dryruby/ebnf. -->
 <table class="grammar">
 <tbody id="grammar-productions" class="ebnf">
 <tr id="grammar-production-ebnf">
@@ -77,6 +77,12 @@
 <td>[12]</td>
 <td><code>SYMBOL</code></td>
 <td>::=</td>
+<td><code class="grammar-paren">(</code>'<code class="grammar-literal">&lt;</code>' <a href="#grammar-production-O_SYMBOL">O_SYMBOL</a> '<code class="grammar-literal">&gt;</code>'<code class="grammar-paren">)</code> <code class="grammar-alt">|</code> <a href="#grammar-production-O_SYMBOL">O_SYMBOL</a></td>
+</tr>
+<tr id="grammar-production-O_SYMBOL">
+<td>[12a]</td>
+<td><code>O_SYMBOL</code></td>
+<td>::=</td>
 <td><code class="grammar-paren">(</code><code class="grammar-brac">[</code><code class="grammar-literal">a-z</code><code class="grammar-brac">]</code> <code class="grammar-alt">|</code> <code class="grammar-brac">[</code><code class="grammar-literal">A-Z</code><code class="grammar-brac">]</code> <code class="grammar-alt">|</code> <code class="grammar-brac">[</code><code class="grammar-literal">0-9</code><code class="grammar-brac">]</code> <code class="grammar-alt">|</code> '<code class="grammar-literal">_</code>' <code class="grammar-alt">|</code> '<code class="grammar-literal">.</code>'<code class="grammar-paren">)</code><code class="grammar-plus">+</code></td>
 </tr>
 <tr id="grammar-production-HEX">

etc/ebnf.ll1.rb

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-# This file is automatically generated by ebnf version 2.4.0
+# This file is automatically generated by ebnf version 2.5.0
 # Derived from etc/ebnf.ebnf
 module Meta
 START = :ebnf

etc/ebnf.ll1.sxp

Lines changed: 2 additions & 1 deletion
@@ -100,7 +100,8 @@
 (seq '@pass' expression))
 (terminals _terminals (seq))
 (terminal LHS "11" (seq (opt (seq '[' SYMBOL ']' (plus ' '))) SYMBOL (star ' ') '::='))
-(terminal SYMBOL "12" (plus (alt (range "a-z") (range "A-Z") (range "0-9") '_' '.')))
+(terminal SYMBOL "12" (alt (seq '<' O_SYMBOL '>') O_SYMBOL))
+(terminal O_SYMBOL "12a" (plus (alt (range "a-z") (range "A-Z") (range "0-9") '_' '.')))
 (terminal HEX "13" (seq '#x' (plus (alt (range "a-f") (range "A-F") (range "0-9")))))
 (terminal RANGE "14"
 (seq '['

etc/ebnf.peg.rb

Lines changed: 8 additions & 6 deletions
@@ -1,4 +1,4 @@
-# This file is automatically generated by ebnf version 2.4.0
+# This file is automatically generated by ebnf version 2.5.0
 # Derived from etc/ebnf.ebnf
 module EBNFMeta
 RULES = [
@@ -25,11 +25,13 @@ module EBNFMeta
 EBNF::Rule.new(:_LHS_3, "11.3", [:seq, "[", :SYMBOL, "]", :_LHS_4], kind: :terminal).extend(EBNF::PEG::Rule),
 EBNF::Rule.new(:_LHS_4, "11.4", [:plus, " "], kind: :terminal).extend(EBNF::PEG::Rule),
 EBNF::Rule.new(:_LHS_2, "11.2", [:star, " "], kind: :terminal).extend(EBNF::PEG::Rule),
-EBNF::Rule.new(:SYMBOL, "12", [:plus, :_SYMBOL_1], kind: :terminal).extend(EBNF::PEG::Rule),
-EBNF::Rule.new(:_SYMBOL_1, "12.1", [:alt, :_SYMBOL_2, :_SYMBOL_3, :_SYMBOL_4, "_", "."], kind: :terminal).extend(EBNF::PEG::Rule),
-EBNF::Rule.new(:_SYMBOL_2, "12.2", [:range, "a-z"], kind: :terminal).extend(EBNF::PEG::Rule),
-EBNF::Rule.new(:_SYMBOL_3, "12.3", [:range, "A-Z"], kind: :terminal).extend(EBNF::PEG::Rule),
-EBNF::Rule.new(:_SYMBOL_4, "12.4", [:range, "0-9"], kind: :terminal).extend(EBNF::PEG::Rule),
+EBNF::Rule.new(:SYMBOL, "12", [:alt, :_SYMBOL_1, :O_SYMBOL], kind: :terminal).extend(EBNF::PEG::Rule),
+EBNF::Rule.new(:_SYMBOL_1, "12.1", [:seq, "<", :O_SYMBOL, ">"], kind: :terminal).extend(EBNF::PEG::Rule),
+EBNF::Rule.new(:O_SYMBOL, "12a", [:plus, :_O_SYMBOL_1], kind: :terminal).extend(EBNF::PEG::Rule),
+EBNF::Rule.new(:_O_SYMBOL_1, "12a.1", [:alt, :_O_SYMBOL_2, :_O_SYMBOL_3, :_O_SYMBOL_4, "_", "."], kind: :terminal).extend(EBNF::PEG::Rule),
+EBNF::Rule.new(:_O_SYMBOL_2, "12a.2", [:range, "a-z"], kind: :terminal).extend(EBNF::PEG::Rule),
+EBNF::Rule.new(:_O_SYMBOL_3, "12a.3", [:range, "A-Z"], kind: :terminal).extend(EBNF::PEG::Rule),
+EBNF::Rule.new(:_O_SYMBOL_4, "12a.4", [:range, "0-9"], kind: :terminal).extend(EBNF::PEG::Rule),
 EBNF::Rule.new(:HEX, "13", [:seq, "#x", :_HEX_1], kind: :terminal).extend(EBNF::PEG::Rule),
 EBNF::Rule.new(:_HEX_1, "13.1", [:plus, :_HEX_2], kind: :terminal).extend(EBNF::PEG::Rule),
 EBNF::Rule.new(:_HEX_2, "13.2", [:alt, :_HEX_3, :_HEX_4, :_HEX_5], kind: :terminal).extend(EBNF::PEG::Rule),

etc/ebnf.peg.sxp

Lines changed: 7 additions & 5 deletions
@@ -22,11 +22,13 @@
 (terminal _LHS_3 "11.3" (seq '[' SYMBOL ']' _LHS_4))
 (terminal _LHS_4 "11.4" (plus ' '))
 (terminal _LHS_2 "11.2" (star ' '))
-(terminal SYMBOL "12" (plus _SYMBOL_1))
-(terminal _SYMBOL_1 "12.1" (alt _SYMBOL_2 _SYMBOL_3 _SYMBOL_4 '_' '.'))
-(terminal _SYMBOL_2 "12.2" (range "a-z"))
-(terminal _SYMBOL_3 "12.3" (range "A-Z"))
-(terminal _SYMBOL_4 "12.4" (range "0-9"))
+(terminal SYMBOL "12" (alt _SYMBOL_1 O_SYMBOL))
+(terminal _SYMBOL_1 "12.1" (seq '<' O_SYMBOL '>'))
+(terminal O_SYMBOL "12a" (plus _O_SYMBOL_1))
+(terminal _O_SYMBOL_1 "12a.1" (alt _O_SYMBOL_2 _O_SYMBOL_3 _O_SYMBOL_4 '_' '.'))
+(terminal _O_SYMBOL_2 "12a.2" (range "a-z"))
+(terminal _O_SYMBOL_3 "12a.3" (range "A-Z"))
+(terminal _O_SYMBOL_4 "12a.4" (range "0-9"))
 (terminal HEX "13" (seq '#x' _HEX_1))
 (terminal _HEX_1 "13.1" (plus _HEX_2))
 (terminal _HEX_2 "13.2" (alt _HEX_3 _HEX_4 _HEX_5))

etc/ebnf.sxp

Lines changed: 2 additions & 1 deletion
@@ -12,7 +12,8 @@
 (rule pass "10" (seq '@pass' expression))
 (terminals _terminals (seq))
 (terminal LHS "11" (seq (opt (seq '[' SYMBOL ']' (plus ' '))) SYMBOL (star ' ') '::='))
-(terminal SYMBOL "12" (plus (alt (range "a-z") (range "A-Z") (range "0-9") '_' '.')))
+(terminal SYMBOL "12" (alt (seq '<' O_SYMBOL '>') O_SYMBOL))
+(terminal O_SYMBOL "12a" (plus (alt (range "a-z") (range "A-Z") (range "0-9") '_' '.')))
 (terminal HEX "13" (seq '#x' (plus (alt (range "a-f") (range "A-F") (range "0-9")))))
 (terminal RANGE "14"
 (seq '['

lib/ebnf/ebnf/meta.rb

Lines changed: 8 additions & 6 deletions
@@ -1,4 +1,4 @@
-# This file is automatically generated by ebnf version 2.0.0
+# This file is automatically generated by ebnf version 2.5.0
 # Derived from etc/ebnf.ebnf
 module EBNFMeta
 RULES = [
@@ -25,11 +25,13 @@ module EBNFMeta
 EBNF::Rule.new(:_LHS_3, "11.3", [:seq, "[", :SYMBOL, "]", :_LHS_4], kind: :terminal).extend(EBNF::PEG::Rule),
 EBNF::Rule.new(:_LHS_4, "11.4", [:plus, " "], kind: :terminal).extend(EBNF::PEG::Rule),
 EBNF::Rule.new(:_LHS_2, "11.2", [:star, " "], kind: :terminal).extend(EBNF::PEG::Rule),
-EBNF::Rule.new(:SYMBOL, "12", [:plus, :_SYMBOL_1], kind: :terminal).extend(EBNF::PEG::Rule),
-EBNF::Rule.new(:_SYMBOL_1, "12.1", [:alt, :_SYMBOL_2, :_SYMBOL_3, :_SYMBOL_4, "_", "."], kind: :terminal).extend(EBNF::PEG::Rule),
-EBNF::Rule.new(:_SYMBOL_2, "12.2", [:range, "a-z"], kind: :terminal).extend(EBNF::PEG::Rule),
-EBNF::Rule.new(:_SYMBOL_3, "12.3", [:range, "A-Z"], kind: :terminal).extend(EBNF::PEG::Rule),
-EBNF::Rule.new(:_SYMBOL_4, "12.4", [:range, "0-9"], kind: :terminal).extend(EBNF::PEG::Rule),
+EBNF::Rule.new(:SYMBOL, "12", [:alt, :_SYMBOL_1, :O_SYMBOL], kind: :terminal).extend(EBNF::PEG::Rule),
+EBNF::Rule.new(:_SYMBOL_1, "12.1", [:seq, "<", :O_SYMBOL, ">"], kind: :terminal).extend(EBNF::PEG::Rule),
+EBNF::Rule.new(:O_SYMBOL, "12a", [:plus, :_O_SYMBOL_1], kind: :terminal).extend(EBNF::PEG::Rule),
+EBNF::Rule.new(:_O_SYMBOL_1, "12a.1", [:alt, :_O_SYMBOL_2, :_O_SYMBOL_3, :_O_SYMBOL_4, "_", "."], kind: :terminal).extend(EBNF::PEG::Rule),
+EBNF::Rule.new(:_O_SYMBOL_2, "12a.2", [:range, "a-z"], kind: :terminal).extend(EBNF::PEG::Rule),
+EBNF::Rule.new(:_O_SYMBOL_3, "12a.3", [:range, "A-Z"], kind: :terminal).extend(EBNF::PEG::Rule),
+EBNF::Rule.new(:_O_SYMBOL_4, "12a.4", [:range, "0-9"], kind: :terminal).extend(EBNF::PEG::Rule),
 EBNF::Rule.new(:HEX, "13", [:seq, "#x", :_HEX_1], kind: :terminal).extend(EBNF::PEG::Rule),
 EBNF::Rule.new(:_HEX_1, "13.1", [:plus, :_HEX_2], kind: :terminal).extend(EBNF::PEG::Rule),
 EBNF::Rule.new(:_HEX_2, "13.2", [:alt, :_HEX_3, :_HEX_4, :_HEX_5], kind: :terminal).extend(EBNF::PEG::Rule),
