This parser can only process:
- Main declaration
- Scopes within scopes
- Arithmetic expressions
- Parenthesized expressions
- Expression comparisons
- Numeric and String-like variable assignments
- String assignments for any character that can be a variable name 'ID'
This parser will not detect semantic errors such as:
- Wrong Type comparisons
- Expressions, comparative expressions
Just like in Java, any of these things NEED to be inside a main scope, this is a valid minimum statement for java to process:
public static void main () { Scope }
The Scope can be as small as a single empty statement as ;
or several statements with their own scope:
public static void main ( ) { ; { ; { ; } } }
We can also have expressions as the one below, regardless of the type that 'id' might be. We will ignore this because this is a grammar parser, not a semantic parser.
public static void main ( ) { 28 + 28 * ( 28 - 28 / 28 ) + 28 ^ id - ( id + 28 ) ; }
We can also have expressions where the assignment is an expression or an expression surrounded by parentheses. The parser will check for numeric assignments to make sense (numeric type) and for string types, it will look for double quotes to delimit the value to assign to the string. In this case, anything that the tokenizer accepts as an identifier ID
will be valid as an input for these.
public static void main ( ) {
float id = 2.8 ;
{
String id = " character adn others " ;
int id = 2 + id ;
8 != id ;
( id >= id ) ;
}
}
The complexity in time is
Now, the parser is another story. Assuming it uses a PDA implementation, it should be
Expression ::= Term
Expression ::= Term Operator Term
Term ::= ( Expression )
Term ::= number
Term ::= id
Operator ::= ^ | + | - | / | *
The problem is the Expression:: Term Operator Term
production. Expression and Term productions do not have a common prefix factored out.
We proceed to factor it out as Expression_prime and make it nullable.
With these changes we now have an LL1 grammar.
Expression ::= Term Expression_prime
Expression_prime ::= Operator Term Expression_prime
Expression_prime ::= ''
Term ::= ( Expression )
Term ::= number
Term ::= id
Operator ::= ^
Operator ::= +
Operator ::= -
Operator ::= /
Operator ::= *
number + number * ( number - number / number ) + number ^ id - ( id + number )
number + number + ( id * ( id - id ) )
All we need for a comparison grammar is to add the comparison operators, because they follow the same structure as any binary operator!
The Expressions now include comparisons with any term, including expressions! However there might be semantic errors due to different type comparisons.
This would require semantic parsing, so we just skip it.
Expression ::= Term Expression_prime
Expression_prime ::= Operator Term Expression_prime
Expression_prime ::= Comp_Operator Term Expression_prime
Expression_prime ::= ''
Term ::= ( Expression )
Term ::= number
Term ::= id
Term ::= false
Term ::= true
Comp_Operator ::= &&
Comp_Operator ::= ||
Comp_Operator ::= ==
Comp_Operator ::= !=
Comp_Operator ::= >
Comp_Operator ::= >=
Comp_Operator ::= <
Comp_Operator ::= <=
Operator ::= ^
Operator ::= +
Operator ::= -
Operator ::= /
Operator ::= *
( true ) != ( false )
( true ) == ( id )
( number ) >= ( id )
( number ) <= ( id )
( number ) < ( id )
( number ) > ( number )
( true && true ) == ( true )
true && true
number + number * ( number - number / number ) + number ^ id - ( id + number )
Main ::= public static void main ( ) Scope
Scope ::= { Statement }
Statement ::= Statement || ;
Basically every scope is whatever statements are between our scope delimiters "{ }" and a statement can be as simple as just ";".
We develop Statement
to Statements
so we can eliminate left recursion but now we have a non terminal that can end in two different non terminals!
Statements ::= Statement Statements
Statements ::= Statement
Statement ::= ;
Our grammar is ambiguous because of Statements
which can produce multiple trees.
We simply add an intermediate state Statements_prime
which can be nulled. This removes ambiguity completely.
However, we still need to come back from other scopes, so Statements_prime
can also call the Scope_prime
, which allows us to have several statements before another scope.
Main ::= public static void main ( ) Main_scope
Main_scope ::= { Scope }
Scope ::= Scope_prime
Scope ::= Statements
Scope_prime ::= { Statements }
Statements ::= Statement Statements_prime
Statements_prime ::= Statements
Statements_prime ::= Scope_prime
Statements_prime ::= ''
Statement ::= ;
public static void main ( ) { ; } public static void main ( ) { { ; { ; } } } public static void main ( ) { ; { ; { ; } } }
Statement ::= ;
Statement ::= Expression ;
Statement ::= Num_Assignment ;
Statement ::= String_Assignment ;
Num_Type ::= int
Num_Type ::= double
Num_Type ::= float
Num_Assignment ::= Num_Type id = Expression
String_Assignment ::= string id = String_like
String_like ::= quot Valid_Chars quot
Valid_Chars ::= Char Valid_Chars_prime
Valid_Chars_prime ::= Valid_Chars
Valid_Chars_prime ::= ''
Char ::= char
I subdivide assignments into num like and string like assignments to avoid any ambiguity and recursion directly.
public static void main ( ) { ; { ; { string id = quot char quot ; } } } public static void main ( ) { string id = quot char quot ; } public static void main ( ) { int id = number * number ; }
Now that we have gruesomely defined every gramamr and made sure they are all LL(1) we can join all of them together. Yes, all of them.
Main ::= public static void main ( ) Main_scope
Main_scope ::= { Scope }
Scope ::= Scope_prime
Scope ::= Statements
Scope_prime ::= { Statements }
Statements ::= Statement Statements_prime
Statements_prime ::= Statements
Statements_prime ::= Scope_prime
Statements_prime ::= ''
Statement ::= ;
Statement ::= Expression ;
Statement ::= Num_Assignment ;
Statement ::= String_Assignment ;
Expression ::= Term Expression_prime
Expression_prime ::= Operator Term Expression_prime
Expression_prime ::= Comp_Operator Term Expression_prime
Expression_prime ::= ''
Term ::= ( Expression )
Term ::= number
Term ::= id
Term ::= false
Term ::= true
Comp_Operator ::= &&
Comp_Operator ::= ||
Comp_Operator ::= ==
Comp_Operator ::= !=
Comp_Operator ::= >
Comp_Operator ::= >=
Comp_Operator ::= <
Comp_Operator ::= <=
Operator ::= ^
Operator ::= +
Operator ::= -
Operator ::= /
Operator ::= *
Num_Type ::= int
Num_Type ::= double
Num_Type ::= float
Num_Assignment ::= Num_Type id = Expression
String_Assignment ::= string id = String_like
String_like ::= quot Valid_Chars quot
Valid_Chars ::= Char Valid_Chars_prime
Valid_Chars_prime ::= Valid_Chars
Valid_Chars_prime ::= ''
Char ::= char
public static void main ( ) { float id = number ; { string id = quot char quot ; int id = number + id ; number != id ; ( id && id ) ; } }
We can now transform this grammar for our tokenizer and parser as follows:
Main -> public static void main 'L_PAR' 'R_PAR' Main_scope
Main_scope -> 'L_CUR' Scope 'R_CUR'
Scope -> Scope_prime
Scope -> Statements
Scope_prime -> 'L_CUR' Statements 'R_CUR'
Statements -> Statement Statements_prime
Statements_prime -> Statements
Statements_prime -> Scope_prime
Statements_prime ->
Statement -> 'END'
Statement -> Expression 'END'
Statement -> Num_Assignment 'END'
Statement -> String_Assignment 'END'
Expression -> Term Expression_prime
Expression_prime -> Operator Term Expression_prime
Expression_prime -> Comp_Operator Term Expression_prime
Expression_prime ->
Term -> 'L_PAR' Expression 'R_PAR'
Term -> 'NUMBER'
Term -> 'ID'
Term -> 'TRUE'
Term -> 'FALSE'
Comp_Operator -> 'AMPERSAND' 'AMPERSAND'
Comp_Operator -> 'PIPE' 'PIPE'
Comp_Operator -> 'ASSIGN' 'ASSIGN'
Comp_Operator -> 'NOT' 'ASSIGN'
Comp_Operator -> 'MORE'
Comp_Operator -> 'MORE' 'ASSIGN'
Comp_Operator -> 'LESS'
Comp_Operator -> 'LESS' 'ASSIGN'
Operator -> 'POW'
Operator -> 'PLUS'
Operator -> 'MINUS'
Operator -> 'DIV'
Operator -> 'MULT'
Num_Assignment -> 'NUMTYPE' 'ID' 'ASSIGN' Expression
String_Assignment -> string 'ID' 'ASSIGN' String_like
String_like -> 'QUOT' Valid_Chars 'QUOT'
Valid_Chars -> Char Valid_Chars_prime
Valid_Chars_prime -> Valid_Chars
Valid_Chars_prime ->
Char -> Statements