Java translator for code written in an arbitrary 'P' language defined by an LL(1) grammar.
The aim of the project is to develop a Java translator for code written in an arbitrary 'P' language.
Components needed to build a translator:
The job of the lexical scanner (lexer) is to read a sequence of characters and produce the corresponding list of tokens; a token corresponds to a lexical unit such as a number, an identifier, a relational operator or a keyword. The tokens of the language are shown in the following table:
Token | Pattern | Name (numeric tag) |
---|---|---|
Numbers | Numeric constant | 256 |
Identifier | Letter followed by letters and numbers | 257 |
Relop | Relational operator (<,>,<=,>=,==,<>) | 258 |
Assignment | assign | 259 |
To | to | 260 |
If | if | 261 |
Else | else | 262 |
While | while | 263 |
Begin | begin | 264 |
End | end | 265 |
Print | print | 266 |
Read | read | 267 |
Disjunction | \|\| | 268 |
Conjunction | && | 269 |
Negation | ! | 33 |
Parentheses left | ( | 40 |
Parentheses right | ) | 41 |
Brace left | { | 123 |
Brace right | } | 125 |
Addition | + | 43 |
Subtraction | - | 45 |
Multiplication | * | 42 |
Division | / | 47 |
Semicolon | ; | 59 |
Comma | , | 44 |
EOF | end of input | -1 |
Identifiers correspond to the following regular expression:
(a + ... + z + A + ... + Z)(a + ... + z + A + ... + Z + 0 + ... + 9)*
while numbers correspond to the following regular expression:
0 + (1 + ... + 9)(0 + ... + 9)*
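In `java.util.regex` syntax, the two expressions above could be written roughly as follows, reading `+` in the notation above as alternation. This is only a sketch: the class and method names (`TokenPatterns`, `isIdentifier`, `isNumber`) are hypothetical and are not taken from the project.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, used here only to illustrate the two patterns.
class TokenPatterns {
    // Identifier: one letter followed by any number of letters or digits.
    static final Pattern IDENTIFIER = Pattern.compile("[a-zA-Z][a-zA-Z0-9]*");

    // Number: either a single 0, or a non-zero digit followed by any digits.
    static final Pattern NUMBER = Pattern.compile("0|[1-9][0-9]*");

    static boolean isIdentifier(String s) {
        return IDENTIFIER.matcher(s).matches();
    }

    static boolean isNumber(String s) {
        return NUMBER.matcher(s).matches();
    }
}
```

Note that keywords such as `while` also match the identifier pattern, so a scanner built on these patterns would need to check the keyword list before emitting an Identifier token.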
The lexical scanner must ignore all whitespace characters as well as single-line and multi-line Java-style comments, but it must report illegal characters such as '#' or '@'. The output of the lexical scanner must be in the form of ... .
For example, for the input "assign 300 to d;" the output of the lexical scanner is:
<259,assign> <256,300> <260,to> <257,d> <59> <-1>
The lexical scanner cannot check the structure of commands. Malformed inputs such as 5+;) or (34+26( - (2+15-( 27 are therefore accepted by the lexical scanner, because each of their characters still forms a valid token.
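A minimal sketch of such a scanner is shown below. It is an illustration under simplifying assumptions, not the project's implementation: the class name `LexerSketch` and the method `scan` are hypothetical, only a subset of the keywords is listed, and relational operators, `&&`, comments and the rejection of numbers with leading zeros are left out to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified scanner: not the project's actual lexer.
class LexerSketch {
    // Subset of the keyword tags listed in the token table.
    static final Map<String, Integer> KEYWORDS = Map.of(
        "assign", 259, "to", 260, "if", 261, "else", 262,
        "while", 263, "begin", 264, "end", 265, "read", 267);

    static boolean isLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // Returns the tokens in the textual form used in the example above.
    static List<String> scan(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++; // white space is skipped
            } else if (isDigit(c)) {
                int start = i;
                // Simplification: leading zeros are not rejected here.
                while (i < input.length() && isDigit(input.charAt(i))) i++;
                tokens.add("<256," + input.substring(start, i) + ">");
            } else if (isLetter(c)) {
                int start = i;
                while (i < input.length()
                        && (isLetter(input.charAt(i)) || isDigit(input.charAt(i)))) i++;
                String word = input.substring(start, i);
                // Keywords get their own tag, any other word is an identifier (257).
                tokens.add("<" + KEYWORDS.getOrDefault(word, 257) + "," + word + ">");
            } else if ("!(){}+-*/;,".indexOf(c) >= 0) {
                // Single-character tokens use their ASCII code as tag.
                tokens.add("<" + (int) c + ">");
                i++;
            } else {
                throw new IllegalArgumentException("Illegal character: " + c);
            }
        }
        tokens.add("<-1>"); // end of input
        return tokens;
    }
}
```

With this sketch, `LexerSketch.scan("5+;)")` returns a token list without complaint, which illustrates why structural errors have to be caught by the parser, while `LexerSketch.scan("#")` throws an exception for the illegal character.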
The parser is the deterministic "recognizer" of the strings generated by the given grammar. It is based on the following set of productions:
Legend
- P stands for < prog >
- S stands for < statlist >
- S' stands for < statlistp >
- S" stands for < stat >
- S"' stands for < statp >
- I stands for < idlist >
- I' stands for < idlistp >
- B stands for < bexpr >
- E stands for < expr >
- E' stands for < exprlist >
- E" stands for < exprlistp >
Productions
- P --> S EOF
- S --> S" S'
- S' --> ; S" S' | ε
- S" --> assign E to I
- S" --> print ( E' )
- S" --> read ( I )
- S" --> while ( B ) S"
- S" --> if ( B ) S" S"'
- S" --> { S }
- S"' --> end | else S" end
- I --> ID I'
- I' --> , ID I' | ε
- B --> RELOP E E
- E --> + ( E' ) | - E E
- E --> * ( E' ) | / E E
- E --> NUM | ID
- E' --> E E"
- E" --> , E E" | ε
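Because the grammar is LL(1), each non-terminal can be turned into one recursive method that chooses a production by looking at the next token only. The sketch below does this for the expression productions (E, E' and E") alone. It is an assumption-laden illustration rather than the project's parser: the class `ExprParserSketch` is hypothetical, and the input is a list of already separated token strings instead of the lexer's token objects.

```java
import java.util.List;

// Hypothetical recursive-descent recognizer for E, E' and E" only.
class ExprParserSketch {
    private final List<String> tokens;
    private int pos = 0;

    ExprParserSketch(List<String> tokens) {
        this.tokens = tokens;
    }

    // Lookahead token; "<EOF>" is a sentinel that matches no NUM or ID.
    private String look() {
        return pos < tokens.size() ? tokens.get(pos) : "<EOF>";
    }

    private void match(String expected) {
        if (!look().equals(expected)) {
            throw new IllegalStateException("Expected " + expected + ", found " + look());
        }
        pos++;
    }

    // E --> + ( E' ) | * ( E' ) | - E E | / E E | NUM | ID
    void expr() {
        String t = look();
        switch (t) {
            case "+":
            case "*":
                pos++; match("("); exprlist(); match(")");
                break;
            case "-":
            case "/":
                pos++; expr(); expr();
                break;
            default:
                if (t.matches("0|[1-9][0-9]*") || t.matches("[a-zA-Z][a-zA-Z0-9]*")) {
                    pos++; // NUM or ID
                } else {
                    throw new IllegalStateException("Unexpected token: " + t);
                }
        }
    }

    // E' --> E E"
    void exprlist() {
        expr();
        exprlistp();
    }

    // E" --> , E E" | ε
    void exprlistp() {
        if (look().equals(",")) {
            pos++; expr(); exprlistp();
        }
        // Otherwise ε is chosen; a wrong token is reported by the caller's match(")").
    }

    // True if the whole token list is exactly one expression.
    static boolean accepts(List<String> tokens) {
        ExprParserSketch p = new ExprParserSketch(tokens);
        try {
            p.expr();
            return p.pos == tokens.size();
        } catch (IllegalStateException e) {
            return false;
        }
    }
}
```

For instance, `ExprParserSketch.accepts(List.of("+", "(", "a", ",", "1", ")"))` recognizes the prefix sum `+(a,1)`, while the token list for `5+` is rejected because a token is left over after the expression.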
The translator is based on the syntax-directed translation scheme (SDT) built on top of the LL(1) grammar.
It translates programs written in a simple programming language (called the P language). Programs written in the P language have the .lft extension.
The translator must generate bytecode that can be executed directly by the JVM. Because the binary .class format is complex, the bytecode is not written directly: the translator emits a mnemonic language (https://en.wikipedia.org/wiki/List_of_Java_bytecode_instructions) that mirrors the JVM assembly instructions, and an assembler program then translates it into the .class format.
The assembler program does a 1:1 translation of the mnemonic instructions into the corresponding JVM instructions (opcode). The assembler program used in this project is called Jasmin (http://jasmin.sourceforge.net/). The translation scheme is the following:
The source code is translated by the compiler into the JVM assembler language (.j extension); the Jasmin assembler then translates it into a .class file. The compilation process generates an Output.j file, and the following command converts it into the Output.class file:
java -jar jasmin.jar Output.j
that can be executed with the following command:
java Output
The final output should be an Output.j file containing a header, the list of mnemonic instructions and a footer. The output is generated from the test file read as input in the main method of the Traslator class. As an example, if the P language code to translate is the following:
read (a);
print(+(a,1))
the corresponding Output.j file will contain the following instructions:
invokestatic Output/read()I
istore 0
goto L1
L1:
iload 0
ldc 1
iadd
invokestatic Output/print(I)V
goto L2
L2:
goto L0
L0:
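The `iload 0`, `ldc 1`, `iadd` sequence in the listing above can be produced by emitting an instruction each time an expression production is completed. The following sketch shows one possible way to do that for expressions only. Everything in it is an assumption made for illustration: the class `ExprCodeGenSketch`, the token-list input, the way variables are assigned local slots in order of first use, and the choice to emit the operator after each additional operand of `+` and `*`. It also assumes syntactically valid input and does no error recovery.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical expression translator: emits Jasmin mnemonics for prefix expressions.
class ExprCodeGenSketch {
    private final List<String> tokens;
    private int pos = 0;
    // Assumed symbol table: variable name -> local variable slot.
    private final Map<String, Integer> slots = new LinkedHashMap<>();
    private final List<String> code = new ArrayList<>();

    private ExprCodeGenSketch(List<String> tokens) {
        this.tokens = tokens;
    }

    private void expect(String s) {
        String t = tokens.get(pos++);
        if (!t.equals(s)) {
            throw new IllegalStateException("Expected " + s + ", found " + t);
        }
    }

    private void expr() {
        String t = tokens.get(pos++);
        switch (t) {
            case "+": operands("iadd"); break;
            case "*": operands("imul"); break;
            case "-": expr(); expr(); code.add("isub"); break;
            case "/": expr(); expr(); code.add("idiv"); break;
            default:
                if (t.matches("[0-9]+")) {
                    code.add("ldc " + t); // numeric constant
                } else {
                    Integer slot = slots.get(t);
                    if (slot == null) { // first use: assign the next free slot
                        slot = slots.size();
                        slots.put(t, slot);
                    }
                    code.add("iload " + slot);
                }
        }
    }

    // Handles "( E , E , ... )" and emits op after every operand but the first.
    private void operands(String op) {
        expect("(");
        expr();
        while (tokens.get(pos).equals(",")) {
            pos++;
            expr();
            code.add(op);
        }
        expect(")");
    }

    static List<String> translate(List<String> tokens) {
        ExprCodeGenSketch g = new ExprCodeGenSketch(tokens);
        g.expr();
        return g.code;
    }
}
```

Calling `ExprCodeGenSketch.translate(List.of("+", "(", "a", ",", "1", ")"))` yields the three instructions `iload 0`, `ldc 1` and `iadd`, matching the expression part of the listing above; the `invokestatic` calls and the labels belong to statement translation and are not covered by this sketch.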
- Java 17
- IntelliJ Idea Ultimate
- Jasmin
- Gradle
- GitHub Actions