This is my coursework for compilers. Here are the links to Tasks 1 and 2:
https://users.sussex.ac.uk/~hh435/compilers/task1.html
https://users.sussex.ac.uk/~hh435/compilers/task2.html
In case the links break one day, the task descriptions are copied below.
Task 1: Coursework Task 1: Interpreter

Summary

In this task you will write an interpreter for a simple programming language. We will use the parser generator ANTLR as our main tool to reduce the amount of hand-written code required.
The main resource is a 'skeleton' project that you can download here.
Please carefully read the details below and the submission guidelines.
Execution model and typing rules

A program in our simple programming language starts execution from a function int main(...), which may occur anywhere in the function declarations. All arguments to functions are passed by value, and the only identifiers defined in any function are as follows:
- The names of other functions (which may be defined before or after the current function);
- The parameters taken by the current function;
- The local variables (initialised at the top of the body) of the current function.

In other words, all functions are of 'global scope', while parameters and local variables are of 'function scope'. There is one special case: when a local variable is being initialised, all the other local variables are not yet defined and thus not available. We do not use an explicit return keyword; the return value of a function is the value of the last expression in its body. We assume that the input program satisfies the following conditions:
- The input program has a main function, which has return type int;
- No two functions may have the same name;
- The definition of a function fnA may involve invocations of other functions that are defined before or after the function fnA;
- No two parameters or local variables of the same function fnA may have the same name;
- No parameter or local variable may have the same name as a function;
- Parameters and local variables of fnA are only in-scope in the body of fnA;
- All parameters or local variables must have either int or bool types.

In addition, we assume that it satisfies the following typing rules:
- For each function, the return type matches the type of the return value;
- For each function invocation, the number and types of the arguments passed match those of the parameters of the function being called;
- The following expressions have unit type: print x, space, newline, skip, x := y, while x do {...}, repeat {...} until x;
- The type of a function invocation f(e1, ..., en) is the return type of f;
- The type of a block { e1; ...; en } is the type of the last expression en;
- The type of an if expression is the type of its then block (and its else block);
- In any comparison such as x < y, both x and y must have int type;
- In any arithmetic operation such as x + y, both x and y must have int type;
- In any Boolean operation such as x & y, both x and y must have bool type;
- In any if expression, its then block and its else block have the same type;
- In any if, while, or repeat expression, the condition must have bool type;
- In any while or repeat expression, the loop body must have unit type;
- In any assignment x := y, both x and y must have the same type;
- In any print expression print x, either x is space or newline, or x has int type.

In short, the input program is assumed to be 'well-formed' and 'well-typed'. The semantics of the language constructs should be self-explanatory; in particular, your interpreter will print to System.out as requested by print statements in the input program.
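The execution model above (functions in one global table, a fresh set of parameters and locals per call, and the last expression of a body as the return value) can be sketched in plain Java without ANTLR. This is only an illustration under assumptions: the names MiniEval, Expr and Fn are hypothetical and not part of the skeleton project, bool values are assumed to be encoded as 1 and 0, and unit expressions are assumed to evaluate to a dummy 0.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the execution model; not the skeleton's API. */
final class MiniEval {
    /** An expression evaluated against the current call's frame (assumed: unit yields 0). */
    interface Expr {
        int eval(Map<String, Integer> frame);
    }

    /** A function: its parameter names and the expressions that make up its body. */
    static final class Fn {
        final List<String> params;
        final List<Expr> body;

        Fn(List<String> params, List<Expr> body) {
            this.params = params;
            this.body = body;
        }
    }

    // Global scope: every function can see every other function,
    // regardless of declaration order.
    private final Map<String, Fn> functions = new HashMap<>();

    void define(String name, Fn fn) {
        functions.put(name, fn);
    }

    /** Calls a function with already-evaluated argument values (pass by value). */
    int call(String name, List<Integer> args) {
        // The input is assumed well-formed, so the function exists
        // and the argument count matches.
        Fn fn = functions.get(name);

        // Function scope: a fresh frame per call, holding only this
        // call's parameters (locals would be added here as well).
        Map<String, Integer> frame = new HashMap<>();
        for (int i = 0; i < fn.params.size(); i++) {
            frame.put(fn.params.get(i), args.get(i));
        }

        // No return keyword: the value of the last expression is the result.
        int last = 0;
        for (Expr e : fn.body) {
            last = e.eval(frame);
        }
        return last;
    }
}
```

In a real visitor the frame would typically be pushed when visiting an invocation and popped afterwards, so that a callee can never see the caller's variables.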
Input / output specification and some technical details

Your grammar file is SimpleLang.g4 with a start rule called prog, and thus the lexer and parser classes generated by ANTLR are SimpleLangLexer and SimpleLangParser, respectively. Your interpreter class SimpleLangInterpreter implements SimpleLangVisitor, and it has an extra method visitProgram which also returns an Integer (the return value of main) and takes two arguments: the first is the parse tree's root node (a SimpleLangParser.ProgContext), and the second is an array of strings (the arguments), each of which must be either an INTLIT or a BOOLLIT; we assume that they match the number and types of the parameters of main. After simulating (interpreting) the input program (with the arguments), the interpreter prints a line separator and then two separate lines, each with a line separator: the first contains the string NORMAL_TERMINATION, and the second contains the return value of main.
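The argument handling and the final output described above could look roughly like the following. This is a sketch under assumptions: RunReport and its methods are hypothetical names, BOOLLIT is assumed to be spelled true / false, and bool values are assumed to be stored as 1 and 0. Whether the final lines are printed by the interpreter itself or by the skeleton's main class is not stated here, so check the skeleton.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for argument parsing and the termination report. */
final class RunReport {
    private RunReport() {}

    /** Converts one argument string to an int; bools become 1 or 0 (assumed encoding). */
    static int parseArg(String arg) {
        if (arg.equals("true")) return 1;   // assumed BOOLLIT spelling
        if (arg.equals("false")) return 0;  // assumed BOOLLIT spelling
        return Integer.parseInt(arg);       // otherwise treat it as an INTLIT
    }

    /** Converts all arguments of main, in order. */
    static List<Integer> parseArgs(String[] args) {
        List<Integer> values = new ArrayList<>();
        for (String arg : args) {
            values.add(parseArg(arg));
        }
        return values;
    }

    /** Builds the text printed after the program has finished. */
    static String terminationReport(int mainResult) {
        String nl = System.lineSeparator();
        // A line separator first, then two lines, each ended by a line separator.
        return nl + "NORMAL_TERMINATION" + nl + mainResult + nl;
    }
}
```

For example, System.out.print(RunReport.terminationReport(result)) would emit the three line separators in the positions the specification asks for.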
Task 2: Coursework Task 2: RISC-V code generator

Summary

In this task, you are going to implement a code generator targeting RISC-V assembly for the same simple programming language (i.e. you can reuse the lexer and parser developed in Task 1). The principal idea here is to use stack-machine assembly, which can easily be embedded into RISC-V assembly, as an intermediate language to simplify this task.
The main resource is a 'skeleton' project that you can download here.
Please carefully read the details below and the submission guidelines.
Input / output specification and some technical details

As before, we assume that the input program is 'well-formed' and 'well-typed'. Your code generator will generate RISC-V assembly code (or in our case, stack-machine assembly code with RISC-V 'polyfills') that implements the input program, and will simulate the generated code with RARS. We suggest the following calling convention when calling a function:
1. Pushes a return value onto the stack (meant to be modified by the callee).
2. Pushes the arguments (in reverse order) onto the stack.
3. Pushes the return address (the address right after the jump) onto the stack.

The organisation of the source files for Task 2 is very similar to Task 1; specifically,
- Your code generator class SimpleLangCodeGenerator implements SimpleLangVisitor, and it has an extra method visitProgram which also returns a String (generated code) and takes the same two arguments as in Task 1.
- Your code generator simulates the generated code and prints out the results in exactly the same way as in Task 1 using RARS's API (see the skeleton project for details).
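The caller side of the calling convention suggested above can be sketched as a small Java method that emits plain RISC-V. Everything here is an assumption made for illustration: the class CallSequence and its methods are hypothetical, stack slots are assumed to be 4 bytes (RV32), each argument snippet is assumed to leave its value in register t0, and the skeleton's own stack-machine 'polyfills' would normally replace the hand-written push. What the callee does with the stack afterwards is not covered.

```java
import java.util.List;

/** Hypothetical sketch of the caller side of the suggested calling convention. */
final class CallSequence {
    private CallSequence() {}

    /** Emits code that pushes register t0 onto the stack (assumed 4-byte slots). */
    static String pushT0() {
        return "addi sp, sp, -4\n"
             + "sw t0, 0(sp)\n";
    }

    /**
     * Emits the call to the given function.
     * argCode: one snippet per argument, each assumed to leave its value in t0.
     * returnLabel: a fresh label placed right after the jump.
     */
    static String emitCall(String function, List<String> argCode, String returnLabel) {
        StringBuilder out = new StringBuilder();

        // 1. Push a slot for the return value (the callee overwrites it).
        out.append("li t0, 0\n").append(pushT0());

        // 2. Push the arguments in reverse order.
        for (int i = argCode.size() - 1; i >= 0; i--) {
            out.append(argCode.get(i)).append(pushT0());
        }

        // 3. Push the return address, i.e. the address right after the jump.
        out.append("la t0, ").append(returnLabel).append("\n").append(pushT0());

        // Jump to the callee; execution resumes at the return label.
        out.append("j ").append(function).append("\n");
        out.append(returnLabel).append(":\n");
        return out.toString();
    }
}
```

Every emitted instruction ends with a newline, so the output keeps one instruction per line.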
Some suggestions and hints

- Each new assembly instruction should be on a new line; don't put more than one instruction on a single line.
- You'll need a function that generates fresh labels. Don't hardcode labels; ensure that the function generates a fresh label every time it is called. Don't use a random generator for this; use a global / static variable instead. This is one of the few legitimate uses of global / static variables.
- Check that your submission is not miscompiling conditional and loop constructs like if / then / else or repeat / until.
- Consider giving a 'dummy' value to the unit expressions like skip (why?)
- We strongly recommend testing your code before submission. Note that pasting code into RARS and simply pressing 'Assemble' is insufficient as a test, because that doesn't test the termination of the code generated by your code generator. [Run] it!
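The fresh-label hint above can be sketched with a static counter. The class name FreshLabels and the prefix_N naming scheme are assumptions chosen for illustration, not something the skeleton prescribes.

```java
/** Hypothetical helper: returns a new label name on every call. */
final class FreshLabels {
    // Static counter shared by the whole code generator, so no two
    // calls can ever return the same label.
    private static int counter = 0;

    private FreshLabels() {}

    /** Returns prefix_0, prefix_1, ... using one counter for all prefixes. */
    static String fresh(String prefix) {
        return prefix + "_" + (counter++);
    }
}
```

A conditional would then request its labels once per visit, for instance String elseLabel = FreshLabels.fresh("else"), and reuse those variables for both the jump and the label definition. A counter is preferable to a random generator because it cannot produce a collision and the generated code stays reproducible between runs.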