A C compiler built for educational purposes, targeting the VeSPA architecture.
- Overview
- Introduction
- Developer Team
- Repository Components
- Tools & Programming Language Used
- Compiler Overview
- Lexical Analysis
- Syntactic Analysis
- Semantic Analysis
- Constant Folding Optimization
- VeSPA ABI – Application Binary Interface
- Code Generation
- Code Generation Templates
- How to Run the Compiler
- VeSPA Simulator
- VeSPA Assembler
- VeSPA Instruction Scheduler
- My Contributions to the RocketC Compiler
RocketC is a custom C compiler developed specifically for the VeSPA processor, a RISC-based System-on-Chip implemented on FPGA.
It compiles C programs into VeSPA Assembly and machine code, which can then be executed on the hardware.
The goal of this project is to design and implement a full compilation flow capable of generating executable code for the VeSPA architecture, enabling high-level programming on a custom-built CPU.
➡️ This project was developed by a team of 20 students as part of the Embedded Systems course.
➡️ The class was organized into three groups, which rotated roles throughout the semester to cover different stages of the development process
➡️ The roles were:
- Rocket C Compiler Frontend - Current Repository
- Rocket C Compiler Backend - Current Repository
- CPU Pipeline (SoC Team)
| Folder | Description |
|---|---|
Assembler/ |
Contains Assembler code, which turns VeSPA Assembly into machine code (.coe / binary format). |
Compiler/ |
Contains the RocketC VeSPA Compiler code |
Instruction Scheduler/ |
Includes the Instruction Scheduler, used to reduce hazards generated by the compiler in VeSPA code. |
Simulator/ |
RocketSim - Software simulator for the VeSPA CPU, capable of executing assembly/binary programs step-by-step. |
Reports and Presentation/ |
Includes the project report and presentation slides. |
Programming Language - C
| Tool | Purpose |
|---|---|
| VS Code | Main development environment |
| GCC | C compiler used to develop the VESPA compiler |
| GNU Flex | Lexical analyzer generator (tokenizer) |
| GNU Bison | Parser generator for syntax, grammar definition and AST build |
| Makefile | Build automation and project compilation |
Lexical analysis is the first stage of the compiler. Its role is to read the input C source code and convert it into a sequence of tokens. Anything that doesn't match the expected patterns of the language is reported as a lexical error. As stated before, this was implemented using the open source GNU Flex tool.
GNU Flex
- Flex reads a specification file (
.l) containing regular expressions for tokens. - It generates a C source file that scans the input program and produces tokens.
- These tokens are then passed to the syntax parser (Bison).
Why Flex is used:
- Fast and efficient token scanning
- Easy integration with C and Bison/Yacc
- Supports regular expressions for easy token definition
- Commonly used in real compilers (GCC, Clang, etc.)
The following elements were defined in the lexer:
reserved words, special symbols, and their corresponding regular expressions.
Limitations:
The following C features are not supported in this version of the compiler:
struct,typedef,enum,union,sizeof,->,.
The full lexer implementation for the VeSPA C Compiler is available here:
Compiler/Lexer/LexScanner.l
Syntax analysis (or parsing) is the second stage of the compiler. Its purpose is to take the tokens generated by the lexer and verify whether they follow the grammar rules of the C language.
If the code is syntactically correct, the parser constructs a Parse Tree or Abstract Syntax Tree (AST).
If the input does not follow the grammar, a syntax error is reported.
This stage is implemented using GNU Bison, a parser generator that works alongside Flex.
GNU Bison is a bottom-up parser generator. It reads a grammar specification file (.y) written in BNF (Backus–Naur Form) syntax.
It generates a bottom-up LR parser, which works as follows:
- Reads tokens from the lexer from left to right
- Shifts tokens onto a parsing stack
- Reduces groups of tokens according to grammar rules
- Repeats this process until everything is reduced into the start symbol of the grammar
- Integrates with Flex
- Supports complex grammar definitions
- Used in real compilers like GCC
The complete grammar specifications are implemented in:
Compiler/Parser/SyntaxParser.y
This table details different types of nodes commonly found in an AST, along with their specific child nodes. Each row represents a type of node, and the columns indicate the different types of child nodes.
Semantic analysis is the third stage of the compiler.
While syntax analysis checks if the structure of the program is valid, semantic analysis ensures it also makes logical sense according to C language rules.
It verifies:
- Variables are declared before use
- Type compatibility in expressions and assignments
- Scope rules are followed
- Functions are called with the correct number and types of arguments
The compiler uses a scoped symbol table system, implemented with hash tables for fast insertion and lookup of identifiers.
Each block of code (global, function, or { } block) has its own symbol table, and these tables are organized in a hierarchical structure.
- Each symbol is associated with the scope in which it was declared
- Uses nearest scope resolution – the compiler searches the current scope first, then moves outward to parent scopes
- Avoids naming conflicts across different parts of the program
- Hash tables store symbols efficiently, and linked lists are used to handle hash collisions.
- A variable declared in an outer scope is visible to all inner scopes
- A variable declared in an inner scope is NOT visible outside that scope
The VeSPA compiler implements a strongly-typed subset of the C language, which is more restrictive than standard C.
In standard C, the language is considered weakly typed, meaning many implicit or unsafe type conversions are allowed (sometimes with just a warning). This flexibility is powerful but can result in undefined behavior.
In VeSPA C, the compiler enforces stricter rules, so if a type is incorrect or incompatible, the program will not compile.
- Array indices must be of type
INT,SHORT, orLONG - Increment (
++) and decrement (--) operations are only valid on variables and pointers switchandcaseconditions must evaluate toINTorCHAR- Conditions in
if,while,do-while,switch, and ternary (?:) cannot be of typeSTRING - Bitwise operators (
&,|,^,<<,>>) only accept operands of typeINT,CHAR,SHORT, orLONG - Division by zero is not allowed
- Assigning a pointer to another pointer is allowed
- Assigning the address of a variable to a pointer is allowed
- Assigning a
STRINGto aCHAR*pointer is allowed - Assigning an
INTvalue to aCHARvariable is allowed - Pointer typecasting is allowed
Constant folding is an optimization that checks the AST for operations where all operands are constants.
If possible, it evaluates the result at compile time so the code generation phase has fewer nodes to process.
Before:
(-)
/ \
(+) 10
/ \
x (*)
/ \
3 7
After
(-)
/ \
(+) 10
/ \
x 21
The Application Binary Interface (ABI) defines how the code generated by the compiler interacts with the VeSPA hardware.
It acts as an agreement between the CPU architecture and the compiler, ensuring that both interpret function calls, register usage, data formats and memory structures in a consistent manner.
This ABI specifies:
- Memory organization
- Register usage conventions
- Function calling conventions and stack frame layout
- Data representation (endianness, data sizes and alignment)
- Interrupt handling rules
Full ABI specification here.
| Register | Purpose | Caller-Saved |
|---|---|---|
| R0 | Constant zero (hardwired) | – |
| R1 | Return address for function calls | Yes |
| R2 | Frame Pointer (FP) | Yes |
| R3 | Stack Pointer (SP) | Yes |
| R4–R5 | Return registers and special registers used for Mul/Div operations | No |
| R6–R31 | General-purpose temporary registers | No |
The Code Generation phase is the final stage of the compiler.
Its goal is to translate the validated Abstract Syntax Tree (AST) into target assembly code or machine code for execution.
Code Generation main flowchart is shown below
Before generating instructions for user-defined functions or statements, the compiler emits a small block of initialization code to properly set up the runtime environment.
- Place a
JMPinstruction at address0x00to jump to the initialization block located at0x40. - Initialize the Stack Pointer (R2) and Frame Pointer (R3) to the top of the stack (address
1023). - Preload the Return Address Register (R1) with the address of a
HALTinstruction and push it onto the stack.
This ensures that whenmain()finishes execution, the program halts. - Load a register with the address/label of the
mainfunction and jump to it.
.org 0x00
JMP R0, #64 ; Jump to initialization code at 0x40
.org 0x4000
HALT ; Stop execution
.org 0x40
LDI R2, #1023 ; Stack Pointer = top of stack
LDI R3, #1023 ; Frame Pointer = top of stack
LDI R1, #16384 ; Return address pointing to HALT
SUB R3, R3, #1 ; Adjust stack for storing return address
STX R1, R3, #0 ; Push return address onto stack
LDI R31, :FUNCTION_main
JMP R31, #0 ; Jump to main function
The baseline VeSPA ISA does not include native instructions for multiplication, division, or modulus.
To support these operations, the compiler generates calls to custom assembly functions.
How It Works
- Multiplication, division, and modulus are not hardware-supported.
- These operations are implemented as software routines (assembly functions).
- When the compiler encounters
*,/, or%in the AST, it generates a function call to these routines. - These routines are generated in the beggining of the code generation step
Algorithms Used
| Operation | Algorithm |
|---|---|
| Multiplication | Shift-and-add multiplication |
| Division & Modulus | Restoring division algorithm |
To support function calls, local variables, recursion and parameter passing, the compiler implements a stack-based execution model that follows the ABI (Application Binary Interface).
- The stack grows downward, toward lower memory addresses.
- R3 – Stack Pointer (SP) always points to the top of the stack.
- R2 – Frame Pointer (FP) marks the base address of the current stack frame.
- Each function call creates a new stack frame, enabling recursion and nested calls.
- Global variables are stored at the bottom of RAM, starting from 0x00.
- They are persistent throughout program execution and do not reside on the stack.
|
|
Function Call Sequence
- Caller pushes arguments onto the stack.
- Caller pushes the return address.
- Current FP is saved, and FP is updated to SP.
- Callee reserves space for local variables (SP = SP – size).
- On return:
- Local variables are freed (SP restored).
- Previous FP and return address are restored.
- Control returns to the caller.
This stack design ensures:
- Support for recursion and nested calls
- Separation of local and global data
- ABI-compliant parameter passing and return values
- A caller–callee structure for code generation, prologues and epilogues
- Parse the root node
- If both childs are terminals, the code is generated
- If not then we parse the non terminal childs (left first)
-
VeSPA ALU only allows immediate values on the right operand (Source B), so it was established that nodes with an immediate left child are handled with an add instruction, by negating the immediate value and the result of the sum, as shown below.
42 - x = -(x + (-42)) -
Operations with 2 immediates will not exist because we applied constant folding, so they dont need to be handled in this stage.
To generate VeSPA assembly, a set of code generation templates was defined.
Each template specifies how a particular language construct (such as if, while, assignments, or function calls) is emitted into assembly.
Below is an example of the template used for the if statement, demonstrating how conditional evaluation, jump labels, and control flow are generated.
All the other templates can be found in the report.
sudo apt update
sudo apt install build-essential flex bison makeThis project provides a Makefile that can build individual components or the full compiler.
Available targets:
make setup— create theOutput/directorymake lexer— generate the lexer (Output/lex.yy.c) using Flexmake parser— generate the parser (Output/Parser.tab.c/.h) using Bison (warnings suppressed)make src— compile sources into the executableOutput/RocketCmake all— run: setup → parser → lexer → compilemake prog— run the compiler onTestCodeGen.cmake clean— remove the entireOutput/directorymake run— clean, rebuild everything, then run the test (TestCodeGen.c)
Build everything:
make all
Two ways to test the compiler.
- Using your own C file:
./Output/RocketC <SourceFile.c> --parse-
Using the provided test file in project root (
TestCodeGen.c):make allmake prog
Or, to force a clean rebuild and then run:
make run
After running the compiler:
- The generated assembly is written to:
Output/Rocket.asm - If the program uses
*,/, or%, the compiler inserts predefined multiplication/division/modulo routines at the top of the assembly file.- These routines use
RL(rotate left) andRR(rotate right), since the architecture lacks native multiply/divide instructions. - They are only included when needed.
- These routines use
- The terminal output also includes:
- The Abstract Syntax Tree (AST)
- Symbol tables
- Any semantic errors or warnings
See VeSPA Simulator / Virtual Machine here
See VeSPA Assembler here
See VeSPA Instruction Scheduler here
- Implemented parts of the Lexical Analysis (Lexer)
- Helped develop the Parser and Abstract Syntax Tree (AST), including grammar definitions and AST structure.
- Took a leading role in the Code Generation phase — one of my main contributions.
- Designed and implemented most of the Stack Management — also one of my main contributions.
- While I did not implement semantic analysis, working on code generation required a solid understanding of semantics, symbol resolution, and scope handling.
- Participated in discussions on compiler–hardware trade-offs to ensure compatibility with the VeSPA architecture.
- Contributed to the technical documentation and explanation of design decisions.
- Compiler Architecture Design
- C Programming and Modular Software Design
- Debugging
- Hardware/Compiler Trade-off Evaluation
- Problem Solving
- Team Collaboration
- Documentation













