Commit ef93e56

Merge pull request #17 from mrLSD/feat/register-number
Feat: extend SemanticStack with registers
2 parents 5ad44c2 + c662618 commit ef93e56

10 files changed: +429 -122 lines changed

Cargo.toml

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 [package]
 name = "semantic-analyzer"
-version = "0.2.6"
+version = "0.3.0"
 authors = ["Evgeny Ukhanov <[email protected]>"]
 description = "Semantic analyzer library for compilers written in Rust for semantic analysis of programming languages AST"
 keywords = ["compiler", "semantic-analisis", "semantic-alalyzer", "compiler-design", "semantic"]

README.md

Lines changed: 24 additions & 16 deletions
@@ -11,56 +11,56 @@
 Semantic analyzer is an open source semantic analyzer for programming languages
 that makes it easy to build your own efficient compilers.

-## What is the library for and what tasks does it solve
+## 🌀 What is the library for and what tasks does it solve

 Creating a compilers for a programming language is process that involves several key
 stages. Most commonly it is:

-- **Lexical Analysis (Lexer)**: This stage involves breaking down the input stream
+▶️ **Lexical Analysis (Lexer)**: This stage involves breaking down the input stream
 of characters into a series of tokens. Tokens are the atomic elements of the programming language, such as identifiers, keywords, operators, etc.

-- **Syntax Analysis (Parsing)**: At this stage, the tokens obtained in the previous
+▶️ **Syntax Analysis (Parsing)**: At this stage, the tokens obtained in the previous
 stage are grouped according to the grammar rules of the programming language. The result
 of this process is an **Abstract Syntax Tree (AST)**, which represents a hierarchical structure of the code.

-- **Semantic Analysis**: This stage involves checking the semantic correctness of the code. This can include
+**Semantic Analysis**: This stage involves checking the semantic correctness of the code. This can include
 type checking, scope verification of variables, etc.

-- **Intermediate Code Optimization**: At this stage, the compiler tries to improve the intermediate representation of the code to make it more efficient.
+▶️ **Intermediate Code Optimization**: At this stage, the compiler tries to improve the intermediate representation of the code to make it more efficient.
 This can include dead code elimination, expression simplification, etc.

-- **Code Generation**: This is the final stage where the compiler transforms the optimized intermediate representation (IR) into
+▶️ **Code Generation**: This is the final stage where the compiler transforms the optimized intermediate representation (IR) into
 machine code specific to the target architecture.

 This library represent **Semantic Analysis** stage.

-### Features
+### 🌻 Features

-- **Name Binding and Scope Checking**: The analyzer verifies that all variables, constants, functions are declared before they're used,
+**Name Binding and Scope Checking**: The analyzer verifies that all variables, constants, functions are declared before they're used,
 and that they're used within their scope. It also checks for name collisions, where variables, constants, functions, types in the same scope have the same name.

-- **Checking Function Calls**: The analyzer verifies that functions are called with the number of parameters and that the type of
+**Checking Function Calls**: The analyzer verifies that functions are called with the number of parameters and that the type of
 arguments matches the type expected by the function.

-- **Scope Rules**: Checks that variables, functions, constants, types are used within their scope, and available in the visibility scope.
+**Scope Rules**: Checks that variables, functions, constants, types are used within their scope, and available in the visibility scope.

-- **Type Checking**: The analyzer checks that operations are performed on compatible types for expressions, functions, constant, bindings.
+**Type Checking**: The analyzer checks that operations are performed on compatible types for expressions, functions, constant, bindings.
 For operations in expressions. It is the process of verifying that the types of expressions are consistent with their usage in the context.

-- **Flow Control Checking**: The analyzer checks that the control flow statements (if-else, loop, return, break, continue) are used correctly.
+**Flow Control Checking**: The analyzer checks that the control flow statements (if-else, loop, return, break, continue) are used correctly.
 Supported condition expressions and condition expression correctness check.

-- **Building the Symbol Table**: For analyzing used the symbol table as data structure used by the semantic analyzer to keep track of
+**Building the Symbol Table**: For analyzing used the symbol table as data structure used by the semantic analyzer to keep track of
 symbols (variables, functions, constants) in the source code. Each entry in the symbol table contains the symbol's name, type, and scope related for block state, and other relevant information.

-### Semantic State Tree
+### 🌳 Semantic State Tree

 The result of executing and passing stages of the semantic analyzer is: **Semantic State Tree**.

 This can be used for Intermediate Code Generation, for further passes
 semantic tree optimizations, linting, backend codegen (like LLVM) to target machine.

-#### Structure of Semantic State Tree
+#### 🌲 Structure of Semantic State Tree

 - **blocks state** and related block state child branches. It's a basic
 entity for scopes: variables, blocks (function, if, loop).
@@ -87,7 +87,7 @@ However, parent elements cannot access child elements, which effectively limits

 All of that source data, that can be used for Intermediate Representation for next optimizations and compilers codegen.

-### Subset of programming languages
+### 🧺 Subset of programming languages

 The input parameter for the analyzer is a predefined
 AST (abstract syntax tree). As a library for building AST and the only dependency
@@ -104,4 +104,12 @@ analysis and source code parsing, it is recommended to use: [nom is a parser com

 AST displays the **Turing complete** programming language and contains all the necessary elements for this.

+## 🛋️ Examples
+
+- 🔎 There is the example implementation separate project [💾 Toy Codegen](https://github.com/mrLSD/toy-codegen).
+The project uses the `SemanticStack` results and converts them into **Code Generation** logic. Which clearly shows the
+possibilities of using the results of the `semantic-analyzer-rs` `SemanticStackContext` results. LLVM is used as a
+backend, [inkwell](https://github.com/TheDan64/inkwell) as a library for LLVM codegen, and compiled into an executable
+program. The source of data is the AST structure itself.
+
 ## MIT [LICENSE](LICENSE)

src/semantic.rs

Lines changed: 72 additions & 33 deletions
@@ -533,18 +533,26 @@ impl State {
             params.push(expr_result);
         }

+        // Result of function call is stored to register
+        body_state.borrow_mut().inc_register();
+        let last_register_number = body_state.borrow().last_register_number;
         // Store always result to register even for void result
-        body_state.borrow_mut().context.call(func_data, params);
+        body_state
+            .borrow_mut()
+            .context
+            .call(func_data, params, last_register_number);
         Some(fn_type)
     }

     /// # condition-expression
-    /// Analyse condition operations.
+    /// Analyse condition operations.
+    /// ## Return
+    /// Return result register of `condition-expression` calculation.
     pub fn condition_expression(
         &mut self,
         data: &ast::ExpressionLogicCondition<'_>,
         function_body_state: &Rc<RefCell<BlockState>>,
-    ) {
+    ) -> u64 {
         // Analyse left expression of left condition
         let left_expr = &data.left.left;
         let left_res = self.expression(left_expr, function_body_state);
@@ -553,12 +561,15 @@ impl State {
         let right_expr = &data.left.right;
         let right_res = self.expression(right_expr, function_body_state);

-        let (Some(left_res), Some(right_res)) = (left_res, right_res) else {
-            return;
+        // If some of the `left` or `right` expression is empty just return with error in the state
+        let (Some(left_res), Some(right_res)) = (left_res.clone(), right_res.clone()) else {
+            self.add_error(error::StateErrorResult::new(
+                error::StateErrorKind::ConditionIsEmpty,
+                format!("left={left_res:?}, right={right_res:?}"),
+                data.left.left.location(),
+            ));
+            return function_body_state.borrow().last_register_number;
         };
-        // Unwrap result only after analysing
-        // let left_res = left_res?;
-        // let right_res = right_res?;

         // Currently strict type comparison
         if left_res.expr_type != right_res.expr_type {
@@ -567,7 +578,7 @@ impl State {
                 left_res.expr_type.to_string(),
                 data.left.left.location(),
             ));
-            return;
+            return function_body_state.borrow().last_register_number;
         }
         match left_res.expr_type {
             Type::Primitive(_) => (),
@@ -577,29 +588,46 @@ impl State {
                     left_res.expr_type.to_string(),
                     data.left.left.location(),
                 ));
-                return;
+                return function_body_state.borrow().last_register_number;
             }
         }

+        // Increment register
+        function_body_state.borrow_mut().inc_register();
+
+        let register_number = function_body_state.borrow_mut().last_register_number;
         // Codegen for left condition and set result to register
         function_body_state
             .borrow_mut()
             .context
-            .condition_expression(left_res, right_res, data.left.condition.clone().into());
+            .condition_expression(
+                left_res,
+                right_res,
+                data.left.condition.clone().into(),
+                register_number,
+            );

         // Analyze right condition
         if let Some(right) = &data.right {
+            let left_register_result = function_body_state.borrow_mut().last_register_number;
             // Analyse recursively right part of condition
-            self.condition_expression(&right.1, function_body_state);
+            let right_register_result = self.condition_expression(&right.1, function_body_state);

+            // Increment register
+            function_body_state.borrow_mut().inc_register();
+
+            let register_number = function_body_state.borrow_mut().last_register_number;
             // Stategen for logical condition for: left [LOGIC-OP] right
             // The result generated from registers, and stored to
             // new register
-            function_body_state
-                .borrow_mut()
-                .context
-                .logic_condition(right.0.clone().into());
+            function_body_state.borrow_mut().context.logic_condition(
+                right.0.clone().into(),
+                left_register_result,
+                right_register_result,
+                register_number,
+            );
         }
+        function_body_state.borrow_mut().last_register_number
     }

     /// # If-condition body
@@ -793,18 +821,20 @@ impl State {
             // If condition contains logic condition expression
             ast::IfCondition::Logic(expr_logic) => {
                 // Analyse if-condition logic
-                self.condition_expression(expr_logic, if_body_state);
+                let result_register = self.condition_expression(expr_logic, if_body_state);
                 // State for if-condition-logic with if-body start
                 if is_else {
-                    if_body_state
-                        .borrow_mut()
-                        .context
-                        .if_condition_logic(label_if_begin.clone(), label_if_else.clone());
+                    if_body_state.borrow_mut().context.if_condition_logic(
+                        label_if_begin.clone(),
+                        label_if_else.clone(),
+                        result_register,
+                    );
                 } else {
-                    if_body_state
-                        .borrow_mut()
-                        .context
-                        .if_condition_logic(label_if_begin.clone(), label_if_end.clone());
+                    if_body_state.borrow_mut().context.if_condition_logic(
+                        label_if_begin.clone(),
+                        label_if_end.clone(),
+                        result_register,
+                    );
                 }
             }
         }
@@ -1169,18 +1199,21 @@ impl State {
             ast::ExpressionValue::ValueName(value) => {
                 // Get value from block state
                 let value_from_state = body_state.borrow_mut().get_value_name(&value.name().into());
+                // Register contains result
+                body_state.borrow_mut().inc_register();
+                let last_register_number = body_state.borrow().last_register_number;
                 // First check value in body state
                 let ty = if let Some(val) = value_from_state {
                     body_state
                         .borrow_mut()
                         .context
-                        .expression_value(val.clone());
+                        .expression_value(val.clone(), last_register_number);
                     val.inner_type
                 } else if let Some(const_val) = self.global.constants.get(&value.name().into()) {
                     body_state
                         .borrow_mut()
                         .context
-                        .expression_const(const_val.clone());
+                        .expression_const(const_val.clone(), last_register_number);
                     const_val.constant_type.clone()
                 } else {
                     // If value doesn't exist in State or as Constant
@@ -1192,7 +1225,6 @@ impl State {
                     return None;
                 };
                 // Return result as register
-                body_state.borrow_mut().inc_register();
                 ExpressionResult {
                     expr_type: ty,
                     expr_value: ExpressionResultValue::Register(
@@ -1274,10 +1306,14 @@ impl State {
                     })?
                     .clone();

-                body_state
-                    .borrow_mut()
-                    .context
-                    .expression_struct_value(val.clone(), attributes.clone().attr_index);
+                // Register contains result
+                body_state.borrow_mut().inc_register();
+                let last_register_number = body_state.borrow().last_register_number;
+                body_state.borrow_mut().context.expression_struct_value(
+                    val.clone(),
+                    attributes.clone().attr_index,
+                    last_register_number,
+                );

                 body_state.borrow_mut().inc_register();
                 ExpressionResult {
@@ -1303,14 +1339,17 @@ impl State {
             // Do not fetch other expression flow if type is wrong
             return None;
         }
+        // Expression operation is set to register
+        body_state.borrow_mut().inc_register();
+        let last_register_number = body_state.borrow().last_register_number;
         // Call expression operation for: OP(left_value, right_value)
         body_state.borrow_mut().context.expression_operation(
             op.clone().into(),
             left_value.clone(),
             right_value.clone(),
+            last_register_number,
         );
         // Expression result value for Operations is always should be "register"
-        body_state.borrow_mut().inc_register();
         ExpressionResult {
             expr_type: right_value.expr_type,
             expr_value: ExpressionResultValue::Register(
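
The pattern repeated throughout this file's changes is: increment the block's register counter, read the new number into a local, then pass it to the context call so that each recorded instruction names its own result register. Below is a minimal sketch of that pattern. The `BlockState`, `Context`, and `Instr` types here are hypothetical, heavily simplified stand-ins, not the crate's real definitions; only the names `inc_register` and `last_register_number` come from the diff.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical recorded instruction; each one names its result register.
#[derive(Debug, PartialEq)]
pub enum Instr {
    Condition { register: u64 },
    Logic { left: u64, right: u64, register: u64 },
}

/// Hypothetical stand-in for the semantic stack context.
#[derive(Default)]
pub struct Context {
    pub instrs: Vec<Instr>,
}

/// Hypothetical stand-in for the block state.
#[derive(Default)]
pub struct BlockState {
    pub last_register_number: u64,
    pub context: Context,
}

impl BlockState {
    pub fn inc_register(&mut self) {
        self.last_register_number += 1;
    }
}

/// Allocate a register first, then record the instruction that writes to it.
pub fn condition(state: &Rc<RefCell<BlockState>>) -> u64 {
    state.borrow_mut().inc_register();
    // Read into a local so this shared borrow ends before the next mutable one.
    let register = state.borrow().last_register_number;
    state
        .borrow_mut()
        .context
        .instrs
        .push(Instr::Condition { register });
    register
}

/// Combine two earlier result registers into a freshly allocated one.
pub fn logic(state: &Rc<RefCell<BlockState>>, left: u64, right: u64) -> u64 {
    state.borrow_mut().inc_register();
    let register = state.borrow().last_register_number;
    state
        .borrow_mut()
        .context
        .instrs
        .push(Instr::Logic { left, right, register });
    register
}

/// Analyse `left [LOGIC-OP] right` and return the final register plus the recorded instructions.
pub fn demo() -> (u64, Vec<Instr>) {
    let state = Rc::new(RefCell::new(BlockState::default()));
    let left = condition(&state);
    let right = condition(&state);
    let result = logic(&state, left, right);
    let instrs = std::mem::take(&mut state.borrow_mut().context.instrs);
    (result, instrs)
}
```

Returning the register number from each step, as `condition_expression` now does with `-> u64`, lets the caller refer to a specific earlier result instead of relying on whatever the counter happens to hold at that point.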

src/types/error.rs

Lines changed: 1 addition & 0 deletions
@@ -26,6 +26,7 @@ pub enum StateErrorKind {
     TypeNotFound,
     WrongReturnType,
     ConditionExpressionWrongType,
+    ConditionIsEmpty,
     ConditionExpressionNotSupported,
     ForbiddenCodeAfterReturnDeprecated,
     ForbiddenCodeAfterContinueDeprecated,
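
The new `ConditionIsEmpty` kind is reported from a `let ... else` in `condition_expression`, where both operands are cloned into the pattern. The clones are presumably there so the original `Option`s are still available for the error message in the `else` branch; without them the values would already have been moved into the tuple. A minimal sketch of that shape follows. `ExprResult`, `ErrorKind`, `StateError`, and `check_condition` are hypothetical simplified names, not the crate's API.

```rust
/// Hypothetical simplified expression result.
#[derive(Debug, Clone, PartialEq)]
pub struct ExprResult {
    pub register: u64,
}

/// Hypothetical error kind mirroring the variant added in this commit.
#[derive(Debug, PartialEq)]
pub enum ErrorKind {
    ConditionIsEmpty,
}

#[derive(Debug, PartialEq)]
pub struct StateError {
    pub kind: ErrorKind,
    pub message: String,
}

/// Return a new result register on success, or record an error and
/// fall back to the last known register when either side is missing.
pub fn check_condition(
    left: Option<ExprResult>,
    right: Option<ExprResult>,
    errors: &mut Vec<StateError>,
    last_register: u64,
) -> u64 {
    // Clone into the pattern so `left` and `right` stay usable in the `else` branch.
    let (Some(_left), Some(_right)) = (left.clone(), right.clone()) else {
        errors.push(StateError {
            kind: ErrorKind::ConditionIsEmpty,
            message: format!("left={left:?}, right={right:?}"),
        });
        return last_register;
    };
    // Both sides are present: pretend to allocate the next register.
    last_register + 1
}
```

Because the function now returns a register number, the error path still has to return something; the diff uses the block's current `last_register_number` for that, and the sketch mirrors it with `last_register`.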
