Update architecture diagrams #1215

Merged
merged 1 commit into from
Jun 17, 2022
116 changes: 68 additions & 48 deletions README.md
@@ -17,74 +17,94 @@ Want to contribute? See [Getting Started](https://mc-stan.org/stanc3/stanc/getti
for setup instructions and some useful commands.

## High-level concepts, invariants, and 30,000-ft view
Stanc3 has 3 main src packages: `frontend`, `middle`, and `stan_math_backend`.
Stanc3 has 4 main src packages: `frontend`, `middle`, `analysis_and_optimization` and `stan_math_backend`.

![top-level stanc3 architecture](docs/img/architecture.png)
```mermaid
flowchart
Stanc --> Frontend & Analysis & Backend <-.-> Middle
```

The Middle contains the MIR and currently any types or functions used by the two ends.
The goal is to keep as many details as possible about how the core C++ implementation realizes Stan inside the Stan Math backend library.
The Middle library contains the MIR and currently any types or functions used by the two ends.
The entrypoint for the compiler is in `src/stanc/stanc.ml`, which sequences the various components together.

### Distinct stanc Phases

The phases of stanc are summarized in the following information flowchart and list.
<!---
digraph G {
rankdir=TB;
ranksep=.25;
bgcolor=white;
size=5;
node [shape="box"];

origin[style=invis];
stanc[label="stanc/stanc.ml"];
lexer[label="frontend/lexer.mll"];
parser[label="frontend/parser.mly"];
type[label="frontend/Typechecker.ml"];
lower[label="frontend/Ast_to_Mir.ml"];
transform[label="*_backend/Transform_Mir.ml"];
optimize[label="analysis_and_optimization/Optimize.ml"];
codegen[label="*_backend/*_code_gen.ml"];
output[shape="oval" label=".hpp file"]


origin -> stanc[label=" .stan file path"];
stanc -> lexer[label=" string"];
lexer -> parser[label=" tokens"];
parser -> type[label=" untyped AST"];
type -> lower[label=" typed AST"];
lower -> transform[label=" MIR"];
transform -> optimize[label=" transformed MIR"];
transform -> codegen[label=" "];
optimize -> codegen[headlabel="optimized MIR "];
codegen -> output[label=" C++ code"];

}
--->
![stanc3 information flow](docs/img/information-flow.png)
```mermaid
flowchart TB

subgraph frontend[Frontend]
direction TB
infile>Source file]
lexer(frontend/lexer.mll)
parser(frontend/parser.mly)
typecheck(frontend/Typechecker.ml)
lower(frontend/Ast_to_Mir.ml)

infile --> lexer -->|Tokens| parser
parser -->|Untyped AST| typecheck -->|Typed AST| lower
end


subgraph middle[Middle Representation]
data{{MIR Data Structures}}
end

subgraph analysis[Static Analysis and Optimization]
optimize(analysis_and_optimization/Optimize.ml)
end

subgraph backend[Backend]
codegen(*_backend/*_code_gen.ml)
transform(*_backend/Transform_Mir.ml)

transform -.->|MIR with backend specific code| optimize
transform --> codegen
optimize -->|Optimized MIR| codegen
end

outfile>Output File, e.g. a .hpp]

middle --- analysis
frontend ==> middle =====> backend ==> outfile


click lexer "https://github.com/stan-dev/stanc3/blob/master/src/frontend/lexer.mll"
click parser "https://github.com/stan-dev/stanc3/blob/master/src/frontend/parser.mly"
click typecheck "https://github.com/stan-dev/stanc3/blob/master/src/frontend/Typechecker.ml"
click lower "https://github.com/stan-dev/stanc3/blob/master/src/frontend/Ast_to_Mir.ml"
click optimize "https://github.com/stan-dev/stanc3/blob/master/src/analysis_and_optimization/Optimize.ml"
click data "https://github.com/stan-dev/stanc3/tree/master/src/middle"
click codegen "https://github.com/stan-dev/stanc3/blob/master/src/stan_math_backend/Stan_math_code_gen.ml"
click transform "https://github.com/stan-dev/stanc3/blob/master/src/stan_math_backend/Transform_Mir.ml"
```

1. [Lex](src/frontend/lexer.mll) the Stan language into tokens.
1. [Parse](src/frontend/parser.mly) the Stan language into an AST that represents the syntax quite closely and aids in the development of pretty-printers and linters. Run `stanc --debug-ast` to print it out.
1. Typecheck & add type information [Typechecker.ml](src/frontend/Typechecker.ml). `stanc --debug-decorated-ast`
1. [Lower](src/frontend/Ast_to_Mir.ml) into [Middle Intermediate Representation](src/middle/Mir.ml) (AST -> MIR) `stanc --debug-mir` (or `--debug-mir-pretty`)
1. [Lower](src/frontend/Ast_to_Mir.ml) into [Middle Intermediate Representation](src/middle/Program.ml) (AST -> MIR) `stanc --debug-mir` (or `--debug-mir-pretty`)
1. Backend-specific MIR transform (MIR -> MIR) [Transform_Mir.ml](src/stan_math_backend/Transform_Mir.ml) `stanc --debug-transformed-mir`
1. Analyze & optimize (MIR -> MIR)
1. Backend MIR transform (MIR -> MIR) [Transform_Mir.ml](src/stan_math_backend/Transform_Mir.ml) `stanc --debug-transformed-mir`
1. Hand off to a backend to [emit C++](src/stan_math_backend/Stan_math_code_gen.ml) (or LLVM IR, or Tensorflow, or interpret it!).
1. Code generation (MIR -> [C++](src/stan_math_backend/Stan_math_code_gen.ml)) (or other outputs, like [Tensorflow](https://github.com/stan-dev/stan2tfp/)).
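
The phase ordering in the list above can be sketched as a chain of functions. This is a toy illustration only: every type and function name below (`mir`, `lex`, `to_mir`, `compile`, the `optimize_mir` flag, and so on) is hypothetical and stands in for the real modules. It does not reproduce the actual code in `src/stanc/stanc.ml`. It assumes, as the flowchart suggests, that optimization is an optional step between the backend transform and code generation.

```ocaml
(* Toy stand-ins for the real intermediate representations (all hypothetical). *)
type token = string
type untyped_ast = Untyped of token list
type typed_ast = Typed of token list

(* A fake "MIR" that also records which MIR -> MIR passes have run. *)
type mir = { stmts : string list; passes : string list }

(* Frontend: source text -> tokens -> untyped AST -> typed AST -> MIR. *)
let lex (src : string) : token list =
  String.split_on_char ' ' src |> List.filter (fun t -> t <> "")

let parse (toks : token list) : untyped_ast = Untyped toks
let typecheck (Untyped toks) : typed_ast = Typed toks
let lower (Typed toks) : mir = { stmts = toks; passes = [] }

(* Backend-specific transform and optimization are both MIR -> MIR. *)
let transform (m : mir) : mir = { m with passes = m.passes @ [ "transform" ] }
let optimize (m : mir) : mir = { m with passes = m.passes @ [ "optimize" ] }

(* Code generation: MIR -> output text (a real backend would emit C++). *)
let codegen (m : mir) : string =
  "// passes: " ^ String.concat ", " m.passes ^ "\n" ^ String.concat " " m.stmts

(* Run every phase up to, but not including, code generation. *)
let to_mir ?(optimize_mir = false) (src : string) : mir =
  let transformed = src |> lex |> parse |> typecheck |> lower |> transform in
  if optimize_mir then optimize transformed else transformed

(* The whole pipeline, from source text to generated output. *)
let compile ?optimize_mir (src : string) : string =
  codegen (to_mir ?optimize_mir src)
```

Because each phase is an ordinary function from one representation to the next, any intermediate value can be inspected on its own, which is the same idea behind the `--debug-*` flags listed above.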

### The two central data structures

1. `src/frontend/Ast.ml` defines the AST. The AST is intended to have a direct 1-1 mapping with the syntax, so there are things like parentheses being kept around.
The pretty-printer in the frontend uses the AST and attempts to keep user syntax the same while just adjusting whitespace.

The AST uses a particular functional programming trick to add metadata to the AST (and its other tree types), sometimes called [the "two-level types" pattern](http://lambda-the-ultimate.org/node/4170#comment-63836). Essentially, many of the tree variant types are parameterized by something that ends up being a placeholder not for just metadata but for the recursive type including metadata, sometimes called the fixed point. So instead of recursively referencing `expression` you would instead reference type parameter `'e`, which will later be filled in with something like `type expr_with_meta = metadata expression`.
The AST intends to keep very close to Stan-level semantics and syntax in every way.
2. `src/middle/Mir.ml` contains the MIR (Middle Intermediate Language - we're saving room at the bottom for later). `src/frontend/Ast_to_Mir.ml` performs the lowering and attempts to strip out as much Stan-specific semantics and syntax as possible, though this is still something of a work-in-progress.
The AST uses a particular functional programming trick to add metadata to the AST (and its other tree types), sometimes called [the "two-level types" pattern](http://lambda-the-ultimate.org/node/4170#comment-63836). Essentially, many of the tree variant types are parameterized by something that ends up being a placeholder not just for metadata but for the recursive type including metadata, sometimes called the fixed point. So instead of recursively referencing `expression`, you would reference a type parameter `'e`, which will later be filled in with something like `type expr_with_meta = metadata expression`.

The AST intends to keep very close to Stan-level semantics and syntax in every way.

2. `src/middle/Program.ml` contains the MIR (Middle Intermediate Representation - we're saving room at the bottom for later). `src/frontend/Ast_to_Mir.ml` performs the lowering and attempts to strip out as much Stan-specific semantics and syntax as possible, though this is still something of a work-in-progress.

The MIR uses the same two-level types pattern to add metadata, notably expression types and autodiff levels as well as locations on many things. The MIR is used as the output data type from the frontend and the input for dataflow analysis, optimization (which also outputs MIR), and code generation.
The MIR uses the same two-level types idea to add metadata, notably expression types and autodiff levels as well as locations on many things. The MIR is used as the output data type from the frontend and the input for dataflow analysis, optimization (which also outputs MIR), and code generation.
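
As a minimal sketch of the two-level types pattern described above: the definitions below are simplified and hypothetical (the constructors, the `metadata` record with a single `line` field, and the helper names are all invented for illustration) and are not the actual definitions in `src/frontend/Ast.ml` or `src/middle`. The point is only that the expression shape never refers to itself directly. It refers to `'e`, and a separate type ties the knot.

```ocaml
(* Level one: the shape of an expression, with children left abstract as 'e. *)
type 'e expression =
  | IntLit of int
  | Var of string
  | Add of 'e * 'e
  | Paren of 'e (* parentheses are kept, as in a syntax-faithful AST *)

(* Hypothetical metadata: here just a source line number. *)
type metadata = { line : int }

(* Level two: the fixed point. Every node pairs a shape with metadata,
   and the children of that shape are again annotated nodes. *)
type expr_with_meta = { expr : expr_with_meta expression; meta : metadata }

(* A second fixed point over the same shape that carries no metadata. *)
type bare = Bare of bare expression

(* Apply f to the immediate children only; no recursion happens here. *)
let map_expression f = function
  | IntLit n -> IntLit n
  | Var v -> Var v
  | Add (a, b) -> Add (f a, f b)
  | Paren e -> Paren (f e)

(* Drop the metadata from every node, reusing the generic child map. *)
let rec strip (e : expr_with_meta) : bare = Bare (map_expression strip e.expr)

(* Evaluate an annotated expression given an environment of variable values. *)
let rec eval env (e : expr_with_meta) : int =
  match e.expr with
  | IntLit n -> n
  | Var v -> List.assoc v env
  | Add (a, b) -> eval env a + eval env b
  | Paren inner -> eval env inner

(* Helper to build an annotated node on a given line. *)
let mk line expr = { expr; meta = { line } }

(* x + (2), with every node on line 1. *)
let example = mk 1 (Add (mk 1 (Var "x"), mk 1 (Paren (mk 1 (IntLit 2)))))
```

One benefit of this design is that the same `'e expression` shape can be reused with different annotations (for example, untyped versus typed metadata) without duplicating the variant definition, and generic helpers such as `map_expression` work for all of them.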

## Design goals
* **Multiple phases**, each with human-readable intermediate representations for easy debugging and optimization design.
* **Multiple phases** - each with human-readable intermediate representations for easy debugging and optimization design.
* **Optimizing** - takes advantage of information known at the Stan language level. Minimize what users must learn in order to write performant code.
* **Holistic-** bring as much of the code as possible into the MIR for whole-program optimization.
* **Research platform-** enable a new class of optimizations based on probability theory.
* **Holistic** - bring as much of the code as possible into the MIR for whole-program optimization.
* **Research platform** - enable a new class of optimizations based on probability theory.
* **Modular** - architect & build in a way that makes it easy to outsource things like symbolic differentiation to external libraries and to use parts of the compiler as the basis for other tools built around the Stan language.
* **Simplicity first -** When making a choice between correct simplicity and a perceived performance benefit, we want to make the choice for simplicity unless we can show significant (> 5%) benchmark improvements to compile times or run times. Premature optimization is the root of all evil.
* **Simplicity first** - When making a choice between correct simplicity and a perceived performance benefit, we want to make the choice for simplicity unless we can show significant (> 5%) benchmark improvements to compile times or run times. Premature optimization is the root of all evil.
Binary file modified docs/img/architecture.png
Binary file modified docs/img/information-flow.png