This project contains interpreters for the programming languages Loop, Goto & While that are based on Prof Dr. Heribert Vollmer's lecture notes for his introductory course in theoretical computer science. They are modeled to feel similar to common imperative programming languages but they are delibaretly greatly simplified. Furthermore, when possible, functions to transform source code from one language to another are provided, too.
I created a live web evaluation service where you can write some code and have it either evaluated or transformed. Additionally, to keep yourself entertained, you can find some exercises on that website, too.
Execute runtests.sh
to run the tests.
-
Install the Haskell platform. This will give you GHC and the cabal-install build tool.
-
Clone this project and build it with
cabal
:git clone git://github.com/eugenkiss/loopgotowhile.git cd loopgotowhile-site cabal install
Now you can use this project as a library in Haskell source files.
The purpose of these languages in Prof Dr. Heribert Vollmer's introductory course in theoretical computer science was to provide formal semantics for the specification of algorithms. Moreover, the languages were used to show the equivalence of turing complete languages. There was no executable implementation of these languages to my knowledge. Frankly, why should there have been? Their purpose didn't require that. However, I thought it would be an interesting exercise to create exectuable implementations for these languages. Even more so as I always wanted to create an executable implementation of a programming language and it was the perfect opportunity to get to know Haskell better.
Most of all, I wanted to see how elegantly the problem of creating parsers, evaluators and "transformators" for these languages could be expressed in Haskell. And I'm quote pleased with the result although I acknowledge that there is much to improve especially regarding the code duplication for transforming each language to its strict subset.
In reality there aren't three languages but six; each language is given in a strict and extended version. When being evaluated a language is first transformed into its strict subset - as it is easier to write an evaluator for the strict subset - and only then (indirectly) evaluated.
Only strict Goto and strict While are evaluated directly. Loop is evaluated by first transforming it to While and then evaluating that While program. Although it would suffice to only have one directly evaluated language this is not done due to performance reasons. It is much faster to directly evaluate Goto as opposed to evaluating Goto through While.
Shared functionality is outsourced to the "Shared" namespace in order to reduce code duplication and thus reuse more code.
I know that much can be improved. If you have suggestions then contact me, please.
Note: Some of the "to-strict-transformation"-tests don't use a hand-written strict program to test the transformations against but instead they test against a simplified extended program that is automatically transformed to a strict program. The reason for this decision is that a strict program would simply be too long and too convoluted to comprehend the applied transformations. Nonetheless, these tests are not useless as the preceding, more primitive tests "ensure" that the transformations of the simplified extended version to a strict version will be correct.
-
Remove code duplication especially regarding the code duplication for transforming each language to its strict subset. Also, try to reduce code duplication for While & Goto evaluation.
-
Some parts are quite ugly (see TODO's in the source code). Beautify!
-
Customization of Error Messages in Parsec leaves much to be desired.
-
Improve output of pretty printers (e.g. dismiss superfluous parentheses).
-
Unify naming convetions (e.g. if qualifed import AS than GotoAS not Goto)
-
Use more type system constraints
- e.g.
GotoStat = Label GotoStat | ...
. This definition e.g. allowsLabel (Label ...)
, which is actually not allowed in the Goto definition. If I could enforce the type to only allow nesting of 1 Level, then the compiler would have saved me really early from a bug that I accidentelly discovered really late... - The same is true for e.g.
GotoStat = Seq [GotoStat] | ...
. It shouldn't be possible to defineSeq (Seq ...)
as that is the same asSeq ...
and some functions rely on the latter "normalized" form. It is error prone to depend on not forgetting to normalize the data. Much better would be an early compiler error message.
- e.g.
-
Command line tools for each language
- Options
- -p, --prettyprint
- -s, --strict
- -t [loop,goto,while], --transform [loop,goto,while] (list only those that are possible)
- -o, --output (default: stdout)
- Extract common functionality to Shared
- Create executables in cabal
- Options
-
Optimizations
- Create benchmark suite
- Maybe improve efficiency of arithmetic functions (e.g. is / reduction to strict code efficient or is it possible to make the algorithm faster)
- Maybe write evaluator for extended While/Goto so that operations like x0 := x0 + x1 are one instruction and not a loop.
- Maybe JIT?
- Maybe use hashtable implementation of Map in order to speed up lookup of variables?
- Other optimizations? Loop invariants (analysis)?
-
Function Definitions
Test duplication/redundancy could be greatly reduced if at first code duplication is reduced. This way, tests could be reused on the one hand and on the other hand a lot of "to-strict-transformation" tests for While and Goto could be removed since it would suffice to test Loop's "to-strict-transformations" thouroughly if all languages shared (almost) the same code.
You can only implement functions whose codomain are the natural numbers and
whose domain are the natural numbers to the power of n, where n is the
number of arguments. Also, every variable is by default initialized to 0
and
arguments, if any, are stored in x1,...,xn
. That means, providing the
argument list 4,87,3
is the same as prepending x1 := 4; x2 := 87, x3 := 3
to the top of your program. Also, note that each language is case sensitive!
An exemplary (extended and pointless) Loop program looks as follows:
x4 := abc + y * 4;
IF hg != 7 || hg != 7 THEN
LOOP x4 - (x4 + 1) DO
x0 := 1
END
ELSE
x0 := counter
END
The syntactical components are
- Variables:
x0
,x1
,x2
,x3
,x4
,... - Constants:
0
,1
,2
,3
,4
,... - Operators:
+
and-
- Separators:
;
and:=
- Keywords:
LOOP
,DO
andEND
Let xi
and xj
be variables and let c
be a constant. Then
xi := xj + c
and
xi := xj - c
are Loop programs.
Let P1
and P2
be Loop programs. Then
P1; P2
is a Loop Program.
Let P
be a Loop program and let xi
be a variable. Then
LOOP xi DO P END
is a Loop Program.
Let P
be a Loop Program. P
computes a function f: ℕ^k → ℕ like so: At the
beginning of the computation the arguments n1,...,nk ∈ ℕ are to be found in
the variables x1,...,xk. All other variables have the starting value 0. P
is executed as follows:
- By executing the program
xi := xj + c
xi
is assigned the value ofxj + c
. - By executing the program
xi := xj - c
xi
is assigned the value ofxj - c
if the value is non-negative. Otherwisexi
is assigned 0. - By executing the program
P1; P2
at firstP1
is executed and after thatP2
. - The execution of the program
LOOP xi DO P' END
happens as follows: The ProgramP'
is executed as often as the value ofxi
has been at the Beginning. That means, assignments toxi
insideP'
do not influence the number of repetitions.
The result of P
's execution is the value of x0
or put in another way
f(n1,...,nk) = Value of x0 after execution.
A function f: ℕ^k → ℕ is called Loop-computable if there exists a Loop program that computes f as described above.
Variables can be named arbitrarily. The only restrictions are that they begin
with a letter and are not a keyword, e.g. counter4
is a valid variable
identifier. Apart from that the following Loop programs are all valid and have
the respective intuitive semantics:
xi := xj
(stands forxi := xj + 0
)xi := c
(stands forxi := xj + c
wherexj
is an unused variable)xi := aexp
whereaexp
is an arbitrary arithmetic expression consisting of variables, constants, (optional) parantheses and operators (+
,-
,*
,/
,^
and%
) like for exampley * (hg + 8 / x8) - 2 % 4
.LOOP aexp DO P END
whereaexp
is an arbitrary arithmetic expression.IF xi = 0 THEN P END
IF !(xi < xj && !(xk != 3)) THEN P1 ELSE P2 END
IF bexp THEN P END
wherebexp
is an arbitrary "boolean" expression consisting of variables, constants, (optional) parantheses, relational operators (<
,<=
,>
,>=
,=
and!=
) and boolean operators (&&
,||
and!
).IF bexp THEN P1 ELSE P2 END
Also, you are allowed to insert comments in your source code. The syntax is
similar to Java's comment syntax, i.e. //
introduces a comment
whose scope ends at the next line and /* */
can be used for
multiline comments.
An exemplary (extended and pointless) Goto program looks as follows:
M1: x4 := abc + y * 4;
Cool: IF hg != 7 || hg != 7 THEN GOTO M1 ELSE HALT END;
GOTO Cool;
M42: HALT
A Goto program is a succession
M1: A1;
M2: A2;
⋮
Mk: Ak
where Mi
is a so called label and Ai
is an instruction. Possible
instructions are:
- Value assignment:
xi := xj ± c
(xi
,xj
are variables,c
is a constant) - Jump:
GOTO Mi
- Conditional Jump:
IF xi = c THEN GOTO Mj
- Stop instruction:
HALT
The last instruction is either a stop instruction or a jump.
The execution of a Goto program starts with the first instruction. The execution of instructions of each type is as follows:
xi := xj ± c
: The value ofxi
becomesxj ± c
and the next instruction is executed.GOTO Mi
: Proceed with the instruction with labelMi
.IF xi = c THEN GOTO Mj
: If the value ofxi
is equal toc
proceed with the instruction with labelMj
. Otherwise the next instruction is executed.HALT
: Stop the execution of the program.
A jump to a label that is not existent in the program is not defined.
Let P
be a Goto Program. P
computes a function f: ℕ^k → ℕ like so: At the
beginning of the computation the arguments n1,...,nk ∈ ℕ are to be found in
the variables x1,...,xk. All other variables have the starting value 0.
f(n1,...,nk) is the value of x0
after execution if P
terminates.
Otherwise f(n1,...,nk) is undefined.
A function f: ℕ^k → ℕ is called Goto-computable if there exists a Goto program that computes f as described above.
The program doesn't have to end with a jump or a stop instruction. Labels can
be named arbitrarily or can be omitted altogether. Note, tough, that labels
must be unique. IF
statements must be completed with the lexeme END
because
they can contain several statements. Furthermore, a HALT
statement may appear
in the body of an IF
.
Apart from that, all extensions from the Loop language - except for the LOOP
construct which is not present in the extended Goto language - apply to the
extended Goto language.
The While language is an extension of the Loop language. An exemplary (extended and pointless) While program looks as follows:
x4 := abc + y * 4;
IF hg != 7 || hg != 7 THEN
WHILE !(x4 = x4 + 1) DO
x0 := 1
END
ELSE
x0 := counter
END
Apart from the LOOP
construct which is not part of the While language the
syntax is the same as that of Loop. Additionally, a new keyword (WHILE
) with
the accompanying syntactical construct, namely the WHILE
loop, is introduced.
Let P
be a While program and let xi
be a variable. Then
WHILE xi != 0 DO P END
is a While program.
The execution of WHILE xi != 0 DO P END
happens so, that the program P
is
executed as long as the value of xi
is not equal to 0.
Let P
be a While Program. P
computes a function f: ℕ^k → ℕ like so: At
the beginning of the computation the arguments n1,...,nk ∈ ℕ are to be found
in the variables x1,...,xk. All other variables have the starting value 0.
f(n1,...,nk) is the value of x0
after execution if P
terminates.
Otherwise f(n1,...,nk) is undefined.
A function f: ℕ^k → ℕ is called While-computable if there exists a While program that computes f as described above.
Apart from the LOOP
related extensions - since the While language has no
LOOP
construct - the Loop extensions are all valid While extensions.
Additionally, the head of a WHILE
loop can have an arbitrary boolean
expression, e.g. WHILE xyz != 9 || u = 8 DO P END
is a valid (extended) While
program.