Skip to content

mobarski/vimes2

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Feb 25, 2024
d87fd25 · Feb 25, 2024
Feb 25, 2024
Feb 12, 2024
Feb 18, 2024
Feb 25, 2024
Feb 25, 2024
Feb 16, 2024
Feb 19, 2024
Feb 12, 2024
Jan 19, 2024
Feb 25, 2024
Feb 25, 2024

Repository files navigation

Vimes2 - Virtual Machines Experimentation Sandbox 2

Another take on my Vimes project.

Vimes is a collection of virtual machines (VMs) and related resources for studying their performance, ease of implementation, and ease of use. This sandbox includes a variety of VMs with different architectures, dispatch techniques and implementations as well as benchmarks and utilities to help measure and compare their characteristics.

Warning: this is experimental / pre-alpha code.

Key Takeaways

Development:

  • VM instruction sets are easy to implement, as each instruction is relatively simple.

  • The biggest investments (so far) when creating a new VM are:

    • Transforming assembly into machine code.
    • Loading machine code into the VM.
    • Creating a buffered reader and an int parser for stdin.
    • Creating CLI.
    • Creating tests and benchmarks.
  • The ability to define aliases for values (like n=1) and mnemonics (like peek=lpa) is a huge step forward in assembly UX with very little changes in the assembler (1 token is still 1 token).

  • The ability to define multi-token aliases facilitates keeping the instruction set orthogonal (ie you can quickly add stacks with peek, poke, inc, dec and they can either grow up or down).

  • Having all 3 conditional jumps related to comparison (<0, >0, ==0) makes the code much easier to write and read.

  • Having array like access (ptr+offset) vs just pointers is a huge step forward in the assembly UX.

Performance:

  • 🔥 benchmarking results 🔥

  • Wirth's machine is 2-3 times slower than register machines without stack frames.

  • C enables more performant dispatch techniques, such as indirect and direct threading.

  • Indirect and direct threading are twice as fast as switch-based dispatch.

  • Indirect threading seems to be the best approach when writing a VM in C (10% slower but no code remapping).

  • The performance of C programs compiled with Zig is similar to that of programs compiled with gcc (±20%).

  • Nim's {.computedGoto.} pragma resulted in 10% slower code.

  • Translation of machine code into C results in extremely fast execution but the return from the procedure call is a bit tricky.

  • Zig acting as a C compiler outshines gcc in optimization of the translated machine code.

VM versions

  • mk1 - machine from Wirth's 1976 book Algorithms + Data Structures = Programs

    • mk2 - mk1 with variable number of arguments, swapped a and l
      • mk3 - internal bytecode version of mk2

        • abandoned as bytecode requires more work than fixed-width words - instructions variants and assembler changes (🛑)
      • mk4 - switch call threading version of mk2

      • mk5 - indirect call threading version of mk2

  • mk6 - register based vm inspired by smol

    • mk11 - similar to mk6 but conditional jumps are based on acc register
      • mk13 - similar to mk11 but closer to the smol language
    • mk12 - one operand version ok mk6
    • mkXX- mk6 with stack frame similar to mk1 (🌱)
  • mk7 - register based vm inspired by Human Resource Machine

    • mk7c - mk7 implemented in C
    • mk7ci - mk7 implementation in C, indirect threading
    • mk7cd - mk7 implementation in C, direct threading
    • mk7cc - mk7 asm compiled to C code (🚧)
    • mk8 - mk7 extended with pointer operations, subroutine call/return and one more conditional jump (🏆)
      • mk8c - mk8 implemented in C
      • mk8ci - mk8 implemented in C, indirect threading
      • mk8cd - mk8 implemented in C, indirect threading
      • mk8cc - mk8 asm compiled to C code (🚧)
    • mk9 - two operands version of mk7
      • mk10 - mk9 extended with pointer operations and subroutine call/return
        • mkXX - mk10 with better UX (🌱)
    • mkXX - mk7 extended with cooperative multitasking instructions (🌱)

VM registers / variable names

mk1 - mk5

These machines use the same variable names as Wirth's example p-code machine from 1976 (available here).

  • p - program register

  • b - base register

  • t - topstack register

  • s - stack

  • i - instruction register

  • a - argument register

  • l - level register

  • code - program memory

mk6+

  • pc - program counter
  • sp - stack pointer
  • acc - accumulator / main register of the VM
  • code - program storage memory
  • mem - main memory
  • stack - return stack

VM instructions

mk1 - mk5

instructions

- LIT a   ; load constant (a)
- INT a   ; increment t-register by (a)
- LOD a b ; load variable (a) from level (b)
- STO a b ; store variable (a) at level (b)
- CAL a b ; call procedute (a) at level (b)
- JMP a   ; jump to (a)
- JPC a   ; jump conditional to (a)
- OPR a   ; execute operation (a) ie OPR ADD
- EX1 a   ; execute operation (a) from VM extension 1
- EX2 a   ; execute operation (a) from VM extension 2
- EX3 a   ; execute operation (a) from VM extension 3
  ...
- HLT     ; halt the program

operations

- ADD ; (ab--c)  c = a + b
- SUB ; (ab--c)  c = a - b
- MUL ; (ab--c)  c = a * b
- DIV ; (ab--c)  c = 
- RET ; (ab--c)  c = 
- NEG ; (a--b)   b = -a
- ODD ; (a--b)   b = a % 2
- MOD ; (ab--c)  c = a % b
- EQ  ; (ab--c)  c = 1 if a==b else 0
- NE  ; (ab--c)  c = 1 if a!=b else 0
- LT  ; (ab--c)  c = 1 if a<b  else 0
- LE  ; (ab--c)  c = 1 if a<=b else 0
- GT  ; (ab--c)  c = 1 if a>b  else 0
- GE  ; (ab--c)  c = 1 if a>=b else 0

The notation (ab--c) describes the stack effect of an operation. It indicates that the operation expects two items to be on the stack before execution, referred to as a and b, and after the operation is executed, these items are replaced by a single item c on the stack. The items before -- are consumed (popped) from the stack, and the items after -- are produced (pushed) onto the stack.

extensions

Extension 1 - stdtio:
- PUTI ; (a--) 
- GETI ; (--a) 
- EOF  ; (--a) 
- PUTC ; (a--) 
- GETC ; (--a) 

Extension 2 - ALU - bit ops
- AND ; (ab--c) 
- OR  ; (ab--c) 
- XOR ; (ab--c) 
- NOT ; (a--b)  
- SHL ; (ab--c) 
- SHR ; (ab--c) 
- SAR ; (ab--c) 

Extension 3 - ALU - common ops
- INC ; (a--)   
- DEC ; (a--)   
- EQZ ; (ab--c) 
- NEZ ; (ab--c) 
- LTZ ; (ab--c) 
- LEZ ; (ab--c) 
- GTZ ; (ab--c) 
- GEZ ; (ab--c) 

mk6 instruction set

- LIT  a b  ; set memory location (a) to literal value (b)
- MOV  a b  ; set memory location (a) to value from memory location (b)
- PEEK a b  ; set memory location (a) with value from memory location indicated by (b)
- POKE a b  ; set memory location indicated by (a) to value from memory location (b)

- JMP  a 0  ; jump to program location (a)
- JZ   a b  ; if memory location (a) is zero then jump to program location (b)
- JNZ  a b  ; if memory location (a) is not zero then jump to program location (b)
- CAL  a 0  ; call subroutine at program location (a)
- RET  0 0  ; return from subroutine call
- HLT  0 0  ; halt the program

- ADD  a b  ; mem[a] = mem[a] + mem[b]
- SUB  a b  ; mem[a] = mem[a] - mem[b]
- MUL  a b  ; mem[a] = mem[a] * mem[b]
- DIV  a b  ; mem[a] = mem[a] / mem[b]
- MOD  a b  ; mem[a] = mem[a] % mem[b]
- NEG  a 0  ; mem[a] = -mem[a]

- EQ   a b  ; mem[a] = 1 if mem[a] == mem[b] else 0
- NE   a b  ; mem[a] = 1 if mem[a] != mem[b] else 0
- LT   a b  ; mem[a] = 1 if mem[a] <  mem[b] else 0
- LE   a b  ; mem[a] = 1 if mem[a] <= mem[b] else 0
- GT   a b  ; mem[a] = 1 if mem[a] >  mem[b] else 0
- GE   a b  ; mem[a] = 1 if mem[a] >= mem[b] else 0

- PUTC a 0  ; write the value from memory location (a) to stdout (as character)
- PUTI a 0  ; write the value from memory location (a) to stdout (as integer)
- GETC a 0  ; read a character from stdin and store it in memory location (a)
- GETI a 0  ; read an integer from stdin and store it in memory location (a), skip initial whitespaces
- EOF  a 0  ; set memory location (a) to 1 if stdin indicates end-of-file or to 0 otherwise

mk7 instruction set

- IN  0  ; read input to ACC
- OUT 0  ; write ACC to output
- LDA a  ; load memory location (a) to ACC
- STA a  ; store ACC in memory location (a)
- ADD a  ; add memory location (a) to ACC
- SUB a  ; subtract memory location (a) from ACC
- INC a  ; increase memory location (a) by 1
- DEC a  ; decrease memory location (a) by 1
- JMP a  ; jump to address (a)
- JZ  a  ; jump to address (a) if ACC is zero
- JN  a  ; jump to address (a) if ACC is negative
- LIT a  ; load (a) to ACC
- HLT 0  ; halt the program

mk8 instruction set

mk7 instructions extended with:

- CAL a ; call procedure at address (a)
- RET 0 ; return from procedure
- LPA a ; load memory location pointed by (a) to ACC
- SPA a ; store ACC in memory location pointed by (a)

misc instructions:

- ASR a ; arithmetic shift right ACC by (a)
- NOP a ; do nothing, (a) can be used to mark labels

mk9 instruction set

- IN  a 0 ; read input to mem[a]
- OUT a 0 ; write mem[a] to output
- LIT a b ; mem[a] = b
- MOV a b ; mem[a] = mem[b]
- ADD a b ; mem[a] = mem[a] + mem[b]
- SUB a b ; mem[a] = mem[a] - mem[b]
- JMP a 0 ; jump to address (a)
- JZ  a b ; jump to address (a) if mem[b] is zero
- JN  a b ; jump to address (a) if mem[b] is negative
- HLT 0 0 ; halt the program

mk10 instruction set

mk9 instructions extended with

- CAL a 0 ; call procedure at address (a)
- RET 0 0 ; return from procedure
- PTM a b ; transfer from pointer (b) to memory location (a)
- MTP a b ; transfer from memory location (b) to pointer (a)
- ASR a b ; arithmetic shift right mem[a] by (b)
- NOP a b ; do nothing, (a) can be used to mark labels
- JP  a b ; jump to address (a) if mem[b] is positve (>0)

mk11 instruction set

- LIT  a b  ; mem[a] = b
- MOV  a b  ; mem[a] = mem[b]
- LDA  a 0  ; acc = mem[a] + b
- LDAP a b  ; acc = mem[mem[a]+mem[b]]
- STA  a 0  ; mem[a] = acc + b
- STAP a b  ; mem[mem[a]+mem[b]] = acc

- JMP  a 0  ; jump to program location (a)
- CAL  a 0  ; call subroutine at (a)
- RET  0 0  ; return from subroutine call
- HLT  0 0  ; halt the program

- ADD  a b  ; mem[a] = mem[a] + mem[b]
- SUB  a b  ; mem[a] = mem[a] - mem[b]
- MUL  a b  ; mem[a] = mem[a] * mem[b]
- DIV  a b  ; mem[a] = mem[a] / mem[b]
- MOD  a b  ; mem[a] = mem[a] % mem[b]
- NEG  a 0  ; mem[a] = -mem[a]

- CMP  a b  ; acc = mem[a] - mem[b] (compare)
- JEQ  a 0  ; jump to (a) if acc == 0
- JLT  a 0  ; jump to (a) if acc < 0
- JGT  a 0  ; jump to (a) if acc > 0
- JNE  a 0  ; jump to (a) if acc != 0
- JLE  a 0  ; jump to (a) if acc <= 0
- JGE  a 0  ; jump to (a) if acc >= 0

- GET  a 0  ; mem[a] = read integer from stdin (skip whitespace, block)
- PUT  a 0  ; write mem[a] as integer to stdout
- EOF  a 0  ; mem[a] = 1 if stdid.eof else 0
- GETC a 0  ; mem[a] = read character from stdin (block)
- PUTC a 0  ; write mem[a] as character to stdout

mk12 instruction set

- LIT  a  ; acc = a
- LDA  a  ; acc = mem[a]
- STA  a  ; mem[a] = acc
- PEEK a  ; acc = mem[mem[a]]
- POKE a  ; mem[mem[a]] = acc

- JMP  a  ; jump to program location (a)
- CAL  a  ; call subroutine at (a)
- RET  0  ; return from subroutine call
- HLT  0  ; halt the program

- EQ   a  ; acc = 1 if acc == mem[a] else 0
- NE   a  ; acc = 1 if acc != mem[a] else 0
- LT   a  ; acc = 1 if acc <  mem[a] else 0
- LE   a  ; acc = 1 if acc <= mem[a] else 0
- GT   a  ; acc = 1 if acc >  mem[a] else 0
- GE   a  ; acc = 1 if acc >= mem[a] else 0
- JZ   a  ; jump to (a) if acc == 0 
- JNZ  a  ; jump to (a) if acc != 0

- ADD  a  ; acc += mem[a]
- SUB  a  ; acc -= mem[a]
- MUL  a  ; acc *= mem[a]
- DIV  a  ; acc /= mem[a]
- MOD  a  ; acc %= mem[a]
- NEG  0  ; acc = -acc
- INC  a  ; mem[a] += 1
- DEC  a  ; mem[b] -= 1

- PUT  0  ; write acc as integer to stdout
- GET  0  ; acc = read integer from stdin (skip whitespace, block)
- EOF  0  ; acc = 1 if stdid.eof else 0
- PUTC 0  ; write acc as character to stdout
- GETC 0  ; acc = read character from stdin (block)

mk13 instruction set

- LIT  a b  ; mem[a] = b
- MOV  a b  ; mem[a] = mem[b]
- LDA  a b  ; acc = mem[a] + b
- LDAP a b  ; acc = mem[mem[a]+mem[b]]
- STA  a b  ; mem[a] = acc + b
- STAP a b  ; mem[mem[a]+mem[b]] = acc

- JMP  a 0  ; jump to program location (a)
- CAL  a 0  ; call subroutine at (a)
- RET  0 0  ; return from subroutine call
- HLT  0 0  ; halt the program

- ADD  a b  ; mem[a] = mem[a] + mem[b]
- SUB  a b  ; mem[a] = mem[a] - mem[b]
- MUL  a b  ; mem[a] = mem[a] * mem[b]
- DIV  a b  ; mem[a] = mem[a] / mem[b]
- MOD  a b  ; mem[a] = mem[a] % mem[b]
- NEG  a 0  ; mem[a] = -mem[a]

- EQ   a b  ; acc = 1 if mem[a] == mem[b] else 0
- NE   a b  ; acc = 1 if mem[a] != mem[b] else 0
- LT   a b  ; acc = 1 if mem[a] <  mem[b] else 0
- LE   a b  ; acc = 1 if mem[a] <= mem[b] else 0
- GT   a b  ; acc = 1 if mem[a] >  mem[b] else 0
- GE   a b  ; acc = 1 if mem[a] >= mem[b] else 0
- JZ   a 0  ; jump to (a) if acc == 0 
- JNZ  a 0  ; jump to (a) if acc != 0

- GET  a 0  ; mem[a] = read integer from stdin (skip whitespace, block)
- PUT  a 0  ; write mem[a] as integer to stdout
- EOF  a 0  ; mem[a] = 1 if stdid.eof else 0
- GETC a 0  ; mem[a] = read character from stdin (block)
- PUTC a 0  ; write mem[a] as character to stdout

Reference materials