The width of memory addresses, most ALU operands and general registers is 32 bits.
Memory access instructions usually support a granularity of 8, 16 and 32 bits.
Memory accesses are little-endian.
Instructions are also encoded as little-endian.
Writes to memory that the instruction pointer may reach may not be reflected immediately.
Ways to mitigate this are currently implementation-defined.
RAM address ranges are implementation-defined.
Out-of-range memory accesses are undefined behavior.
16-bit and 32-bit memory accesses must be aligned. Instructions should be aligned to 16-bit boundaries.
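
The little-endian and alignment rules above can be modelled with a minimal sketch. The names `load` and `store` and the use of a `bytearray` as RAM are hypothetical. Raising an error on a misaligned access is an assumption, since the text only states that such accesses must be aligned.

```python
def load(mem: bytearray, addr: int, bits: int) -> int:
    """Zero-extending little-endian load of 8, 16 or 32 bits."""
    size = bits // 8
    if addr % size != 0:
        # Assumption: the spec does not say what a misaligned access does.
        raise ValueError(f"unaligned {bits}-bit load at {addr:#x}")
    return int.from_bytes(mem[addr:addr + size], "little")


def store(mem: bytearray, addr: int, bits: int, value: int) -> None:
    """Little-endian store of the low 8, 16 or 32 bits of value."""
    size = bits // 8
    if addr % size != 0:
        raise ValueError(f"unaligned {bits}-bit store at {addr:#x}")
    mem[addr:addr + size] = (value & ((1 << bits) - 1)).to_bytes(size, "little")


mem = bytearray(8)
store(mem, 4, 32, 0x11223344)
# The least significant byte lands at the lowest address.
low_half = load(mem, 4, 16)   # 0x3344
high_half = load(mem, 6, 16)  # 0x1122
```

Out-of-range accesses are not checked here because the text leaves them undefined.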
Whether all registers are 0-initialized on reset is implementation-defined.
Certain registers are always 0-initialized on reset: `rip`.
The instruction pointer initializes in an implementation-defined manner (typically `0x0`).
TODO: this should not really be the case, memory mapping will make this make more sense
"Specific-purpose" registers may usually still be used as general purpose registers in specific contexts.
Volatile registers may be altered by functions in the ABI.
To preserve the value of volatile registers, the caller must save them.
If a register is used as an argument for an ABI call, then the callee is free to
mutate it as if it were volatile.

Mnemonic | Encoding | Use or ABI meaning | Volatile |
---|---|---|---|
General purpose registers | | | |
r0 | 0x0 | ABI: Arg #0 or vararg count; return register | Yes |
r1 | 0x1 | ABI: Arg #1 | Yes |
r2 | 0x2 | ABI: Arg #2 | Yes |
r3 | 0x3 | ABI: Arg #3 | Yes |
r4 | 0x4 | ABI: Arg #4 | Yes |
r5 | 0x5 | ABI: Arg #5 | If arg |
r6 | 0x6 | ABI: Arg #6 | If arg |
r7 | 0x7 | | No |
r8 | 0x8 | | No |
r9 | 0x9 | | No |
r10 | 0xA | | No |
r11 | 0xB | | No |
r12 | 0xC | | No |
Specific-purpose registers | | | |
rret | 0xD | Return address from jump-and-link | Yes |
rpl | 0xE | Literal pool start address | No |
rps | 0xF | Stack pointer | No |

Certain special registers cannot be addressed directly, but may be manipulated and used by certain instructions.

Name | Width | Meaning |
---|---|---|
T | 1 bit | Test bit |
rip | 32-bit | Instruction pointer |

The standard way to use the stack is to initialize the stack pointer to the highest address. The stack grows downwards in the address space.
Stack pushes are available as an instruction, but pops are not, because they are harder to implement.
`push(r5)` is functionally equivalent to `iaddsi(rps, -4), s32(rps, r5)`.
`pop(r5)` would have to be implemented as `l32(rps, r5), iaddsi(rps, 4)`.
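
To make the push and pop sequences concrete, here is a small sketch that simulates them on a byte-addressed memory. The `regs` dictionary, the `bytearray` RAM and the function names are hypothetical; the stack pointer is assumed to start one past the highest usable address, so the first push pre-decrements into valid memory.

```python
def push(mem: bytearray, regs: dict, src: str) -> None:
    """rps <- rps - 4; mem32(rps) <- src"""
    regs["rps"] = (regs["rps"] - 4) & 0xFFFFFFFF
    sp = regs["rps"]
    mem[sp:sp + 4] = (regs[src] & 0xFFFFFFFF).to_bytes(4, "little")


def pop(mem: bytearray, regs: dict, dst: str) -> None:
    """dst <- mem32(rps); rps <- rps + 4 (a two-instruction sequence on real hardware)"""
    sp = regs["rps"]
    regs[dst] = int.from_bytes(mem[sp:sp + 4], "little")
    regs["rps"] = (sp + 4) & 0xFFFFFFFF


mem = bytearray(64)
regs = {"rps": 64, "r5": 0xDEADBEEF}  # assumption: 64 bytes of RAM, empty stack
push(mem, regs, "r5")
sp_after_push = regs["rps"]
regs["r5"] = 0
pop(mem, regs, "r5")
```

After the push the stack pointer has moved down by 4, and the pop restores both the register and the stack pointer.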
The stack should be set up in a consistent state before interrupts (TODO) are enabled.
When an interrupt fires, the CPU will retire all instructions in the pipeline.
After this, registers will be dumped to the stack. As such, memory below the
stack pointer should be expected to be mutated at any time.
No delay slots are used for branching instructions.
Thus, the instruction executed immediately after a taken branch is the one at
the targeted IP.
Excluding instruction cache miss penalties, the cost of a misprediction is 1 cycle.
Conditional branches and loads read the special `T` bit register.
The `T` bit is manipulated by certain arithmetic and test instructions.
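
As an illustration of how a test instruction feeds a conditional operation, the sketch below models an unsigned and a signed "lower than" test, followed by a conditional register load. The Python function names mirror the mnemonics but are only a model. Returning the new destination value from `c_lr` is a simplification chosen for this example.

```python
MASK32 = 0xFFFFFFFF


def s32(x: int) -> int:
    """Reinterpret a 32-bit register value as a signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def tltu(a: int, b: int) -> bool:
    """T <- (a < b), unsigned."""
    return (a & MASK32) < (b & MASK32)


def tlts(a: int, b: int) -> bool:
    """T <- (s32(a) < s32(b))."""
    return s32(a) < s32(b)


def c_lr(t: bool, src: int, dst: int) -> int:
    """if T { dst <- src }; returns the resulting value of dst."""
    return src if t else dst


# 0xFFFFFFFF is the largest unsigned value but -1 when signed.
t_unsigned = tltu(0xFFFFFFFF, 1)
t_signed = tlts(0xFFFFFFFF, 1)
```

The same bit pattern gives opposite results for the two tests, which is why both signed and unsigned variants exist.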

Instr[8:] | Format | Mnemonic | Description | Pseudocode |
---|---|---|---|---|
Loads | | | | |
00000000 | R4W4 | l8(addr:R4, dst:W4) | Load u8 from memory | dst <- u32(mem8(addr)) |
00000001 | R4W4 | l16(addr:R4, dst:W4) | Load u16 from memory | dst <- u32(mem16(addr)) |
00000010 | R4W4 | l32(addr:R4, dst:W4) | Load u32 from memory | dst <- mem32(addr) |
00000011 | R4W4 | c_lr(src:R4, dst:W4) | Conditionally load register | if T { dst <- src } |
00000100 | R4W4E16 | l8ow(base:R4, dst:W4, off:E16) | Load u8 with offset (wide) ±32K | dst <- u32(mem8(base + s32(off))) |
00000101 | R4W4E16 | l16ow(base:R4, dst:W4, off:E16) | Load u16 with offset (wide) ±64K | dst <- u32(mem16(base + (s32(off) << 1))) |
00000110 | R4W4E16 | l32ow(base:R4, dst:W4, off:E16) | Load u32 with offset (wide) ±128K | dst <- mem32(base + (s32(off) << 2)) |
00000111 | R4W4 | lr(src:R4, dst:W4) | Load register | dst <- src |
00001000 | R4W4 | ls8(addr:R4, dst:W4) | Load s8 from memory | dst <- s32(mem8(addr)) |
00001001 | R4W4 | ls16(addr:R4, dst:W4) | Load s16 from memory | dst <- s32(mem16(addr)) |
00001010 | R4W4E16 | ls8ow(base:R4, dst:W4, off:E16) | Load s8 with offset (wide) ±32K | dst <- s32(mem8(base + s32(off))) |
00001011 | R4W4E16 | ls16ow(base:R4, dst:W4, off:E16) | Load s16 with offset (wide) ±64K | dst <- s32(mem16(base + (s32(off) << 1))) |
000011-- | Rh2W2I6 | l8o(base:Rh2, dst:W2, off:I6) | Load u8 with offset <=63 | dst <- u32(mem8(base + u32(off))) |
000100-- | Rh2W2I6 | l16o(base:Rh2, dst:W2, off:I6) | Load u16 with offset <=126 | dst <- u32(mem16(base + (u32(off) << 1))) |
000101-- | Rh2W2I6 | l32o(base:Rh2, dst:W2, off:I6) | Load u32 with offset <=252 | dst <- mem32(base + (u32(off) << 2)) |
000110-- | Rh2W2I6 | ls8o(base:Rh2, dst:W2, off:I6) | Load s8 with offset <=63 | dst <- s32(mem8(base + u32(off))) |
000111-- | Rh2W2I6 | ls16o(base:Rh2, dst:W2, off:I6) | Load s16 with offset <=126 | dst <- s32(mem16(base + (u32(off) << 1))) |
0010---- | W4I8 | lsi(imm:I8, dst:W4) | Load from s8 immediate | dst <- s32(imm) |
0011---- | W4I8 | lsih(imm:I8, dst:W4) | Load from s8 immediate to high byte | dst[24:31] <- u8(imm) |
0100---- | W4I8E16 | lsiw(imm:I24, dst:W4) | Load from s24 immediate (wide) | dst <- s32(imm) |
0101---- | W4I8E16 | liprel(imm:I24, dst:W4) | Load ip-relative address ±16MB | dst <- rip + 2 + s32(imm << 1) |
Stores | | | | |
01100000 | R4R4 | s8(addr:R4, src:R4) | Store u8 to memory | mem8(addr) <- src[:7] |
01100001 | R4R4 | s16(addr:R4, src:R4) | Store u16 to memory | mem16(addr) <- src[:15] |
01100010 | R4R4 | s32(addr:R4, src:R4) | Store u32 to memory | mem32(addr) <- src |
01100011 | R4 | push(src:R4) | Push to rps | rps <- rps - 4; mem32(rps) <- src |
01100100 | R4R4E16 | s8ow(base:R4, src:R4, imm:E16) | Store u8 with offset (wide) ±32K | mem8(base + s32(imm)) <- src[:7] |
01100101 | R4R4E16 | s16ow(base:R4, src:R4, imm:E16) | Store u16 with offset (wide) ±64K | mem16(base + (s32(imm) << 1)) <- src[:15] |
01100110 | R4R4E16 | s32ow(base:R4, src:R4, imm:E16) | Store u32 with offset (wide) ±128K | mem32(base + (s32(imm) << 2)) <- src |
01100111 | | brk | Implementation-defined break | N/A |
011010-- | Rh2R2I6 | s8o(base:Rh2, src:R2, off:I6) | Store u8 with offset <=63 | mem8(base + u32(off)) <- src[:7] |
011011-- | Rh2R2I6 | s16o(base:Rh2, src:R2, off:I6) | Store u16 with offset <=126 | mem16(base + (u32(off) << 1)) <- src[:15] |
011100-- | Rh2R2I6 | s32o(base:Rh2, src:R2, off:I6) | Store u32 with offset <=252 | mem32(base + (u32(off) << 2)) <- src |
Tests and T-bit manipulation | | | | |
01110100 | R4R4 | tltu(a:R4, b:R4) | Test if lower than (unsigned) | T <- (a < b) |
01110101 | R4R4 | tlts(a:R4, b:R4) | Test if lower than (signed) | T <- (s32(a) < s32(b)) |
01110110 | R4R4 | tgeu(a:R4, b:R4) | Test if greater or equal (unsigned) | T <- (a >= b) |
01110111 | R4R4 | tges(a:R4, b:R4) | Test if greater or equal (signed) | T <- (s32(a) >= s32(b)) |
01111000 | R4R4 | te(a:R4, b:R4) | Test if equal to | T <- (a == b) |
01111001 | R4R4 | tne(a:R4, b:R4) | Test if not equal to | T <- (a != b) |
01111010 | R4R4 | tgtu(a:R4, b:R4) | Test if greater than (unsigned) | T <- (a > b) |
01111011 | R4R4 | tgts(a:R4, b:R4) | Test if greater than (signed) | T <- (s32(a) > s32(b)) |
01111100 | R4I4 | tltsi(a:R4, b:I4) | Test if lower than signed imm. | T <- (s32(a) < s32(b)) |
01111101 | R4I4 | tgesi(a:R4, b:I4) | Test if greater or equal to signed imm. | T <- (s32(a) >= s32(b)) |
01111110 | R4I4 | tei(a:R4, b:I4) | Test if equal to immediate | T <- (a == b) |
01111111 | R4I4 | tnei(a:R4, b:I4) | Test if not equal to immediate | T <- (a != b) |
Pool loads | | | | |
1000---- | W4I8 | pl_l32(dst:W4, imm:U8) | rpl: Load u32 from memory with off <=1K | dst <- mem32(rpl + (u32(imm) << 2)) |
Branching and conditional ops | | | | |
10010000 | R4 | j(addr:R4) | Jump unconditionally | rip <- addr |
10010001 | R4 | c_j(addr:R4) | Conditionally jump | if T { rip <- addr } |
10010010 | R4W4 | jal(addr:R4, target:R4) | Jump and link | ret <- rip + 2; rip <- addr |
hole | | | | |
1010---- | I12E16 | jali(ipoff:I28) | Jump and link to immediate ±128M | ret <- rip + 2; rip <- rip + 2 + s32(ipoff << 1) |
1011---- | I12 | c_ji(ipoff:I12) | Conditionally jump with IP-relative imm. | if T { rip <- rip + 2 + s32(ipoff << 1) } |
Arithmetic and bitwise logic | | | | |
11000000 | W4R4 | bsext8(dst:W4, a:R4) | Sign-extend from 8 to 32 | dst <- sign extend s8(a) |
11000001 | W4R4 | bsext16(dst:W4, a:R4) | Sign-extend from 16 to 32 | dst <- sign extend s16(a) |
11000010 | W4R4 | bzext8(dst:W4, a:R4) | Zero-extend from 8 to 32 | dst <- zero extend u8(a) |
11000011 | W4R4 | bzext16(dst:W4, a:R4) | Zero-extend from 16 to 32 | dst <- zero extend u16(a) |
11000100 | W4R4 | ineg(dst:W4, a:R4) | Integer negative of value | dst <- (-s32(a)) |
11000101 | A4R4 | isub(dst:A4, b:R4) | Integer subtract | dst <- dst - b |
11000110 | A4R4 | iadd(dst:A4, b:R4) | Integer add | dst <- dst + b |
11000111 | A4I4 | iaddsi(dst:A4, b:I4) | Integer add signed immediate | dst <- dst + s32(b) |
11001000 | A4R4E16 | iaddsiw(dst:A4, a:R4, b:E16) | Integer add signed immediate (wide) | dst <- a + s32(b) |
11001001 | A4I4 | iaddsi_tnz(dst:A4, b:I4) | Integer add signed imm. then test for non-zero | dst <- dst + s32(b); T <- (dst != 0) |
11001010 | A4R4 | band(dst:A4, b:R4) | Bitwise AND | dst <- dst & b |
11001011 | A4R4 | bor(dst:A4, b:R4) | Bitwise OR | dst <- dst \| b |
11001100 | A4R4 | bxor(dst:A4, b:R4) | Bitwise XOR | dst <- dst ^ b |
11001101 | A4R4 | bsl(dst:A4, b:R4) | Bitwise shift left | dst <- dst << b[:5] |
11001110 | A4R4 | bsr(dst:A4, b:R4) | Bitwise shift right (pads 0s) | dst <- dst >> b[:5] |
11001111 | A4R4 | basr(dst:A4, b:R4) | Integer arith. shift right (pads sign) | dst <- dst >>> b[:5] |
1101000- | A4I5 | bsli(dst:A4, b:I5) | Bitwise shift left with immediate | dst <- dst << b |
1101001- | A4I5 | bsri_tlsb(dst:A4, b:I5) | Bitwise shift right with immediate, then test LSB | dst <- dst >> b; T <- (dst & 0b1) != 0 |
1101010- | A4I5 | basri(dst:A4, b:I5) | Integer arith. shift right with imm. | dst <- dst >>> b |
11010110 | hole | | | |
11010111 | hole | | | |
11011--- | hole | | | |
11100000 | | intoff | Set interrupts off | rinton <- 0 |
11100001 | | inton | Set interrupts on | rinton <- 1 |
11100010 | | intret | Interrupt handler return | rip <- rintret; rinton <- 1 |
11100011 | | intwait | Interrupt wait (sleep until interrupt) | |
1111---- | hole | | | |

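
The offset arithmetic used by the wide load and store forms can be sketched as follows. This assumes the shift applies to the sign-extended 16-bit immediate only (not to the base), which is the reading consistent with the stated ranges such as ±64K for 16-bit accesses. The helper names are hypothetical.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def wide_effective_address(base: int, imm16: int, shift: int) -> int:
    """base + (s32(imm) << shift), wrapped to 32 bits.

    shift is 0 for 8-bit, 1 for 16-bit and 2 for 32-bit accesses.
    """
    return (base + (sign_extend(imm16, 16) << shift)) & 0xFFFFFFFF


# A 32-bit access one word below the base: the immediate 0xFFFF means -1.
addr_below = wide_effective_address(0x1000, 0xFFFF, 2)  # 0x0FFC
# The largest positive offset for a 16-bit access.
addr_far = wide_effective_address(0x1000, 0x7FFF, 1)    # 0x10FFE
```

Scaling the immediate by the access size keeps every reachable address aligned as long as the base is aligned.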
Most instructions are encoded in 16 bits. The instruction type can always be decoded with the first 8 bits.
Certain instructions may encode immediates at least partly over the 16 or 32
bits following the instruction.
Each 16 bits of extra immediates, assuming no cache miss, imply +1 cycle of
latency.
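
The size and latency rule can be expressed as a tiny helper. It assumes that the extra immediate width is always given by a trailing `E<bits>` in the format name, as in the format names used in this document. The function names are hypothetical.

```python
import re


def extra_immediate_bits(fmt: str) -> int:
    """Number of immediate bits stored after the 16-bit instruction word."""
    match = re.search(r"E(\d+)$", fmt)
    return int(match.group(1)) if match else 0


def instruction_bytes(fmt: str) -> int:
    """Total encoded size: 2 bytes plus any extra immediate bytes."""
    return 2 + extra_immediate_bits(fmt) // 8


def extra_latency_cycles(fmt: str) -> int:
    """+1 cycle per 16 bits of extra immediate, ignoring cache misses."""
    return extra_immediate_bits(fmt) // 16
```

For example, `instruction_bytes("R4W4E16")` evaluates to 4 and `extra_latency_cycles("R4W4")` evaluates to 0.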
In the following table:
- `x` are bits ignored for a specific instruction format.
- `o` are bits belonging to the opcode.
- `(h)` (optional) stands for High, meaning that the most significant bit is to be set when indexing a register (when `x < 4 bits`), e.g. `R2` indexes `(r0, r1, r2, r3)` and `Rh2` indexes `(r8, r9, r10, r11)`.

Bits belonging to neither the opcode nor the instruction format must be `0`.
When it comes to naming:
- `R(h)x` refers to a read-only register encoded over x bits
- `A(h)x` refers to a read-write register encoded over x bits
- `W(h)x` refers to a write-only register encoded over x bits
- `Ix` refers to an immediate value encoded over x bits
- `-x` refers to unused bits in the instruction
- `Ex` refers to extra x immediate bits encoded starting at `rip+2`. These bits are always skipped when fetching the next instruction. Immediates present in the main instruction `u16` represent the higher bits if present, whereas `Ex` will represent the lower bits.
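
The split between the immediate bits in the main instruction word and the `Ex` bits can be illustrated with a short sketch. It assumes a 16-bit extension word and a signed combined immediate, as for the I24 and I28 operands. The function names are hypothetical.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def combine_wide_immediate(high_bits: int, high_width: int, ext_word: int) -> int:
    """Join the in-instruction bits (high part) with the E16 word (low part)."""
    raw = ((high_bits & ((1 << high_width) - 1)) << 16) | (ext_word & 0xFFFF)
    return sign_extend(raw, high_width + 16)


# I8 + E16 forms a 24-bit immediate: 0xFF and 0xFFFE combine to -2.
imm24 = combine_wide_immediate(0xFF, 8, 0xFFFE)
# I12 + E16 forms a 28-bit immediate.
imm28_min = combine_wide_immediate(0x800, 12, 0x0000)
```

Placing the high bits in the main word means the sign of the immediate is known as soon as the first 16 bits are decoded.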
In the following table, `A` and `W` are always replaced with `R` as the
encodings would be otherwise equivalent.

Format | Operands | Next two bytes | Instruction bits |
---|---|---|---|
R4 | a | x | oooo'oooo'xxxx'aaaa |
R4R4 | a, b | x | oooo'oooo'bbbb'aaaa |
R4I4 | a, i | x | oooo'oooo'iiii'aaaa |
R4I8 | a, i | x | oooo'iiii'iiii'aaaa |
R4I8E16 | a, i | iiii'iiii'iiii'iiii | oooo'iiii'iiii'aaaa |
R4R4E16 | a, b, i | iiii'iiii'iiii'iiii | oooo'oooo'bbbb'aaaa |
Rh2R2I6 | a, b, i | x | oooo'ooii'iiii'bbaa |
I12 | i | x | oooo'iiii'iiii'iiii |
I12E16 | i | iiii'iiii'iiii'iiii | oooo'iiii'iiii'iiii |

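
As a worked example of the bit layouts above, the sketch below encodes an R4R4 instruction and decodes an Rh2R2I6 one. It assumes that the first operand of a mnemonic maps to field `a` and the second to field `b`, and that the leftmost bit in the layout is the most significant. The function names are hypothetical.

```python
import struct


def encode_r4r4(opcode8: int, a: int, b: int) -> bytes:
    """oooo'oooo'bbbb'aaaa, emitted as a little-endian 16-bit word."""
    word = ((opcode8 & 0xFF) << 8) | ((b & 0xF) << 4) | (a & 0xF)
    return struct.pack("<H", word)


def decode_rh2r2i6(word: int) -> dict:
    """oooo'ooii'iiii'bbaa, where `a` is an Rh2 field (high bit implied)."""
    return {
        "opcode6": (word >> 10) & 0x3F,
        "imm": (word >> 4) & 0x3F,
        "b": (word >> 2) & 0x3,        # R2/W2: indexes r0..r3
        "a": 0b1000 | (word & 0x3),    # Rh2: indexes r8..r11
    }


# s32(rps, r5): opcode 01100010, a = rps (0xF), b = r5 (0x5).
store_bytes = encode_r4r4(0b01100010, 0xF, 0x5)
# l8o(r9, r2, 63): opcode bits 000011, imm = 63, b = r2, a = r9.
fields = decode_rh2r2i6((0b000011 << 10) | (63 << 4) | (2 << 2) | 1)
```

Because instructions are little-endian, the byte holding the register fields comes first in memory and the opcode byte second.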
- `0x00000000..0xEFFFFFFF`: RAM
- `0xF0000000..0xF0000FFF`: Keyboard
- `0xF0002000..0xF0002FFF`: Framebuffer

TODO
Enables paletted text rendering.
- `0x2000..0x2F9F`: `FbChar[25][80]` - framebuffer data
- `0x2FA0..0x2FCF`: `RgbColor[16]` - palette data
- `0x2FD0`: any write causes a vsync wait

FbChar bit layout:
- 0..6: ASCII char.
- 7: Bold.
- 8..11: Foreground palette entry.
- 12..15: Background palette entry.
RgbColor bit layout:
- 0..7: Red channel.
- 8..15: Green channel.
- 16..23: Blue channel.
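
The two bit layouts and the address ranges can be tied together with a small sketch. It assumes row-major order for `FbChar[25][80]`, 2 bytes per `FbChar` and 3 bytes per `RgbColor`; the last two follow from the sizes of the ranges listed above. The function names are hypothetical, and the offsets use the `0x2000` base exactly as written in the list.

```python
FB_BASE = 0x2000
PALETTE_BASE = 0x2FA0
COLS, ROWS = 80, 25


def pack_fbchar(ch: str, bold: bool, fg: int, bg: int) -> int:
    """Bits 0..6 ASCII, 7 bold, 8..11 foreground, 12..15 background."""
    code = ord(ch)
    if code > 0x7F:
        raise ValueError("FbChar only holds 7-bit ASCII")
    return code | (int(bold) << 7) | ((fg & 0xF) << 8) | ((bg & 0xF) << 12)


def fbchar_offset(row: int, col: int) -> int:
    """Offset of a cell, assuming row-major layout and 16-bit cells."""
    if not (0 <= row < ROWS and 0 <= col < COLS):
        raise ValueError("cell outside the 25x80 grid")
    return FB_BASE + (row * COLS + col) * 2


def pack_rgb(red: int, green: int, blue: int) -> int:
    """Bits 0..7 red, 8..15 green, 16..23 blue."""
    return (red & 0xFF) | ((green & 0xFF) << 8) | ((blue & 0xFF) << 16)


def palette_offset(index: int) -> int:
    """Offset of a palette entry, assuming 3 packed bytes per entry."""
    if not 0 <= index < 16:
        raise ValueError("palette has 16 entries")
    return PALETTE_BASE + index * 3


cell = pack_fbchar("A", True, 0xF, 0x1)  # 0x1FC1
last_cell = fbchar_offset(24, 79)        # 0x2F9E
```

The last cell ends at `0x2F9F` and the last palette entry ends at `0x2FCF`, matching the ranges in the list.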
TODO
16 interrupts are available.

ID | Description |
---|---|
0x0 | Processor exception |
0xC | Timer interrupt |
0xD | Sound buffer empty event |
0xE | Keyboard event |
0xF | Framebuffer event (vsync) |

The CPU boots with interrupts disabled. The `inton` instruction will enable them
and `intoff` will disable them.
When an interrupt is fired:
- Interrupts are disabled
- `rip` is saved to the special `rintret` register
- The CPU jumps to `0x00001000 + interrupt_id * 16`

Each interrupt handler slot is 16 bytes, which leaves at most 8 instructions for software to save state and jump to a more complex handler, or to return early.
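
The handler address computation is simple enough to check by hand; the following sketch does the same arithmetic. The function name is hypothetical, and rejecting IDs outside 0..15 is an assumption based on the statement that 16 interrupts are available.

```python
ISR_TABLE_BASE = 0x00001000
ISR_SLOT_BYTES = 16


def isr_address(interrupt_id: int) -> int:
    """Address the CPU jumps to when the given interrupt fires."""
    if not 0 <= interrupt_id < 16:
        raise ValueError("interrupt IDs range from 0x0 to 0xF")
    return ISR_TABLE_BASE + interrupt_id * ISR_SLOT_BYTES


keyboard_isr = isr_address(0xE)            # 0x10E0
max_short_instructions = ISR_SLOT_BYTES // 2  # 8 instructions of 16 bits each
```

Instructions with extra immediate words take more than 2 bytes, so a slot may hold fewer than 8 of them.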
ISRs must return by using the `intret` instruction.
If an interrupt is pending right as `intret` is being executed, the
CPU will immediately branch to the relevant ISR.
If no interrupt is pending, then `intret` will:
- Jump the CPU to `rintret` (as set when the interrupt was fired)
- Re-enable interrupts
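
The firing and return rules can be summarised as a small state machine. This is only a model. The class and attribute names are hypothetical, and serving pending interrupts in arrival order is an assumption, since the text does not define a priority.

```python
class InterruptModel:
    """Minimal model of interrupt entry and the interrupt return instruction."""

    def __init__(self) -> None:
        self.enabled = False  # the CPU boots with interrupts disabled
        self.rip = 0
        self.rintret = 0
        self.pending: list[int] = []

    def fire(self, interrupt_id: int) -> None:
        """Interrupt entry: disable, save rip, jump to the handler slot."""
        self.enabled = False
        self.rintret = self.rip
        self.rip = 0x1000 + interrupt_id * 16

    def interrupt_return(self) -> None:
        """Return from an ISR, chaining into a pending ISR if there is one."""
        if self.pending:
            # Branch straight to the next ISR; interrupts stay disabled and
            # rintret still holds the original return address.
            self.rip = 0x1000 + self.pending.pop(0) * 16
        else:
            self.rip = self.rintret
            self.enabled = True


cpu = InterruptModel()
cpu.enabled = True
cpu.rip = 0x200
cpu.fire(0xE)            # keyboard event while running user code
cpu.pending.append(0xC)  # timer becomes pending during the ISR
cpu.interrupt_return()   # chains to the timer ISR
chained_rip = cpu.rip
cpu.interrupt_return()   # nothing pending: back to user code
```

After the second return the model is back at the interrupted address with interrupts enabled.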
On its own, the interrupt mechanism does not affect the stack as pointed to by
`rps`. However, the ABI requirements make it legal for the ISR to push and pop
from the stack as long as `rps` is back to its original value when `intret`
is executed.
At the moment, nested interrupts are unsupported. Re-enabling interrupts within
the ISR is possible but the user code's rret
value would be lost.
Software exceptions are implemented in terms of interrupts and use ID `0x0`.
Currently, different exceptions cannot be differentiated by the system.