This repo contains documentation and an assembler for VPU assembly, as well as test programs. The scripts here are messy works in progress; at some point they will get a rewrite with a better design (and probably not in Python).
The language is designed to benefit the hardware as much as possible, which makes it a pretty bad and inexpressive ASM dialect. The assembler is written in Python for ease of development; performance is not a concern currently because the test programs are small.
instructions.py parses ISA definition YAML files and generates code definitions (e.g. definition headers for the C++ simulator). It is also used as a library by the assembler. See an up-to-date version of the ISA with ./instructions.py instructions.yaml --table. CMakeLists.txt calls this automatically when building.
assembler.py is the assembler for the VPU instruction set. It refers to instructions.py for reading the ISA definition, and converts assembly inputs into a memory binary that the simulator reads. ./assembler.py test_programs/inc.asm will compile the program inc.asm and produce vpu.out. The option --debug generates an additional debug file which the simulator picks up and uses to gain additional context.
For programs which need model data, provide the path to the binary blob with the --data argument. Multiple blobs can be provided. If an assembly program needs a blob and one is not provided then the assembler will error.
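The invocations described above can be scripted. The sketch below only builds the argument list; the helper name `build_assembler_command` and the blob file names are hypothetical, and passing one `--data` flag per blob is an assumption about how multiple blobs are supplied, so check the assembler's argument parsing before relying on it.

```python
from typing import Iterable, List


def build_assembler_command(source: str, debug: bool = False,
                            data_blobs: Iterable[str] = ()) -> List[str]:
    """Build the argv list for one assembler.py run (hypothetical helper)."""
    cmd = ["./assembler.py", source]
    if debug:
        # --debug asks for the extra debug file the simulator picks up.
        cmd.append("--debug")
    for blob in data_blobs:
        # Assumption: each blob is passed with its own --data flag.
        cmd.extend(["--data", blob])
    return cmd
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` so that an assembler error (for example, a missing blob) stops the calling script.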
Blobs must be in the voxel stream format generated by the encoder. This format encodes voxel position and colour information in a layout that is relatively optimised for hardware parsing.
Hardware currently assumes 512 bytes of addressable memory. The reset vector is 0x0. The soft limit for program size is 4kB because I wanted to pick an arbitrary small number. No reason this can't increase later.
Load-store style architecture. The core is not really intended to be doing much computation, rather to set up kicks for dedicated hardware.
Registers:
- 8 GP regs
- ACC reg
- PC
- O (overflow) flag
- C (compare) flag
ACC (accumulator) reg is the input and target of all ALU operations, as well as source and dest for memory operations.
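As a rough mental model of the register set and the role of ACC, here is a minimal sketch. The class and method names are hypothetical, the 32-bit register width and wrap-on-add behaviour are assumptions, and flag updates are left out because their exact rules are not described here.

```python
from dataclasses import dataclass, field
from typing import List

WORD_BITS = 32  # Assumption: the register width is not stated in this README.
WORD_MASK = (1 << WORD_BITS) - 1


@dataclass
class CoreState:
    gp: List[int] = field(default_factory=lambda: [0] * 8)  # 8 GP regs
    acc: int = 0            # accumulator
    pc: int = 0             # starts at the reset vector, 0x0
    overflow: bool = False  # O flag (not updated in this sketch)
    compare: bool = False   # C flag (not updated in this sketch)

    def add(self, reg: int) -> None:
        """Hypothetical ALU add: ACC is both an input and the target."""
        self.acc = (self.acc + self.gp[reg]) & WORD_MASK

    def load(self, memory: bytearray, addr: int) -> None:
        """Hypothetical load: memory to ACC (one byte, for simplicity)."""
        self.acc = memory[addr]

    def store(self, memory: bytearray, addr: int) -> None:
        """Hypothetical store: low byte of ACC to memory."""
        memory[addr] = self.acc & 0xFF
```

The point of the sketch is that GP registers never appear as a destination of `add`, `load`, or `store`; values reach them only by way of ACC.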
Core instructions are directly executed on the CPU. All core instructions are 32-bit.
Byte 1 is the instruction; by convention, bit zero is set to indicate a hardware kick and unset for a CPU instruction.
Bytes 2, 3, and 4 are operands, and can be combined for larger constants.
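The layout above can be sketched as a few helpers. This is an illustration, not the assembler's actual code: the function names are hypothetical, "byte 1" is taken to be the first byte in memory, "bit zero" is taken to be the least significant bit of that byte, and combined constants are assumed to be big-endian. Real opcode values come from instructions.yaml.

```python
def encode(opcode: int, op1: int = 0, op2: int = 0, op3: int = 0) -> bytes:
    """Pack one 32-bit instruction: byte 1 is the opcode, bytes 2-4 are operands."""
    fields = (opcode, op1, op2, op3)
    if any(not 0 <= f <= 0xFF for f in fields):
        raise ValueError("each field must fit in one byte")
    return bytes(fields)


def encode_const(opcode: int, constant: int) -> bytes:
    """Pack an opcode with the three operand bytes combined into one constant."""
    if not 0 <= constant <= 0xFFFFFF:
        raise ValueError("constant must fit in 24 bits")
    # Assumption: the most significant byte of the constant comes first.
    return bytes([opcode]) + constant.to_bytes(3, "big")


def is_hw_kick(word: bytes) -> bool:
    """True if bit zero of the opcode byte is set (hardware kick convention)."""
    if len(word) != 4:
        raise ValueError("instructions are exactly 4 bytes")
    # Assumption: bit zero means the least significant bit.
    return bool(word[0] & 0x01)
```

With these assumptions, any opcode with an odd value would be read as a hardware kick and any even value as a CPU instruction.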
Use instructions.py to generate a markdown table of instructions based on the yaml file, if required.
HW instructions offload operations to dedicated hardware pipelines. Hardware cores operate off control streams (semi-arbitrary instructions stored in memory). These can be pre-assembled before execution (and just pointed to by the ASM program) or can be manually assembled at runtime, but this is probably a bad idea. Therefore hardware kicks just need to run a pipeline with a control stream base address. Hardware kick instructions are non-blocking, so kicks on different pipelines can run in parallel. For simplicity, only one kick of a single hardware pipeline can be in progress at any one time.
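The kick rules above can be modelled with a small bookkeeping class. This is a sketch only: `KickTracker` and the pipeline names are hypothetical, and raising an error on a second kick to a busy pipeline is an assumption, since what the hardware does in that case is not described here.

```python
class KickTracker:
    """Tracks which hardware pipelines currently have a kick in progress."""

    def __init__(self) -> None:
        # pipeline name -> control stream base address of the active kick
        self._active = {}

    def kick(self, pipeline: str, cs_base: int) -> None:
        """Start a kick and return immediately (kicks are non-blocking)."""
        if pipeline in self._active:
            # Assumption: this model rejects a second kick on a busy pipeline.
            raise RuntimeError(f"pipeline {pipeline!r} already has a kick in progress")
        self._active[pipeline] = cs_base

    def complete(self, pipeline: str) -> None:
        """Record that the pipeline signalled completion of its kick."""
        del self._active[pipeline]

    def busy(self) -> bool:
        """True while any kick is still in flight."""
        return bool(self._active)
```

Two different pipelines can be active at once in this model, which matches the parallelism described above; only a repeat kick on the same pipeline is refused.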
An exception to the above is the MMU, which provides several DMA operations the CPU can use without a control stream. DMA operations are blocking. Additionally, the BLOCK instruction halts execution until all active kicks have finished. DMA instructions are prefixed with D, and hardware kicks with H.
| Name | Operands | Flags | Description |
|---|---|---|---|
| D.CP | REGD, REGS, REG_SIZE | | DMA copy REG_SIZE bytes starting from the address in REGS to the address in REGD |
| D.SET | REGD, REGV, REG_SIZE | | DMA set REG_SIZE bytes starting from the address in REGD to the value in REGV |
| H.RUN | REG_CS | | Run test pipeline on control stream address held in REG_CS |
| BLOCK | | | Wait until the completion of all kicks has been signalled |
TODO: move above table into yaml
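The two DMA operations in the table can be read as the following memory operations. This is a behavioural sketch, not the simulator's code: the function names are hypothetical, each operand is taken to be a register index whose register holds the address, value, or byte count, REGV is taken as the value to write, D.SET is assumed to write the low byte of that value, and overlapping copies are assumed to behave like memmove.

```python
from typing import List


def _check_range(memory: bytearray, start: int, size: int) -> None:
    if start < 0 or size < 0 or start + size > len(memory):
        raise IndexError("DMA range falls outside memory")


def d_cp(memory: bytearray, regs: List[int], regd: int, regs_src: int, reg_size: int) -> None:
    """Model of D.CP: copy regs[reg_size] bytes from regs[regs_src] to regs[regd]."""
    dst, src, size = regs[regd], regs[regs_src], regs[reg_size]
    _check_range(memory, src, size)
    _check_range(memory, dst, size)
    # The right-hand slice is a copy, so overlapping ranges are handled safely.
    memory[dst:dst + size] = memory[src:src + size]


def d_set(memory: bytearray, regs: List[int], regd: int, regv: int, reg_size: int) -> None:
    """Model of D.SET: fill regs[reg_size] bytes at regs[regd] with regs[regv]."""
    dst, value, size = regs[regd], regs[regv], regs[reg_size]
    _check_range(memory, dst, size)
    # Assumption: only the low byte of the value register is written.
    memory[dst:dst + size] = bytes([value & 0xFF]) * size
```

Because DMA operations are blocking, each function simply returns once memory has been updated; there is no completion signal to wait on.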