Skip to content

Commit

Permalink
initial commit
Browse files Browse the repository at this point in the history
  • Loading branch information
vighneshiyer committed Nov 27, 2022
0 parents commit b18427d
Show file tree
Hide file tree
Showing 66 changed files with 414,592 additions and 0 deletions.
35 changes: 35 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,35 @@
.metals*
.bloop*
.bsp*
.vscode*
.idea*
project/*
target/*
*.hex
*.mif
test_run_dir/
build/
netwkdev/data*
netwkdev/mnist*
netwkdev/smnist*
netwkdev/pretrained*
netwkdev/kwsonsnn/__pycache__*
domaintest*

mapping/meminit
mapping/networkData_fp.json
cocotb/testbench/sim_build
__pycache__
results.xml

a.out
*.vcd
*.ivl
*.vcs
csrc
*.daidir
ucli.key
*.vpd
*.fsdb
novas_dump.log
.nfs*
309 changes: 309 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,309 @@
# Simulation Command API for Fast Multithreaded RTL Testbenches

SimCommand is a library for writing multi-threaded high-performance RTL testbenches in Scala.
It is primarily designed for testing circuits [written in Chisel](https://github.com/chipsalliance/chisel3), but can also be used with [Chisel's Verilog blackboxes](https://www.chisel-lang.org/chisel3/docs/explanations/blackboxes.html) to test any single-clock synchronous Verilog RTL.
This library depends on [chiseltest](https://www.chisel-lang.org/chiseltest/) ([repo](https://github.com/ucb-bar/chiseltest)), which is a Scala library for interacting with RTL simulators (including treadle, Verilator, and VCS).

## Docs

### A Simple Example

Let's test this simple Chisel circuit of a register sitting between a 32-bit input and output.
```scala
import chisel3._
class Register extends Module {
val in = IO(Input(UInt(32.W)))
val out = IO(Output(UInt(32.W)))
out := RegNext(in)
}
```

You can use the chiseltest `test` function to elaborate the `Register` circuit, compile an RTL simulation, and get access to a handle to the DUT:
```scala
import chiseltest._
import org.scalatest.flatspec.AnyFlatSpec
class RegisterTester extends AnyFlatSpec with ChiselScalatestTester {
test(new Register()) { dut =>
// testing code here
}
}
```

The core datatype of SimCommand is `Command[R]` which is a *description* of an interaction with the DUT.
The type parameter `R` is the type of the value that a `Command` will terminate with.
There are several user functions that can be used to construct a `Command` such as `peek(signal)`, `poke(signal, value)` and `step(numCycles)`.

```scala
import chiseltest._
import simcommand._
test(new Register()) { dut =>
val poker: Command[Unit] = poke(dut.in, 100.U)
val stepper: Command[Unit] = step(cycles=1)
val peeker: Command[UInt] = peek(dut.out)
}
```

Note that constructing the `peeker`, `poker`, and `stepper` `Command`s doesn't actually do anything - each of these values just *describes* a simulator interaction, but doesn't perform it.
This is in contrast to chiseltest's `poke`, `peek` and `step` functions which *eagerly perform* their associated actions.

Note that `poke` and `step` both return `Command[Unit]` which indicates that they terminate with `(): Unit` (since they have no information to return to the testbench).
In contrast, `peek` returns `Command[I]` where `I` is the type of the signal being peeked.
In this example, `I` is a Chisel `UInt`: a hardware unsigned integer.

To actually run any of these commands, we have to explicitly call `unsafeRun` which calls the underlying command in chiseltest.
```scala
val poker: Command[Unit] = poke(dut.in, 100.U)
val dummy1: Unit = unsafeRun(poker, dut.clock)

val stepper: Command[Unit] = step(cycles=1)
val dummy2: Unit = unsafeRun(stepper, dut.clock)

val peeker: Command[UInt] = peek(dut.out)
val value: UInt = unsafeRun(peeker, dut.clock)
val correctBehavior = value.litValue == 100
```

Of course, this is tedious, so we want a way to describe running multiple `Command`s sequentially so that we can call `unsafeRun` only once at the very end of our testbench description.

### Chaining Commands

`Command[R]` has two functions defined on it:
- `flatMap[R2](f: R => Command[R2]): Command[R2]` which allows one to 'unwrap' the `R` from a `Command[R]` and continue with another `Command[R2]`
- `map[R2](f: R => R2): Command[R2]` which maps the inner value of type `R` to a value of type `R2` via `f`

Let's use these functions to chain the `Command`s from the previous example into a single `Command`, which terminates with a Boolean which is true if the circuit behaved correctly.
```scala
val program: Command[Boolean] =
poke(dut.in, 100.U).flatMap { _: Unit =>
step(1).flatMap { _: Unit =>
peek(dut.out).map { value: UInt =>
value.litValue == 100
}
}
}
val correctBehavior: Boolean = unsafeRun(program, dut.clock)
assert(correctBehavior)
```

Notice how `flatMap` is used to 'extract' the return value from a `Command` and follow it up with another `Command`.
The inner-most call to `peek` is followed by a `map` which extracts the return value of the peek and evaluates a function to return a `Boolean`.

In Scala, for-comprehensions are syntactic sugar for expressing nested calls to `flatMap` followed by a final call to `map`.
The code above can be expressed like this:
```scala
val program: Command[Boolean] = for {
_ <- poke(dut.in, 100.U)
_ <- step(1)
value <- peek(dut.out)
} yield value.litValue == 100
```

Now our `program` looks a lot like a sequence of imperative statements - *but* it actually is just a description of a simulation program - it is a *value* which can be interpreted and executed by the SimCommand runtime.

### Command Combinators

Expressing simulation programs as *values* means we can write functions that take *programs* as arguments and return new *programs*.
The SimCommand library comes with a set of combinators that makes it easy to compose larger testbench programs from smaller programs.

```scala
// Return a program that repeats a command n times
def repeat(cmd: Command[_], n: Int): Command[Unit]
val tenInteractions: Command[Unit] = repeat(program, n=10)

// Take a list of programs and return a new program that executes each one back-to-back
def concat[R](cmds: Seq[Command[R]]): Command[Unit]
val fiftyInteractions: Command[Unit] = concat(Seq.fill(50)(program))

// Take a list of programs and return a new program that executes each one AND aggregates their results
def sequence[R](cmds: Seq[Command[R]]): Command[Seq[R]]
val hundredInteractions: Command[Seq[Boolean]] = sequence(Seq.fill(100)(program))
```

Note that implementing looping constructs using this functional style needs to be done using recursion.
However, `flatMap` isn't stack safe, so nested recursive calls to `flatMap` will eventually blow the stack.
We have provided a set of stack safe looping constructs:

```scala
// Run this program until it returns false
def doWhile(cmd: Command[Boolean]): Command[Unit]

val program: Command[Boolean] = for {
value <- peek(dut.out)
_ <- step(1)
} yield value == 0
doWhile(program)

// Run this program continuously
def forever(cmd: Command[_]): Command[Nothing]
forever(program)
```

The SimCommand library also has a set of small programs that you can use to build larger ones:
```scala
// Step the clock until the signal == value
def waitForValue[I <: Data](signal: I, value: I): Command[Unit]

val program = for {
_ <- waitForValue(dut.out, 100.U)
_ <- poke(dut.in, 101.U)
} yield ()
```

### Multithreading

A `Command` can be forked off to begin its execution on a different simulation thread using `fork`.
This gives you a thread handle that can be used to block on that thread's return value using `join`.

Let's create one thread to drive the DUT's input while another thread observes the DUT's output.

```scala
def driver(nElems: Int, start: Int): Command[Unit] = {
def driveOne(value: Int): Command[Unit] = for {
_ <- poke(dut.in, i.U)
_ <- step(1)
} yield ()

concat((start until start + nElems).map { i => driveOne(i) })
}

def receiver(nElems: Int): Command[Seq[Int]] = {
val receiveOne(): Command[Int] = for {
_ <- step(1)
value <- peek(dut.out)
} yield value.litValue.toInt

concat(Seq.fill(nElems)(receiveOne()))
}
```

There are 2 user functions available:
- `fork[R](cmd: Command[R], name: String): Command[ThreadHandle[R]]`
- Calling `fork` gives you a `Command` that returns a `ThreadHandle` which you can use when joining
- `join[R](handle: ThreadHandle[R]): Command[R]`
- Calling `join` with a `ThreadHandle` describes blocking the current thread until the joining thread has termianted with a value of type `R`

```scala
val program: Command[Boolean] = for {
drvThread: ThreadHandle[Unit] <- fork(driver(100, 0), "driver")
recvThread: ThreadHandle[Seq[Int]] <- fork(receiver(100), "receiver")
_ <- join(drvThread)
vals <- join(recvThread)
} yield vals == Seq.tabulate(100)(i => i)
```
This is a `Command` that describes running these `driver` and `receiver` in parallel, waiting on both of them to complete, extracting the terminating return value from the receiver, and checking the results.

Note that you can pass `ThreadHandle`s outside the `Command` that `fork`ed the thread, but each thread can only be `join`ed once.

### Debugging and Errors

WIP

### Runtime / Scheduler

You can use `Command` and its combinators to build up a description of a complete simulation program.
Then, at the very end of your test function, you can call `unsafeRun` to execute the `Command` and get back a `Result` object, which contains the return value and some metadata about the simulation.
**You should use** the `NoThreadingAnnotation` on `chiseltest`'s test function to achieve the best performance.

```scala
import chiseltest.internal.NoThreadingAnnotation
test(new Register()).withAnnotations(Seq(NoThreadingAnnotation)) { dut =>
val program: Command[Boolean]
val result = unsafeRun(program, dut.clock, print=false)
println(result.retval) // retval: Boolean
println(result.cycles)
println(result.threadsSpawned)
}
```

#### Algorithm

The `unsafeRun` function invokes the SimCommand interpreter which performs the following event loop:

- while(true)
- Iterate through the list of running threads, for each thread `t` do the following until it has hit a synchronization point (`step`, `join`, or `return`) for this timestep
- As `t` is being run, proxy any calls to `peek` or `poke` to the chiseltest API
- If `t` hits a `fork`, add the forked thread to the list of running threads and continue running `t`
- If `t` hits a `step`, overwrite the `t` pointer with the `Command` that comes after the `step` and record the time at which `t` will be woken up
- If `t` hits a `join`, mark `t` as asleep until the thread being joined has returned
- If `t` hits a `return`, add `t`'s return value to a global map and mark any threads which have requested a join on `t` to be awakened on this timestep
- If the main thread has returned with `retval`, *exit the loop* with `retval`
- Step the clock in the simulator

Note that the interpreter doesn't prevent intra-cycle race conditions when two threads are peeking and poking the same DUT IO.
Support is being considered for catching these issues at runtime, like the threaded chiseltest backend does.

### VIPs

There is a small library of verification IPs built using SimCommand.
They are just functions that return a `Command` that describes an interaction for a given bus specification.

#### UART

```scala
// Create an instance of UARTCommands with bindings to your DUT's UART IOs and specify the baud rate
val uartCmds = new UARTCommands(dut.uartTx, dut.uartRx, cyclesPerBit)

// Then call functions from the instance to produce Command's you can use in your testbench
uartCmds.sendByte(24) // : Command[Unit]
uartCmds.sendBytes(Seq(4, 23, 12)) // : Command[Unit]
uartCmds.receiveByte() // : Command[Int]
uartCmds.receiveBytes(n: Int) // : Command[Seq[Int]]
```

#### Ready/Valid (Decoupled)

```scala
val dut = Queue(UInt(32.W))
val enqCmds = new DecoupledCommands(dut.enq)
enqCmds.enqueue(100.U) // : Command[Unit]
enqCmds.enqueueSeq(Seq(1.U, 2.U, 3.U)) // : Command[Unit]

val deqCmds = new DecoupledCommands(dut.deq)
deqCmds.dequeue() // : Command[UInt]
deqCmds.dequeueN(n: Int) // : Command[Seq[UInt]]
```

#### TileLink

WIP

## Testbenches

This repo contains an implementation and benchmarks of a simulation command monad in Scala that is used with [chiseltest](https://github.com/ucb-bar/chiseltest) to describe and execute multithreaded RTL simulations with minimal threading overhead.

- DecoupledGCD: artificial example with 2 ready-valid interfaces
- `gcd.DecoupledGcdChiseltestTester` (chiseltest with standard threading, chiseltest with manually interleaved threads, chiseltest with Command API, chiseltest with manually interleaved threads + raw simulator API) (verilator + treadle)
- `cocotb/gcd_tb` (cocotb)
- NeuromorphicProcessor: system-level testbench with top-level UART interaction
- `neuroproc.systemtests.NeuromorphicProcessorChiseltestTester` (chiseltest with standard threading API)
- `neuroproc.systemtests.NeuromorphicProcessorManualThreadTester` (chiseltest with single threaded backend, manually interleaved threading, and Chisel API)
- `neuroproc.systemtests.NeuromorphicProcessorRawSimulatorTester` (chiseltest with single threaded backend, manually interleaved threading, and raw FIRRTL API)
- `neuroproc.systemtests.NeuromorphicProcessorCommandTester` (chiseltest with single threaded backend, and Command interpreter providing threading support)
- `cocotb/testbench` (cocotb)

### Benchmark Results

| Simulation API | DecoupledGCD | NeuromorphicProcessor |
|----------------------------------------------------|-------------------|-----------------------|
| cocotb | 3.8 kHz, 43.2 sec | 9.9 kHz, 89:38 min |
| Chiseltest with threading | 7.8 kHz, 21 sec | 32.6 kHz, 27:21 min |
| Command API | 67 kHz, 2.4 sec | 165 kHz, 5:23 min |
| Chiseltest with manual threading | 218 kHz, 0.75 sec | 432 kHz, 2:03 min |
| Chiseltest with manual threading + raw FIRRTL API} | | 453 kHz, 1:58 min |

### Caveats for cocotb NeuromorphicProcessor Testbench

- Use `timescale 1ps/1ps` at the top of `NeuromorphicProcessor.sv` to match chiseltest
- Use a 2ps period clock to match chiseltest
- For iverilog
- Remove `@(*)` after `always_latch` in `ClockBufferBB.sv` (event sensitivity lists are automatically inferred)
- Add iverilog vcd dumping to `NeuromorphicProcessor.sv` if needed
```verilog
`ifdef COCOTB_SIM
initial begin
$dumpfile ("NeuromorphicProcessor.vcd");
$dumpvars (0, NeuromorphicProcessor);
#1;
end
`endif
```
29 changes: 29 additions & 0 deletions build.sbt
Original file line number Diff line number Diff line change
@@ -0,0 +1,29 @@
ThisBuild / scalaVersion := "2.13.8"
ThisBuild / version := "0.1.0"
ThisBuild / organization := "edu.berkeley.cs"

val chiselVersion = "3.5.4"

lazy val root = (project in file("."))
.settings(
name := "simcommand",
libraryDependencies ++= Seq(
"edu.berkeley.cs" %% "chisel3" % chiselVersion,
"edu.berkeley.cs" %% "chiseltest" % "0.5.4"
),
scalacOptions ++= Seq(
"-language:reflectiveCalls",
"-deprecation",
"-feature",
"-Xcheckinit",
),
addCompilerPlugin("edu.berkeley.cs" % "chisel3-plugin" % chiselVersion cross CrossVersion.full),
Compile / scalaSource := baseDirectory.value / "src",
Compile / resourceDirectory := baseDirectory.value / "src" / "resources",
Test / scalaSource := baseDirectory.value / "test",
Test / resourceDirectory := baseDirectory.value / "test" / "resources",
assembly / mainClass := Some("simapi.ProfilingExample"),
)

fork in run := true
mainClass in Compile := Some("simapi.ProfilingExample")
1 change: 1 addition & 0 deletions mapping
Loading

0 comments on commit b18427d

Please sign in to comment.