RAD-Sim Co-Simulation Documentation #23
Design Architecture
-------------------
This design consists of five top-level modules: The Matrix Vector Multiplication (MVM), the Dispatcher, the Weight Loader,
"five top-level modules" can be misunderstood because technically all these modules are instantiated in the top-level of the example design. Maybe change to "five different modules".
Makes sense. Was thinking of it more like RTL where the top-level modules didn't include the testbench/top-file.
Changed
This design consists of five top-level modules: The Matrix Vector Multiplication (MVM), the Dispatcher, the Weight Loader,
the Instruction Loader, and the Collector module.
There is a compiler included to generate sample test cases that can be loaded into the design.
The MVM module is implemented in both SystemC and RTL to illustrate co-simulation compatibility in RAD-Sim.
co-simulation "capability"?
Removed
MVM
^^^^
The MVM module is responsible for performing matrix vector multiplication.
It uses a modular scalable design to perform MVM calculations in parallel.
This sentence is a bit awkward (MVM uses modular design to perform MVM). I think we can remove it.
Removed and edited
:width: 1000
:alt: MVM Workflow

A fixed number of parallel Dot-Product Engines (DPEs) are used to calculate up to L by D elements in each timestep,
to calculate "the multiplication of an L-element vector and a DxL matrix" in each timestep
Changed
A fixed number of parallel Dot-Product Engines (DPEs) are used to calculate up to L by D elements in each timestep,
where L is the number of lanes (number of elements each DPE can handle at once), and D is the number of DPEs in the MVM.
In the case the size of the vector exceeds L, subsequent elements in the vector are queued in a FIFO in sets of L and processed sequentially.
This design also supports multiple MVM modules to parallelize this process.
We can replace lines 42-48 with:
"A matrix can be horizontally split into multiple interleaved blocks where different blocks are mapped to different MVM engines. The MVMs working on the same matrix-vector multiplication can accumulate results of multiple horizontal blocks and then reduce their partial results to compute the final output. Within each block, the MVM processes tiles of D rows sequentially over multiple steps."
Changed
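For readers following this thread, the tiling discussed above (D DPEs, each with L lanes, with the vector consumed in sets of L) can be illustrated with a minimal Python sketch. This is not RAD-Sim code: the function names `dot_product_engine` and `mvm_tiled` and the parameters `num_lanes` (L) and `num_dpes` (D) are hypothetical, and the loop order is an assumption made for illustration only.

```python
def dot_product_engine(weights_chunk, vector_chunk):
    """One DPE step: multiply-accumulate over up to L lanes (hypothetical helper)."""
    return sum(w * x for w, x in zip(weights_chunk, vector_chunk))


def mvm_tiled(matrix, vector, num_lanes, num_dpes):
    """Multiply `matrix` (rows x cols) by `vector` with D DPEs of L lanes each.

    Rows are handled in tiles of D (one row per DPE). Within a tile, the
    vector is consumed in chunks of L elements, one chunk per step.
    """
    rows, cols = len(matrix), len(vector)
    result = [0] * rows
    for row_base in range(0, rows, num_dpes):        # tiles of D rows, sequential
        for col_base in range(0, cols, num_lanes):   # sets of L elements, sequential
            chunk = vector[col_base:col_base + num_lanes]
            # The DPEs inside one tile would run in parallel in hardware.
            for dpe in range(min(num_dpes, rows - row_base)):
                row = row_base + dpe
                weights = matrix[row][col_base:col_base + num_lanes]
                result[row] += dot_product_engine(weights, chunk)
    return result


matrix = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
vector = [1, 0, -1, 2]
print(mvm_tiled(matrix, vector, num_lanes=2, num_dpes=2))  # [6, 14, 22]
```

With L = 2 and D = 2, the three-row matrix needs two row tiles and each tile takes two steps, which mirrors the "sets of L, processed sequentially" wording in the hunk.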

Weight Loader
^^^^^^^^^^^^^^
Weights are stored in memory in each MVM module, and are only required to be loaded once.
stored in "register files" ... to be loaded once "before the start of execution".
Changed
^^^^^^^^^^^^^^
Weights are stored in memory in each MVM module, and are only required to be loaded once.
Weight matrices are generated by the compiler. The weight matrix for each DPE in all MVMs is independent.
Each weight matrix is loaded sequentially via AXI-S.
The weight loader module sequentially sends weight matrices as vectors of size L elements to the corresponding MVMs over the NoC.
Added
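The sentence suggested in this thread (weight matrices sent sequentially as vectors of L elements to the corresponding MVMs) can be sketched as follows. This is only an illustration under stated assumptions: the generator name `weight_vectors` is hypothetical, the matrix-to-MVM mapping by list index is assumed, and the row-major send order is a guess, not something the documentation confirms.

```python
def weight_vectors(weight_matrices, num_lanes):
    """Yield (mvm_id, vector) pairs in an assumed sequential send order.

    `weight_matrices[i]` is assumed to belong to MVM `i`; each row is cut
    into vectors of at most `num_lanes` (L) elements.
    """
    for mvm_id, matrix in enumerate(weight_matrices):
        for row in matrix:
            for base in range(0, len(row), num_lanes):
                yield mvm_id, row[base:base + num_lanes]


weights = [
    [[1, 2, 3, 4]],            # hypothetical weights for MVM 0
    [[5, 6, 7, 8], [9, 10, 11, 12]],  # hypothetical weights for MVM 1
]
for mvm_id, vec in weight_vectors(weights, num_lanes=2):
    print(mvm_id, vec)
```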
Instruction Loader
^^^^^^^^^^^^^^^^^^^
Instructions for each MVM are loaded once and are infinitely looped automatically (does not require a jump).
Instructions are generated by the compiler for each MVM and sent to the corresponding MVM via AXI-S.
corresponding MVM "over the NoC"
Changed
0x3 Write Weights
=== =======================

**RF_ADDR (9 bits)**: Weight RF Address to Write to
Write should be lower case (write)
Changed
$ cd <rad_flow_root_dir>/rad-sim
$ python config.py mlp_int8

Next, a test case is generated using the built-in Python compiler. Ensure the radflow conda environment is activated.
using the Python compiler. For this step, make sure that the radflow
conda ...
Changed