
RAD-Sim Co-Simulation Documentation #23

Merged: 3 commits merged into main on Dec 1, 2023


geotrieu
Collaborator

No description provided.


Design Architecture
-------------------
This design consists of five top-level modules: The Matrix Vector Multiplication (MVM), the Dispatcher, the Weight Loader,
Owner

"five top-level modules" can be misunderstood because technically all these modules are instantiated in the top-level of the example design. Maybe change to "five different modules".

Collaborator Author

Makes sense. Was thinking of it more like RTL where the top-level modules didn't include the testbench/top-file.
Changed

This design consists of five top-level modules: The Matrix Vector Multiplication (MVM), the Dispatcher, the Weight Loader,
the Instruction Loader, and the Collector module.
There is a compiler included to generate sample test cases that can be loaded into the design.
The MVM module is implemented in both SystemC and RTL to illustrate co-simulation compatibility in RAD-Sim.
Owner

co-simulation "capability"?

Collaborator Author

Removed

MVM
^^^^
The MVM module is responsible for performing matrix vector multiplication.
It uses a modular scalable design to perform MVM calculations in parallel.
Owner

This sentence is a bit awkward (MVM uses modular design to perform MVM). I think we can remove it.

Collaborator Author

Removed and edited

[Figure: MVM Workflow]

A fixed number of parallel Dot-Product Engines (DPEs) are used to calculate up to L by D elements in each timestep,
Owner

to calculate "the multiplication of an L-element vector and a DxL matrix" in each timestep

Collaborator Author

Changed
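As a rough illustration of the DPE scheme under discussion, here is a minimal Python behavioral sketch using the documentation's definitions (L lanes per DPE, D DPEs per MVM). It assumes one matrix row per DPE, processes rows in tiles of D as the reviewer describes, and queues the input vector in a FIFO as L-element chunks consumed one per timestep. `tiled_mvm` and its parameters are hypothetical names, not RAD-Sim code.

```python
from collections import deque


def tiled_mvm(matrix, vector, L, D):
    """Hypothetical behavioral model of one MVM with D DPEs of L lanes each.

    Rows are processed in tiles of up to D rows (one row per DPE). For each
    tile, the input vector is queued in a FIFO as L-element chunks; all D DPEs
    consume one chunk per timestep. Returns (result vector, timesteps used).
    """
    n = len(vector)
    result = [0] * len(matrix)
    steps = 0
    for r0 in range(0, len(matrix), D):  # tiles of up to D rows
        tile = matrix[r0:r0 + D]
        # FIFO of (offset, chunk) pairs; the last chunk may be shorter than L.
        fifo = deque((c0, vector[c0:c0 + L]) for c0 in range(0, n, L))
        while fifo:
            c0, chunk = fifo.popleft()
            # In hardware, these D dot products happen in the same timestep.
            for d, row in enumerate(tile):
                result[r0 + d] += sum(w * x for w, x in zip(row[c0:c0 + L], chunk))
            steps += 1
    return result, steps


M = [[1, 2, 3, 4, 5],
     [0, 1, 0, 1, 0],
     [2, 2, 2, 2, 2],
     [1, 0, 0, 0, 1]]
v = [1, 1, 1, 1, 1]
print(tiled_mvm(M, v, L=2, D=2))  # ([15, 2, 10, 2], 6)
```

With 4 rows, D = 2 and a 5-element vector split into chunks of L = 2, the sketch needs 2 row tiles × 3 chunks = 6 timesteps.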

A fixed number of parallel Dot-Product Engines (DPEs) are used to calculate up to L by D elements in each timestep,
where L is the number of lanes (number of elements each DPE can handle at once), and D is the number of DPEs in the MVM.
In the case the size of the vector exceeds L, subsequent elements in the vector are queued in a FIFO in sets of L and processed sequentially.
This design also supports multiple MVM modules to parallelize this process.
Owner

We can replace lines 42-48 with:
"A matrix can be horizontally split into multiple interleaved blocks where different blocks are mapped to different MVM engines. The MVMs working on the same matrix-vector multiplication can accumulate results of multiple horizontal blocks and then reduce their partial results to compute the final output. Within each block, the MVM processes tiles of D rows sequentially over multiple steps."

Collaborator Author

Changed
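One possible reading of the suggested wording is sketched below in Python. It assumes the split is along the input (column) dimension, which is why partial results must be reduced by summation, and that blocks are interleaved round-robin (block k goes to MVM k mod num_mvms). The block width, the mapping rule, and the name `interleaved_mvm` are illustrative assumptions, not taken from RAD-Sim.

```python
def interleaved_mvm(matrix, vector, num_mvms, block_width):
    """Hypothetical model of several MVMs sharing one matrix-vector product.

    The matrix is cut into column blocks of `block_width`; block k is mapped
    to MVM k % num_mvms (interleaved). Each MVM accumulates the partial
    results of all its blocks, and a final reduction sums the partials.
    Returns (final result, per-MVM partial results).
    """
    rows, cols = len(matrix), len(vector)
    partials = [[0] * rows for _ in range(num_mvms)]
    for k, c0 in enumerate(range(0, cols, block_width)):
        m = k % num_mvms                     # round-robin block-to-MVM mapping
        c1 = min(c0 + block_width, cols)     # last block may be narrower
        for r in range(rows):
            partials[m][r] += sum(matrix[r][c] * vector[c] for c in range(c0, c1))
    # Reduction across all MVMs working on this multiplication.
    result = [sum(p[r] for p in partials) for r in range(rows)]
    return result, partials


M = [[1, 2, 3, 4, 5],
     [0, 1, 0, 1, 0],
     [2, 2, 2, 2, 2],
     [1, 0, 0, 0, 1]]
v = [1, 2, 3, 4, 5]
result, partials = interleaved_mvm(M, v, num_mvms=2, block_width=2)
print(partials)  # [[30, 2, 16, 6], [25, 4, 14, 0]]
print(result)    # [55, 6, 30, 6]
```

Here MVM 0 owns blocks 0 and 2 (columns 0-1 and 4) and MVM 1 owns block 1 (columns 2-3); summing their partial vectors gives the full product.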


Weight Loader
^^^^^^^^^^^^^^
Weights are stored in memory in each MVM module, and are only required to be loaded once.
Owner

stored in "register files" ... to be loaded once "before the start of execution".

Collaborator Author

Changed

Weight matrices are generated by the compiler. The weight matrix for each DPE in all MVMs is independent.
Each weight matrix is loaded sequentially via AXI-S.
Owner

The weight loader module sequentially sends weight matrices as vectors of size L elements to the corresponding MVMs over the NoC.

Collaborator Author (geotrieu, Dec 1, 2023)

Added
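The weight-loading order described in this thread can be sketched in Python as a generator of L-element packets. This is a minimal sketch assuming each DPE's independent weight matrix arrives as a flat list padded to a multiple of L, and that each packet carries an MVM index, a DPE index, and a register-file address; that packet layout and the name `weight_packets` are hypothetical, not the RAD-Sim message format.

```python
from typing import Iterator, List, Tuple

# Hypothetical packet: (mvm_id, dpe_id, rf_addr, L-element weight vector)
Packet = Tuple[int, int, int, List[int]]


def weight_packets(weights: List[List[List[int]]], L: int) -> Iterator[Packet]:
    """Yield weight packets in the order a weight loader might send them.

    weights[m][d] is the independent weight matrix of DPE d in MVM m,
    flattened into a list whose length is a multiple of L.
    """
    for m, mvm in enumerate(weights):
        for d, dpe_weights in enumerate(mvm):
            if len(dpe_weights) % L:
                raise ValueError("each DPE's weights must be padded to a multiple of L")
            # Consecutive L-element vectors go to consecutive RF addresses.
            for addr, i in enumerate(range(0, len(dpe_weights), L)):
                yield (m, d, addr, dpe_weights[i:i + L])


W = [[[1, 2, 3, 4], [5, 6, 7, 8]],      # MVM 0: DPE 0, DPE 1
     [[9, 10, 11, 12], [0, 0, 0, 0]]]   # MVM 1: DPE 0, DPE 1
for pkt in weight_packets(W, L=2):
    print(pkt)
```

Sending every packet for one MVM before moving to the next mirrors the "sequential" loading in the text; a real loader could just as well interleave destinations.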

Instruction Loader
^^^^^^^^^^^^^^^^^^^
Instructions for each MVM are loaded once and are infinitely looped automatically (does not require a jump).
Instructions are generated by the compiler for each MVM and sent to the corresponding MVM via AXI-S.
Owner

corresponding MVM "over the NoC"

Collaborator Author

Changed
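The "loaded once, looped automatically without a jump" behavior can be modeled with a program counter that wraps to zero after the last instruction. The Python sketch below is a hypothetical illustration of that idea; `InstructionMemory` and its methods are invented names, and instructions are plain strings rather than the real encoded words.

```python
class InstructionMemory:
    """Hypothetical model of an MVM instruction memory.

    Instructions are written once; fetching advances a program counter that
    wraps back to 0 after the last instruction, so the program loops forever
    without an explicit jump instruction.
    """

    def __init__(self):
        self._instrs = []
        self._pc = 0

    def load(self, instr):
        self._instrs.append(instr)

    def fetch(self):
        if not self._instrs:
            raise RuntimeError("no instructions loaded")
        instr = self._instrs[self._pc]
        self._pc = (self._pc + 1) % len(self._instrs)  # implicit loop via wrap-around
        return instr


mem = InstructionMemory()
for instr in ["A", "B", "C"]:
    mem.load(instr)
print([mem.fetch() for _ in range(7)])  # ['A', 'B', 'C', 'A', 'B', 'C', 'A']
```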

0x3 Write Weights
=== =======================

**RF_ADDR (9 bits)**: Weight RF Address to Write to
Owner

Write should be lower case (write)

Collaborator Author

Changed
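A 9-bit RF_ADDR field can address 2^9 = 512 weight register-file entries. The Python sketch below shows how such a field might be range-checked, packed into, and extracted from an instruction word. The field's bit position (`lsb`) is a hypothetical parameter, since the full instruction layout is not shown in this excerpt, and the helper names are illustrative only.

```python
RF_ADDR_BITS = 9
RF_DEPTH = 1 << RF_ADDR_BITS  # 512 addressable weight RF entries


def pack_rf_addr(word: int, rf_addr: int, lsb: int) -> int:
    """Write a 9-bit RF_ADDR into `word` starting at bit `lsb` (hypothetical position)."""
    if not 0 <= rf_addr < RF_DEPTH:
        raise ValueError(f"RF_ADDR must fit in {RF_ADDR_BITS} bits")
    mask = (RF_DEPTH - 1) << lsb
    return (word & ~mask) | (rf_addr << lsb)  # clear the field, then insert


def unpack_rf_addr(word: int, lsb: int) -> int:
    """Extract the 9-bit RF_ADDR field starting at bit `lsb`."""
    return (word >> lsb) & (RF_DEPTH - 1)


w = pack_rf_addr(0, 511, lsb=4)
print(w, unpack_rf_addr(w, lsb=4))  # 8176 511
```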

$ cd <rad_flow_root_dir>/rad-sim
$ python config.py mlp_int8

Next, a test case is generated using the built-in Python compiler. Ensure the radflow conda environment is activated.
Owner

using the Python compiler. For this step, make sure that the radflow conda ...

Collaborator Author

Changed

andrewboutros merged commit c2a9605 into main on Dec 1, 2023
10 checks passed

2 participants