Machine learning accelerators have been used extensively to compute models with high performance and low power. Unfortunately, the development pace of ML models is much faster than the accelerator design cycle, leading to frequent changes in the hardware architecture requirements, rendering many accelerators obsolete. Existing design tools and frameworks can provide quick accelerator prototyping, but only for a limited range of models that fit into a single hardware device. With the emergence of large language models such as GPT-3, there is an increased need for hardware prototyping of large models within a many-accelerator system to ensure the hardware can scale with ever-growing model sizes.
MASE provides an efficient and scalable approach for exploring accelerator systems for large ML models by directly mapping them onto an efficient streaming accelerator system. Across a set of ML models, MASE achieves better energy efficiency than GPUs when computing inference for recent transformer models.
- Fast Prototyping Next-Generation Accelerators for New ML Models using MASE: ML Accelerator System Exploration, link
@article{cheng2023fast, title={Fast prototyping next-generation accelerators for new ml models using mase: Ml accelerator system exploration}, author={Cheng, Jianyi and Zhang, Cheng and Yu, Zhewen and Montgomerie-Corcoran, Alex and Xiao, Can and Bouganis, Christos-Savvas and Zhao, Yiren}, journal={arXiv preprint arXiv:2307.15517}, year={2023}}
- MASE: An Efficient Representation for Software-Defined ML Hardware System Exploration, link
@article{zhangmase, title={MASE: An Efficient Representation for Software-Defined ML Hardware System Exploration}, author={Zhang, Cheng and Cheng, Jianyi and Yu, Zhewen and Zhao, Yiren}}
This repo contains the following directories:
- `components` - Internal hardware library
- `scripts` - Installation scripts
- `machop` - MASE's software stack
- `hls` - HLS component of MASE
- `mlir-air` - MLIR AIR for ACAP devices
- `docs` - Documentation
- `Docker` - Docker container configurations
First, make sure the repo is up to date:
make sync
Start the Docker container by running the following command in the repo root:
make shell
Building the Docker container may take a long time on the first run. Once done, you should be inside the container. To build the tool, run the following commands:
cd /workspace
make build
This may also take a long time to finish.
If you would like to contribute, please check the wiki for more information.
First, make sure the repo is up to date:
make sync
Install conda by following the instructions here, then build and activate the environment as follows.
conda env create -f machop/environment.yml
conda activate mase
Optionally, verify that the CLI utility is installed correctly:
./machop/ch --version
In this example, we'll use the command-line interface (CLI) to train a toy model on the Jet Substructure (JSC) dataset, quantize it to integer arithmetic, and evaluate the quantized model on the test split of the dataset.
First, train the toy model for 10 epochs by running the following command:
./machop/ch train jsc-tiny jsc --max-epochs 10 --batch-size 256
Now, transform the model to integer arithmetic and save the model checkpoint. The TOML configuration file specifies the required arguments for the quantization flow.
./machop/ch transform jsc-tiny jsc --config ./machop/configs/examples/jsc_toy_by_type.toml
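For orientation, a by-type integer quantization config generally takes the following shape. This is a hedged sketch only: the exact section names, keys, and bit widths below are assumptions, and the authoritative values are in `./machop/configs/examples/jsc_toy_by_type.toml` itself.

```toml
# Illustrative sketch only -- consult jsc_toy_by_type.toml for the real keys.
# Assumed layout: quantize every layer of a given type to fixed-point integers.
[passes.quantize]
by = "type"               # quantize per layer type rather than per layer name

[passes.quantize.default.config]
name = "integer"          # fixed-point integer arithmetic
data_in_width = 8         # total bits for activations (assumed value)
data_in_frac_width = 4    # fractional bits for activations (assumed value)
weight_width = 8
weight_frac_width = 4
bias_width = 8
bias_frac_width = 4
```

Splitting the total width into integer and fractional bits like this is the usual way fixed-point quantization is parameterised; adjust the widths to trade accuracy against hardware cost.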
Finally, evaluate the performance of the quantized model:
./machop/ch test jsc-tiny jsc --load <path/to/checkpoint>
See the Machop README for a more detailed introduction.
- Subscribe to the Mase Weekly Dev Meeting (Wednesdays, 4:30 UK time). Everyone is welcome!
- Direct Google Meet link
- Join the Mase Slack
- If you would like to discuss anything at a future meeting, please add it as a comment in the meeting agenda so we can review and include it.
If you find MASE helpful, please consider donating to support our work; we appreciate your support!