CarbonSpot

This repository presents CarbonSpot, an analytical design space exploration (DSE) framework that evaluates both the performance and the carbon footprint of AI accelerators. CarbonSpot bridges the gap between performance-oriented and carbon-oriented DSE frameworks, enabling architecture comparison between the most performant hardware choice and the most carbon-friendly hardware choice within a single framework.

The performance simulator is inherited from our previous frameworks ZigZag-IMC and ZigZag, whose analytical accelerator performance models have been validated against chip results.

If you find this repository useful to your work, please consider citing our paper in your study:

```bibtex
@ARTICLE{11142328,
  author={Sun, Jiacong and Yi, Xiaoling and Symons, Arne and Gielen, Georges and Eeckhout, Lieven and Verhelst, Marian},
  journal={IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems},
  title={An Analytical Model for Performance-Carbon Co-Optimization of Edge AI Accelerators},
  year={2025},
  volume={},
  number={},
  pages={1-1},
  keywords={Costs;Carbon;Carbon dioxide;Hardware;Computational modeling;AI accelerators;Fabrication;Estimation;Electricity;Analytical models;AI accelerators;carbon cost estimation;performance modeling;edge computing;in-memory computing},
  doi={10.1109/TCAD.2025.3602746}}
```

Motivation

The size of AI models has increased drastically in recent years, together with model accuracy and capability. We plot this trend in the figure below (data collected from here). At the same time, this growth demands powerful hardware and significant energy consumption, drawing the community's attention to the environmental impact -- or, more specifically, the equivalent carbon footprint.

The trend of increasing model size and accuracy

Framework Capability

CarbonSpot is capable of:

  • Finding the optimal mapping for user-defined accelerator architectures, including digital TPU-like, analog In-Memory-Computing, and digital In-Memory-Computing architectures.
  • Reporting the energy, latency, and carbon footprint for a given architecture and workload.
  • Comparing the performance-optimal and the carbon-optimal architectures, each with its respective optimal mapping.
  • Evaluating the carbon footprint under the Continuous-Active (CA) and Periodically-Active (PA) scenarios.
  • Reporting the cost breakdown for energy, latency, and carbon footprint.

MLPerf-Tiny and MLPerf-Mobile workloads are placed under the folder ./zigzag/inputs/examples/workload/. The timing requirements for each workload are summarized below.

If you find this table useful to your work, please consider citing our paper in your study!

| Workload Suite | Network Name | Use Case | Targeted Dataset | Workload Size (MB) | Frames/Second Requirement | Paper Reference |
|---|---|---|---|---|---|---|
| MLPerf-Tiny | DS-CNN | Keyword Spotting | Speech Commands | 0.06 | 10 | [1] |
| MLPerf-Tiny | MobileNet-V1 | Visual Wake Words | Visual Wake Words Dataset | 0.9 | 0.75 | [2] |
| MLPerf-Tiny | ResNet8 | Binary Image Classification | CIFAR-10 | 0.3 | 25 | [3], [4] |
| MLPerf-Tiny | AutoEncoder | Anomaly Detection | ToyADMOS | 1.0 | 1 | [5] |
| MLPerf-Mobile | MobileNetV3 | Image Classification | ImageNet | 15.6 | 25* | [6] |
| MLPerf-Mobile | SSD MobileNetV2 | Object Detection | COCO | 64.4 | 25* | [7] |
| MLPerf-Mobile | DeepLab MobileNetV2 | Semantic Segmentation | ADE20K Training Set | 8.7 | 25* | [8] |
| MLPerf-Mobile | MobileBert | Language Understanding | NA | 96 | 25* | [9] |

*: The 25 FPS requirement is borrowed from the ResNet8 setting, as this information is missing in the referenced paper.
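The frames-per-second requirements above translate directly into a per-inference latency budget, which is the timing constraint the PA scenario operates under. A minimal sketch of that conversion (the dictionary keys below are illustrative names, not the framework's workload identifiers):

```python
# Convert each workload's frames-per-second requirement into a
# per-inference latency budget. FPS values are copied from the table above.
fps_requirements = {
    "ds_cnn": 10,        # Keyword Spotting
    "mobilenet_v1": 0.75,  # Visual Wake Words
    "resnet8": 25,       # Binary Image Classification
    "autoencoder": 1,    # Anomaly Detection
    "mobilenet_v3": 25,  # * borrowed from the ResNet8 setting
}

def latency_budget_ms(fps: float) -> float:
    """Per-inference time budget in milliseconds for a given FPS requirement."""
    return 1000.0 / fps

for name, fps in fps_requirements.items():
    print(f"{name}: {latency_budget_ms(fps):.1f} ms per inference")
```

For example, the 25 FPS requirement leaves a 40 ms budget per inference, while the 0.75 FPS Visual Wake Words workload allows over a second per frame.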

Getting Started

To get started, you can install all required packages directly through pip using the requirements.txt file:

```shell
pip install -r requirements.txt
```

The main script is expr.py, which can:

Evaluate the carbon footprint of prior works in literature

The function experiment_1_literature_trend() outputs the equivalent carbon footprint of chips reported in prior works. The following graph can be generated.

Figure 1: Carbon cost (y axis) versus energy efficiency (x axis) of AI accelerators from the literature in 16-28 nm CMOS technology when applied on the (a) MLPerf-Tiny and (b) MLPerf-Mobile benchmarks. The pie charts show the carbon breakdown into operational (green) and embodied (red) carbon costs for each design. TOP/s/W has been normalized to INT8. The most performant and most carbon-efficient designs are circled out.

The following papers were used to generate the data points in the figure. Due to page limitations, we were unable to include these citations in the paper. We sincerely express our gratitude to these amazing works.

[1] Chih, Yu-Der, et al. "16.4 An 89TOPS/W and 16.3 TOPS/mm² all-digital SRAM-based full-precision compute-in-memory macro in 22nm for machine-learning edge applications." 2021 IEEE International Solid-State Circuits Conference (ISSCC). Vol. 64. IEEE, 2021.

[2] Tu, Fengbin, et al. "A 28nm 15.59 µJ/token full-digital bitline-transpose CIM-based sparse transformer accelerator with pipeline/parallel reconfigurable modes." 2022 IEEE International Solid-State Circuits Conference (ISSCC). Vol. 65. IEEE, 2022.

[3] Lee, Chia-Fu, et al. "A 12nm 121-TOPS/W 41.6-TOPS/mm2 all digital full precision SRAM-based compute-in-memory with configurable bit-width for AI edge applications." 2022 IEEE Symposium on VLSI Technology and Circuits (VLSI Technology and Circuits). IEEE, 2022.

[4] Wang, Dewei, et al. "DIMC: 2219TOPS/W 2569F2/b digital in-memory computing macro in 28nm based on approximate arithmetic hardware." 2022 IEEE international solid-state circuits conference (ISSCC). Vol. 65. IEEE, 2022.

[5] Jiang, Weijie, et al. "A 16nm 128kB high-density fully digital In Memory Compute macro with reverse SRAM pre-charge achieving 0.36 TOPs/mm², 256 kB/mm² and 23.8 TOPs/W." ESSCIRC 2023-IEEE 49th European Solid State Circuits Conference (ESSCIRC). IEEE, 2023.

[6] Jia, Hongyang, et al. "15.1 a programmable neural-network inference accelerator based on scalable in-memory computing." 2021 IEEE International Solid-State Circuits Conference (ISSCC). Vol. 64. IEEE, 2021.

[7] Liu, Shiwei, et al. "16.2 A 28nm 53.8 TOPS/W 8b sparse transformer accelerator with in-memory butterfly zero skipper for unstructured-pruned NN and CIM-based local-attention-reusable engine." 2023 IEEE International Solid-State Circuits Conference (ISSCC). IEEE, 2023.

[8] Zhao, Yuanzhe, et al. "A double-mode sparse compute-in-memory macro with reconfigurable single and dual layer computation." 2023 IEEE Custom Integrated Circuits Conference (CICC). IEEE, 2023.

[9] Lin, Chuan-Tung, et al. "iMCU: A 102-μJ, 61-ms Digital In-Memory Computing-based Microcontroller Unit for Edge TinyML." 2023 IEEE Custom Integrated Circuits Conference (CICC). IEEE, 2023.

[10] Lee, Jinseok, et al. "Fully row/column-parallel in-memory computing SRAM macro employing capacitor-based mixed-signal computation with 5-b inputs." 2021 Symposium on VLSI Circuits. IEEE, 2021.

[11] Yin, Shihui, et al. "PIMCA: A 3.4-Mb programmable in-memory computing accelerator in 28nm for on-chip DNN inference." 2021 Symposium on VLSI Technology. IEEE, 2021.

[12] Zhu, Haozhe, et al. "COMB-MCM: Computing-on-memory-boundary NN processor with bipolar bitwise sparsity optimization for scalable multi-chiplet-module edge machine learning." 2022 IEEE International Solid-State Circuits Conference (ISSCC). Vol. 65. IEEE, 2022.

[13] Song, Jiahao, et al. "A 28 nm 16 kb bit-scalable charge-domain transpose 6T SRAM in-memory computing macro." IEEE Transactions on Circuits and Systems I: Regular Papers 70.5 (2023): 1835-1845.

Simulate and estimate the performance and carbon footprint of user-defined accelerators

The function zigzag_similation_and_result_storage() simulates and evaluates the performance and carbon footprint for given architectures. An example demonstration is shown below.

An example demonstration of CarbonSpot output

Figure 2: An example demonstration of CarbonSpot output. The figures show the carbon footprint under the PA scenario (y axis) and the CA scenario (x axis) across different architecture solutions. Different colors denote different SRAM sizes. (Enable active_plot=True to see which point corresponds to which architecture solution.)

Performance Model Description

Note that in the CarbonSpot paper we only show simulation results for digital TPU-like and digital In-Memory-Computing architectures. In fact, the framework supports all propagation-based digital architectures with arbitrary data-stationarity dataflows, as well as analog In-Memory-Computing architectures.

Our SRAM-based In-Memory-Computing performance model is borrowed from ZigZag-IMC, which supports both analog (AIMC) and digital (DIMC) In-Memory-Computing architectures. A summary of the hardware settings for these chips is provided in the following table.

| Source | Label | Bi / Bo / Bcycle | Macro Size | #cell_group | nb_of_macros |
|---|---|---|---|---|---|
| [1] | AIMC1 | 7 / 2 / 7 | 1024×512 | 1 | 1 |
| [2] | AIMC2 | 8 / 8 / 2 | 16×12 | 32 | 1 |
| [3] | AIMC3 | 8 / 8 / 1 | 64×256 | 1 | 8 |
| [4] | DIMC1 | 8 / 8 / 2 | 32×6 | 1 | 64 |
| [5] | DIMC2 | 8 / 8 / 1 | 32×1 | 16 | 2 |
| [6] | DIMC3 | 8 / 8 / 2 | 128×8 | 8 | 8 |
| [7] | DIMC4 | 8 / 8 / 1 | 128×8 | 2 | 4 |

Bi/Bo/Bcycle: input precision/weight precision/number of bits processed per cycle per input. #cell_group: the number of cells sharing one entry to computation logic.
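As a worked example of the Bcycle column: under the definitions above, streaming one full-precision input into a macro takes ⌈Bi / Bcycle⌉ cycles. This derivation is our reading of the table's definitions, not a formula stated by the framework:

```python
import math

# (B_i, B_cycle) pairs copied from the table above: input precision and
# bits processed per cycle per input.
macros = {
    "AIMC1": (7, 7),
    "AIMC2": (8, 2),
    "AIMC3": (8, 1),
    "DIMC1": (8, 2),
    "DIMC4": (8, 1),
}

def cycles_per_input(b_i: int, b_cycle: int) -> int:
    """Cycles needed to stream one B_i-bit input at B_cycle bits per cycle."""
    return math.ceil(b_i / b_cycle)

for label, (b_i, b_cycle) in macros.items():
    print(f"{label}: {cycles_per_input(b_i, b_cycle)} cycle(s) per input")
```

So AIMC1 consumes a full 7-bit input in a single cycle, while the bit-serial designs with Bcycle = 1 need 8 cycles per 8-bit input.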

The validation details can be found here.

Our digital performance model is based on ZigZag. The validation details can be found here.

Carbon Model Description

The carbon model is built upon ACT. To estimate the carbon footprint of prior works, we adapt its equations and derive the following (used in expr.py):

For Continuous-Active (CA) scenario:

$Carbon/operation = \frac{k_1}{TOP/s/W} + \frac{k_2}{TOP/s/mm^2} + package\ cost$

For the Periodically-Active (PA) scenario:

$Carbon/operation = \frac{k_1}{TOP/s/W} + \frac{k_2 \cdot T_{c} \cdot TOP/s}{TOP/s/mm^2 \cdot parallelism} + package\ cost$

where $k_1$ is the operational carbon intensity ($\frac{301}{3.6\times 10^{18}}\ g\,CO_2/pJ$ as a global average), $k_2$ is the embodied carbon intensity ($8.709 \cdot \frac{1}{Yield \cdot lifetime(year)}\ g\,CO_2/mm^2/ps$), and $T_c$ is the response time constraint under the PA scenario.
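The two equations above can be sketched directly in Python. This is a minimal illustration of the equation structure, not the framework's implementation: the yield and lifetime defaults are placeholder assumptions, and consistent unit bookkeeping (pJ, ps, mm²) is left to the caller:

```python
# Sketch of the CA and PA carbon-per-operation equations above.
# k1 default: 301 / 3.6e18 g CO2 per pJ (global-average grid intensity).
# k2 default: 8.709 / (yield * lifetime_years) g CO2 per mm^2 per ps;
# yield_=0.95 and lifetime_years=3 are illustrative placeholders only.

def carbon_per_op_ca(tops_per_w, tops_per_mm2, package_cost,
                     k1=301 / 3.6e18, k2=None, yield_=0.95, lifetime_years=3):
    """Carbon per operation under the Continuous-Active (CA) scenario."""
    if k2 is None:
        k2 = 8.709 / (yield_ * lifetime_years)
    return k1 / tops_per_w + k2 / tops_per_mm2 + package_cost

def carbon_per_op_pa(tops_per_w, tops_per_mm2, tops, t_c, parallelism,
                     package_cost, k1=301 / 3.6e18, k2=None,
                     yield_=0.95, lifetime_years=3):
    """Carbon per operation under the Periodically-Active (PA) scenario."""
    if k2 is None:
        k2 = 8.709 / (yield_ * lifetime_years)
    # Embodied term scales with the response-time constraint T_c and the
    # achieved throughput, amortized over area efficiency and parallelism.
    embodied = (k2 * t_c * tops) / (tops_per_mm2 * parallelism)
    return k1 / tops_per_w + embodied + package_cost
```

Note how the embodied term differs between the scenarios: under CA it depends only on area efficiency, while under PA it grows with the response-time constraint $T_c$, so a looser latency budget with idle silicon increases the per-operation embodied carbon.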
