
CARAML

Compact Automated Reproducible Assessment of Machine Learning (CARAML) is a benchmark framework designed to assess AI workloads on novel accelerators. It has been developed and tested extensively on systems at the Jülich Supercomputing Centre (JSC).

CARAML leverages JUBE, a scripting-based framework for creating benchmark sets, running them across different systems, and evaluating results. Additionally, it includes power/energy measurements through the jpwr tool.

Paper: arXiv, IEEE

Tested Accelerators

CARAML has been tested on the JURECA-DC EVALUATION PLATFORM, JURECA-DC, JEDI, the WEST-AI Nodes, and NHR-FAU. These systems include the following accelerators:

| System                                            | Configuration                                     | Tag       |
|---------------------------------------------------|---------------------------------------------------|-----------|
| NVIDIA Ampere node (SXM)                          | 4 × A100 (40GB HBM2e) GPUs                        | `A100`    |
| NVIDIA Hopper node (PCIe)                         | 4 × H100 (80GB HBM2e) GPUs                        | `H100`    |
| NVIDIA Hopper node (NVLink)                       | 4 × H100 (94GB HBM2e) GPUs                        | `WAIH100` |
| NVIDIA Grace-Hopper chip                          | 1 × GH200 (480GB LPDDR5X, 96GB HBM3) GPU          | `GH200`   |
| NVIDIA Grace-Hopper node                          | 4 × GH200 (120GB LPDDR5X, 96GB HBM3) GPUs         | `JUPITER` |
| AMD MI300X node                                   | 8 × MI300X (192GB HBM3) GPUs                      | `MI300X`  |
| AMD MI300A node                                   | 4 × MI300A (128GB HBM3) APUs                      | `MI300A`  |
| AMD MI200 node                                    | 4 × MI250 (128GB HBM2e) GPUs                      | `MI250`   |
| Graphcore IPU-POD4 M2000                          | 4 × GC200 (512GB DDR4-3200) IPUs                  | `GC200`   |

Benchmark

CARAML currently provides benchmarks implemented in Python:

1. Computer Vision: Image Classification (Training)

The image_classification model training benchmark is implemented in PyTorch. It is designed to test image classification models such as ResNet50 on various accelerators. For IPUs, the graphcore/examples repository is used.

Performance is measured in images/s and energy is measured in Wh.

Note: Support for the image classification benchmark in TensorFlow has been discontinued.
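
As an illustration of how an images/s number can be obtained, the sketch below times a few training steps of a ResNet50 on a random batch and divides the number of processed images by the elapsed time. It is a minimal example with arbitrary values (assuming PyTorch and a recent torchvision), not the benchmark's actual measurement code:

    import time
    import torch
    import torchvision

    def measure_images_per_second(model, images, labels, steps=5):
        """Time `steps` forward/backward passes and return mean images/s."""
        optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
        criterion = torch.nn.CrossEntropyLoss()
        start = time.perf_counter()
        for _ in range(steps):
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
        elapsed = time.perf_counter() - start
        return steps * images.shape[0] / elapsed

    # Example values only: a ResNet50 on a random batch of 4 images.
    model = torchvision.models.resnet50(weights=None)
    images = torch.randn(4, 3, 224, 224)
    labels = torch.randint(0, 1000, (4,))
    print(f"{measure_images_per_second(model, images, labels):.1f} images/s")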

2. Natural Language Processing: GPT Language Model (Training)

The LLM training benchmark is implemented in PyTorch.

Performance is measured in tokens/s and energy is recorded in Wh.
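
As a rough illustration with made-up numbers (not CARAML results), tokens/s follows from the sequence length, the global batch size, and the measured time per optimizer step; energy readings in joules convert to Wh by dividing by 3600:

    # Hypothetical example values, for illustration only.
    seq_length = 2048          # tokens per sequence
    global_batch_size = 256    # sequences per optimizer step, across all GPUs
    step_time_s = 3.1          # measured seconds per optimizer step
    energy_joules = 145_000    # measured energy for the run

    tokens_per_second = seq_length * global_batch_size / step_time_s
    energy_wh = energy_joules / 3600.0  # 1 Wh = 3600 J

    print(f"throughput: {tokens_per_second:,.0f} tokens/s")
    print(f"energy:     {energy_wh:.1f} Wh")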

Requirements

To run the benchmarks, install JUBE by following the JUBE installation documentation. The benchmarks are deployed using Apptainer containers and executed with SLURM on the tested accelerators.

Dataset

  • Image Classification: Synthetic data is generated on the host machine for benchmarking (see the sketch after this list). For IPUs, the `synthetic` tag additionally allows synthetic data to be generated directly on the IPU.

  • LLM Training: A subset of the OSCAR dataset (790 samples, ~10 MB) is pre-processed using GPT-2 tokenizers. This data is provided in the llm_data directory.
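
For the image classification case, host-side synthetic data amounts to random tensors with ImageNet-like shapes. The sketch below is illustrative only; the actual CARAML data pipeline may differ:

    import torch
    from torch.utils.data import DataLoader, TensorDataset

    def make_synthetic_dataset(num_samples=1024, num_classes=1000):
        """Random images and labels with ImageNet-like shapes."""
        images = torch.randn(num_samples, 3, 224, 224)
        labels = torch.randint(0, num_classes, (num_samples,))
        return TensorDataset(images, labels)

    loader = DataLoader(make_synthetic_dataset(), batch_size=64, shuffle=True)
    images, labels = next(iter(loader))
    print(images.shape, labels.shape)  # torch.Size([64, 3, 224, 224]) torch.Size([64])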

Execution

  • Clone the repository and navigate into it:
git clone https://github.com/FZJ-JSC/CARAML.git
cd CARAML
  • Modify the system and model parameters in the respective JUBE configuration file.
  • To pull the required container, use the container tag as follows:
    jube run {JUBEConfig}.{xml,yaml} --tag container H100
    Replace H100 with one of the following as needed:
    • GH200 (for Arm CPU + H100)
    • MI250 or MI300X or MI300A (for AMD)
    • GC200 (for Graphcore)

Note: The container tag should ideally be used only once at the beginning to pull and set up the container.

Image Classification (Training)

  • To run the benchmark with the defined configurations, do:

    jube run image_classification/image_classification_torch_benchmark.xml --tag H100

    H100 can be replaced with any tag listed in the Tested Accelerators section.

  • After the benchmark has been executed, use jube continue to post-process the results:

    jube continue image_classification/image_classification_torch_benchmark_run -i last
  • To generate the result, do:

    jube result image_classification/image_classification_torch_benchmark_run -i last

LLM Training

  • To run the benchmark with the defined configurations for the 800M GPT model with OSCAR data, do:

    jube run llm_training/llm_benchmark_nvidia_amd.yaml --tag 800M A100

    A100 can be replaced with any tag listed in the Tested Accelerators section, and 800M can be replaced with 13B or 175B on systems with more node resources.

  • To run the benchmark with the defined configurations for the 117M GPT model on Graphcore with synthetic data, do:

    jube run llm_training/llm_benchmark_ipu.yaml --tag 117M synthetic

    If the synthetic tag is not given, the benchmark uses OSCAR data.

  • After the benchmark has been executed, use jube continue to post-process the results:

    jube continue llm_training/llm_benchmark_{nvidia_amd,ipu}_run -i last
  • To generate the result, do:

    jube result llm_training/llm_benchmark_{nvidia_amd,ipu}_run -i last

Results

Image Classification: ResNet50 (result plot)
LLM Training Benchmark (result plot)

JSC Specific Fixes

To use the PyTorch torchrun API on JSC systems, the fixed_torch_run.py fix is required. The fix solves the issue described here.

Additionally, an i is appended to the hostname to allow communication over InfiniBand, as described here.
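
The sketch below is not the actual fixed_torch_run.py; it only illustrates the idea, assuming the JSC convention that a node's InfiniBand interface is reachable under its hostname with an i appended:

    import os
    import socket

    def infiniband_hostname():
        """Current node's hostname with the 'i' suffix used on JSC systems
        to address the InfiniBand interface."""
        return socket.gethostname() + "i"

    # Point the torch.distributed rendezvous address at the InfiniBand interface.
    os.environ.setdefault("MASTER_ADDR", infiniband_hostname())
    print(os.environ["MASTER_ADDR"])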

Citation

@INPROCEEDINGS{10820809,
  author={John, Chelsea Maria and Nassyr, Stepan and Penke, Carolin and Herten, Andreas},
  booktitle={SC24-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis}, 
  title={Performance and Power: Systematic Evaluation of AI Workloads on Accelerators with CARAML}, 
  year={2024},
  pages={1164-1176},
  doi={10.1109/SCW63240.2024.00158}
}
