Bandwidth

Purpose

bandwidth is a benchmark that measures the memory bandwidth of CPUs, on a single core or on multiple cores. It is inspired by the well-known STREAM benchmark from the University of Virginia.

Unlike STREAM, bandwidth is written in C++17 and takes advantage of C++ meta-programming.

Main advantages of bandwidth over STREAM:

Ability to benchmark buffers of different sizes without recompiling the code
Ability to benchmark different data types (single and double precision)
The code is explicitly vectorized (with SIMD intrinsics), which gives a more realistic peak memory bandwidth
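The first two advantages rely on a common C++ pattern: write a kernel once as a template, instantiate it for each element type at compile time, and pick the buffer size at run time. The sketch below is not taken from bandwidth; `type_name`, `run_one` and `run_all` are hypothetical names, and the explicit SIMD vectorization is deliberately left out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical sketch, not bandwidth's actual code: one kernel template is
// instantiated for several element types at compile time, while the buffer
// size stays an ordinary run-time argument (no recompilation needed).

// Human-readable name of the element type, resolved at compile time.
template <typename T>
constexpr const char* type_name() {
    if constexpr (std::is_same_v<T, float>) return "float";
    else if constexpr (std::is_same_v<T, double>) return "double";
    else return "other";
}

// One run for element type T: fill a buffer of `bytes` bytes, read it back,
// and return "<type>:<element count>".
template <typename T>
std::string run_one(std::size_t bytes) {
    std::vector<T> a(bytes / sizeof(T), T(1));  // element count depends on sizeof(T)
    T sum = T(0);
    for (const T& x : a) sum += x;              // read pass over the whole buffer
    assert(a.empty() || sum > T(0));            // sanity check on the data read
    return std::string(type_name<T>()) + ":" + std::to_string(a.size());
}

// C++17 fold expression over the comma operator: calls run_one<T>
// once for every type T in the pack, in order.
template <typename... Ts>
std::vector<std::string> run_all(std::size_t bytes) {
    std::vector<std::string> results;
    (results.push_back(run_one<Ts>(bytes)), ...);
    return results;
}
```

For example, `run_all<float, double>(4096)` returns `{"float:1024", "double:512"}` on platforms where `float` is 4 bytes and `double` is 8 bytes.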

Known limitations:

The code works "only" for AltiVec, SSE, AVX, AVX-512, NEON or RVV 1.0 capable CPUs
- This covers most of the SIMD-capable CPUs on the market
- However, for now, SVE is not supported
The code only compiles with the C++ GNU Compiler (g++)
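Which of these SIMD instruction sets a build can use is decided by the compiler flags (for instance `-march=native`). A minimal sketch, assuming GCC or Clang, that reports the instruction set enabled for the current build through standard predefined macros; `simd_isa` and `simd_summary` are hypothetical helpers, and the check order is an assumption rather than bandwidth's own selection logic.

```cpp
#include <cassert>
#include <string>

// Sketch: report which SIMD instruction set the compiler is targeting, using
// standard GCC/Clang predefined macros enabled by flags such as -march=native.
// How bandwidth itself selects its SIMD back end is not shown here.
constexpr const char* simd_isa() {
#if defined(__AVX512F__)
    return "AVX-512";
#elif defined(__AVX__)
    return "AVX";
#elif defined(__SSE2__)
    return "SSE";
#elif defined(__ARM_NEON)
    return "NEON";     // AArch64 compilers define this even when SVE is also enabled
#elif defined(__ALTIVEC__)
    return "AltiVec";
#elif defined(__riscv_vector)
    return "RVV";
#else
    return "none";     // no supported SIMD extension enabled for this build
#endif
}

// Convenience wrapper that formats the result for printing.
std::string simd_summary() { return std::string("SIMD ISA: ") + simd_isa(); }
```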

Features

bandwidth measures memory bandwidth with 7 micro-benchmarks:

read: x = A[i]
write: A[i] = x
copy: B[i] = A[i]
incr: A[i] = A[i] + 1
scale: B[i] = x * A[i]
add: C[i] = A[i] + B[i]
triad: C[i] = x * A[i] + B[i]
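The seven kernels can be written as plain scalar loops. This hypothetical sketch (the namespace `kernels` and every name in it are illustrative) shows each access pattern and the number of bytes each kernel moves per element, assuming the STREAM convention of counting one load or store per array access and ignoring write-allocate traffic; how bandwidth itself counts bytes is not stated here.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical scalar reference versions of the seven kernels. The real
// benchmark uses explicit SIMD intrinsics; this only shows the access patterns.
namespace kernels {

template <typename T> T read(const T* A, std::size_t n) {             // x = A[i]
    T x = T(0);
    for (std::size_t i = 0; i < n; ++i) x += A[i];  // accumulate so the loads are used
    return x;
}
template <typename T> void write(T* A, std::size_t n, T x) {          // A[i] = x
    for (std::size_t i = 0; i < n; ++i) A[i] = x;
}
template <typename T> void copy(const T* A, T* B, std::size_t n) {    // B[i] = A[i]
    for (std::size_t i = 0; i < n; ++i) B[i] = A[i];
}
template <typename T> void incr(T* A, std::size_t n) {                // A[i] = A[i] + 1
    for (std::size_t i = 0; i < n; ++i) A[i] = A[i] + T(1);
}
template <typename T> void scale(T x, const T* A, T* B, std::size_t n) {       // B[i] = x * A[i]
    for (std::size_t i = 0; i < n; ++i) B[i] = x * A[i];
}
template <typename T> void add(const T* A, const T* B, T* C, std::size_t n) {  // C[i] = A[i] + B[i]
    for (std::size_t i = 0; i < n; ++i) C[i] = A[i] + B[i];
}
template <typename T> void triad(T x, const T* A, const T* B, T* C, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) C[i] = x * A[i] + B[i];  // C[i] = x * A[i] + B[i]
}

// Memory traffic per element: one load or store per array access
// (STREAM-style convention; write-allocate traffic is ignored).
struct Traffic { const char* name; unsigned loads; unsigned stores; };
constexpr Traffic traffic[] = {
    {"read", 1, 0}, {"write", 0, 1}, {"copy", 1, 1}, {"incr", 1, 1},
    {"scale", 1, 1}, {"add", 2, 1}, {"triad", 2, 1},
};

// Total bytes moved by one pass of a kernel over n elements of type T.
template <typename T>
constexpr std::size_t bytes_moved(const Traffic& t, std::size_t n) {
    return (t.loads + t.stores) * n * sizeof(T);
}

}  // namespace kernels
```

Under this convention, one triad pass over 1000 doubles moves 3 * 1000 * 8 = 24000 bytes.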

Installation and Execution

bandwidth depends on optparse for command-line argument parsing. Fetch it with:

git submodule update --init --recursive

cmake is used to generate the Makefile:

mkdir build
cd build
cmake .. -DCMAKE_CXX_COMPILER=g++ -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_FLAGS="-march=native" -DENABLE_OMP=ON -DENABLE_F16=ON
make -j4

To compile for RVV 1.0 compatible architectures, you need a compiler that supports fixed-length RVV (typically the C++ GNU Compiler, version >= 14). You also have to specify the hardware vector length (VLEN) at compile time. For instance, if your hardware has 256-bit SIMD registers, you can run (note the zvl256b flag):

cmake .. -DCMAKE_CXX_COMPILER=g++ -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_FLAGS="-march=rv64gv1_zve32f_zvl256b -mrvv-vector-bits=zvl" -DENABLE_OMP=ON -DENABLE_F16=OFF
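The zvl flag encodes VLEN, the width of one vector register in bits, and the number of elements per register follows from it. A small sketch of that arithmetic; `zvl_extension` and `lanes` are hypothetical helpers and assume 8-bit bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Sketch of the arithmetic behind the zvl flag (helper names are hypothetical).
// VLEN is the width of one RVV vector register, in bits.

// Name of the RISC-V extension that declares a given VLEN, e.g. "zvl256b".
std::string zvl_extension(std::size_t vlen_bits) {
    return "zvl" + std::to_string(vlen_bits) + "b";
}

// Number of elements of type T held by one vector register (8-bit bytes assumed).
template <typename T>
constexpr std::size_t lanes(std::size_t vlen_bits) {
    return vlen_bits / (8 * sizeof(T));
}

static_assert(lanes<float>(256) == 8, "a 256-bit register holds 8 floats");
static_assert(lanes<double>(256) == 4, "a 256-bit register holds 4 doubles");
```

With 256-bit registers, each vector therefore holds 8 floats or 4 doubles; hardware with 512-bit registers would presumably need zvl512b in the -march string instead.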

Now you can run the code:

./bin/bandwidth

This produces output similar to the following:

Testing bandwidth with type: float
  size:     41 KB  	read:   1.44 TB/s  	write:    978 GB/s  	copy:   1.89 TB/s  	scale:   1.89 TB/s  	add:   2.58 TB/s  	triad:   2.58 TB/s
  size:   61.4 KB  	read:   1.64 TB/s  	write:    956 GB/s  	copy:   1.64 TB/s  	scale:   1.64 TB/s  	add:   2.15 TB/s  	triad:   2.15 TB/s
  size:   81.9 KB  	read:   1.54 TB/s  	write:    979 GB/s  	copy:   1.54 TB/s  	scale:   1.54 TB/s  	add:   1.88 TB/s  	triad:   1.88 TB/s
  [...]
  size:    273 MB  	read:    135 GB/s  	write:    269 GB/s  	copy:    206 GB/s  	scale:    152 GB/s  	add:    146 GB/s  	triad:    165 GB/s
  size:    383 MB  	read:    131 GB/s  	write:    195 GB/s  	copy:    177 GB/s  	scale:    146 GB/s  	add:    172 GB/s  	triad:    206 GB/s
  size:    537 MB  	read:    125 GB/s  	write:    200 GB/s  	copy:    169 GB/s  	scale:    196 GB/s  	add:    150 GB/s  	triad:    181 GB/s
Testing bandwidth with type: double
  size:     41 KB  	read:   1.59 TB/s  	write:    964 GB/s  	copy:   1.59 TB/s  	scale:   1.59 TB/s  	add:   1.86 TB/s  	triad:   1.77 TB/s
  size:   61.4 KB  	read:   1.59 TB/s  	write:    937 GB/s  	copy:   1.59 TB/s  	scale:   1.59 TB/s  	add:    2.2 TB/s  	triad:   1.84 TB/s
  size:   81.9 KB  	read:   1.48 TB/s  	write:    918 GB/s  	copy:   1.48 TB/s  	scale:   1.32 TB/s  	add:   1.92 TB/s  	triad:   1.92 TB/s
  [...]
  size:    273 MB  	read:    132 GB/s  	write:    205 GB/s  	copy:    173 GB/s  	scale:    181 GB/s  	add:    161 GB/s  	triad:    173 GB/s
  size:    383 MB  	read:    138 GB/s  	write:    207 GB/s  	copy:    173 GB/s  	scale:    171 GB/s  	add:    155 GB/s  	triad:    157 GB/s
  size:    537 MB  	read:    141 GB/s  	write:    197 GB/s  	copy:    180 GB/s  	scale:    154 GB/s  	add:    152 GB/s  	triad:    152 GB/s

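In this example, the code was run on an Apple M1 Pro CPU; the measured bandwidth will differ on other architectures. The TB/s figures for the smallest buffers are served from the CPU caches, while the largest buffers (hundreds of MB) do not fit in the caches and reflect main-memory bandwidth.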
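Each figure in the output is a throughput: bytes moved divided by elapsed time. The sketch below shows one way to obtain such a figure for the copy kernel; the best-of-N timing, the decimal GB/s unit and the name `copy_bandwidth_gbs` are assumptions, not necessarily what bandwidth does.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical sketch (not bandwidth's actual timing code): time a copy kernel
// several times, keep the fastest run, and divide the bytes moved by that time.
template <typename T>
double copy_bandwidth_gbs(std::size_t bytes_per_array, int reps = 10) {
    const std::size_t n = bytes_per_array / sizeof(T);
    assert(n > 0 && reps > 0);
    std::vector<T> A(n, T(1)), B(n, T(0));
    double best = 1e300;  // fastest run so far, in seconds
    for (int r = 0; r < reps; ++r) {
        const auto t0 = std::chrono::steady_clock::now();
        for (std::size_t i = 0; i < n; ++i) B[i] = A[i];  // copy: B[i] = A[i]
        const auto t1 = std::chrono::steady_clock::now();
        best = std::min(best, std::chrono::duration<double>(t1 - t0).count());
    }
    assert(B[n - 1] == T(1));                   // the copy actually happened
    best = std::max(best, 1e-9);                // guard against a zero clock reading
    const double bytes = 2.0 * n * sizeof(T);   // one load + one store per element
    return bytes / best / 1e9;                  // decimal GB/s (10^9 bytes per second)
}
```

A real benchmark must also prevent the compiler from removing or hoisting the kernel loop; this sketch does not attempt that.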

Note that by default the benchmark runs on all cores. To measure single-core memory bandwidth, run:

OMP_NUM_THREADS=1 ./bin/bandwidth
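The ENABLE_OMP CMake option suggests that the multi-core runs are parallelized with OpenMP, which is why OMP_NUM_THREADS controls the number of cores used. A minimal sketch of an OpenMP triad under that assumption; `triad_omp` and `max_threads` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical sketch of an OpenMP-parallel triad. The thread count comes from
// OMP_NUM_THREADS (otherwise an implementation-defined default, usually one per core).
// Without -fopenmp the pragma is ignored and the loop runs on a single thread.
template <typename T>
void triad_omp(T x, const std::vector<T>& A, const std::vector<T>& B, std::vector<T>& C) {
    assert(A.size() == B.size() && B.size() == C.size());
    const std::size_t n = A.size();
    #pragma omp parallel for schedule(static)
    for (std::size_t i = 0; i < n; ++i)
        C[i] = x * A[i] + B[i];  // with OpenMP, each thread gets a contiguous chunk of i
}

// Threads an OpenMP parallel region would use (1 when built without OpenMP).
int max_threads() {
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}
```

When built with -fopenmp, setting OMP_NUM_THREADS=1 makes max_threads() return 1, so the loop runs on a single core.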

Plotting the Results

A Python 3 script (plot.py) is provided to quickly plot the results of bandwidth. Here is an example of use (from the build directory):

OMP_NUM_THREADS=1 ./bin/bandwidth -M 2GiB -C > bandwidth_output.csv
../plot.py bandwidth_output.csv -o plot.png
../plot.py bandwidth_output.csv -o plot_with_caches.png --L1 131072 --L2 12582912
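The --L1 and --L2 values in the last command appear to be cache sizes in bytes: 131072 B = 128 KiB and 12582912 B = 12 MiB, which match the L1 data cache and the L2 cache of the M1 Pro performance cores. This hypothetical helper (`fits_in` is not part of plot.py) checks that arithmetic and tells which level a buffer of a given size fits in.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: cache sizes from the example, expressed in binary units.
constexpr std::size_t KiB = 1024, MiB = 1024 * KiB;
static_assert(131072 == 128 * KiB, "--L1 value from the example");
static_assert(12582912 == 12 * MiB, "--L2 value from the example");

// Smallest cache level that can hold a buffer of `bytes` bytes.
std::string fits_in(std::size_t bytes, std::size_t l1 = 128 * KiB, std::size_t l2 = 12 * MiB) {
    if (bytes <= l1) return "L1";
    if (bytes <= l2) return "L2";
    return "beyond L2";
}
```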

Contributors

This code was developed in the LIP6 laboratory at Sorbonne University, Paris.

Here are the people involved in the project:

Florian LEMAITRE: main contributor
Adrien CASSAGNE: enthusiastic contributor
Lionel LACASSAGNE: development supervisor
