Updated Fugaku (a64fx) performance results.
Details:
- Updated the performance graphs (pdfs and pngs) for the Fugaku/a64fx
  entry within Performance.md, and also updated the experiment details
  accordingly. Thanks to RuQing Xu for re-running the BLIS and SSL2
  experiments reflected in this commit.
- In Performance.md, added an English translation of the project name
  under which the Fugaku results were gathered, courtesy of RuQing Xu.
fgvanzee committed May 25, 2021
1 parent e5c85da commit 82af05f
Showing 10 changed files with 16 additions and 22 deletions.
32 changes: 13 additions & 19 deletions docs/Performance.md
@@ -534,7 +534,7 @@ The `runthese.m` file will contain example invocations of the function.
### A64fx experiment details

* Location: RIKEN Center of Computational Science in Kobe, Japan
-* These test results were gathered on the Fugaku supercomputer under project "量子物質の創発と機能のための基礎科学 ―「富岳」と最先端実験の密連携による革新的強相関電子科学" (hp200132)
+* These test results were gathered on the Fugaku supercomputer under project "量子物質の創発と機能のための基礎科学 ―「富岳」と最先端実験の密連携による革新的強相関電子科学" (hp200132) (Basic Science for Emergence and Functionality in Quantum Matter: Innovative Strongly-Correlated Electron Science by Integration of "Fugaku" and Frontier Experiments)
* Processor model: Fujitsu A64fx
* Core topology: one socket, 4 NUMA groups per socket, 13 cores per group (one reserved for the OS), 48 cores total
* SMT status: Unknown
@@ -546,23 +546,17 @@ The `runthese.m` file will contain example invocations of the function.
* multicore: 70.4 GFLOPS/core (double-precision), 140.8 GFLOPS/core (single-precision)
* Operating system: RHEL 8.3
* Page size: 256 bytes
-* Compiler: gcc 9.3.0
-* Results gathered: 2 April 2021
+* Compiler: gcc 10.1.0
+* Results gathered: 2 April 2021; BLIS and SSL2 updated on 20 May 2021
* Implementations tested:
-* BLIS 757cb1c (post-0.8.1)
-* configured with `./configure -t openmp --sve-vector-size=vla CFLAGS="-D_A64FX -DPREFETCH256 -DSVE_NO_NAT_COMPLEX_KERNELS" arm64_sve` (single- and multithreaded)
-* sub-configuration exercised: `arm64_sve`
-* Single-threaded (1 core) execution requested via:
-* `export BLIS_SVE_KC_D=2048 BLIS_SVE_MC_D=128 BLIS_SVE_NC_D=26880 BLIS_SVE_KERNEL_IDX_D=14` (double precision)
-* `export BLIS_SVE_KC_S=2048 BLIS_SVE_MC_S=256 BLIS_SVE_NC_S=23040 BLIS_SVE_KERNEL_IDX_S=2` (single precision)
-* Multithreaded (12 core) execution requested via:
-* `export BLIS_JC_NT=1 BLIS_IC_NT=2 BLIS_JR_NT=6`
-* `export BLIS_SVE_KC_D=2400 BLIS_SVE_MC_D=64 BLIS_SVE_NC_D=26880 BLIS_SVE_KERNEL_IDX_D=14` (double precision)
-* `export BLIS_SVE_KC_S=2400 BLIS_SVE_MC_S=128 BLIS_SVE_NC_S=23040 BLIS_SVE_KERNEL_IDX_S=2` (single precision)
-* Multithreaded (48 core) execution requested via:
-* `export BLIS_JC_NT=1 BLIS_IC_NT=4 BLIS_JR_NT=12`
-* `export BLIS_SVE_KC_D=2048 BLIS_SVE_MC_D=128 BLIS_SVE_NC_D=26880 BLIS_SVE_KERNEL_IDX_D=14` (double precision)
-* `export BLIS_SVE_KC_S=2048 BLIS_SVE_MC_S=256 BLIS_SVE_NC_S=23040 BLIS_SVE_KERNEL_IDX_S=2` (single precision)
+* BLIS 61584de (post-0.8.1)
+* configured with:
+* `../configure -t none CFLAGS="-DCACHE_SECTOR_SIZE_READONLY" a64fx` (single-threaded)
+* `../configure -t openmp CFLAGS="-DCACHE_SECTOR_SIZE_READONLY" a64fx` (multithreaded)
+* sub-configuration exercised: `a64fx`
+* Single-threaded (1 core) execution requested via no change in environment variables
+* Multithreaded (12 core) execution requested via `export BLIS_JC_NT=1 BLIS_IC_NT=1 BLIS_JR_NT=12`
+* Multithreaded (48 core) execution requested via `export BLIS_JC_NT=1 BLIS_IC_NT=4 BLIS_JR_NT=12`
* Eigen 3.3.9
* Obtained via the [Eigen GitLab homepage](https://gitlab.com/libeigen/eigen)
* configured and built BLAS library via `mkdir build; cd build; cmake ..; make blas`
@@ -593,15 +587,15 @@ The `runthese.m` file will contain example invocations of the function.
#### pdf

* [A64fx single-threaded](graphs/large/l3_perf_a64fx_nt1.pdf)
-* [A64fx multithreaded (12 cores)](graphs/large/l3_perf_a64fx_jc1ic2jr6_nt12.pdf)
+* [A64fx multithreaded (12 cores)](graphs/large/l3_perf_a64fx_jc1ic1jr12_nt12.pdf)
* [A64fx multithreaded (48 cores)](graphs/large/l3_perf_a64fx_jc1ic4jr12_nt48.pdf)

#### png (inline)

* **A64fx single-threaded**
![single-threaded](graphs/large/l3_perf_a64fx_nt1.png)
* **A64fx multithreaded (12 cores)**
-![multithreaded (12 cores)](graphs/large/l3_perf_a64fx_jc1ic2jr6_nt12.png)
+![multithreaded (12 cores)](graphs/large/l3_perf_a64fx_jc1ic1jr12_nt12.png)
* **A64fx multithreaded (48 cores)**
![multithreaded (48 cores)](graphs/large/l3_perf_a64fx_jc1ic4jr12_nt48.png)
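
The thread settings in this diff can be cross-checked against the graph file names. The following is a minimal sketch, assuming the total thread count is the product of the three factors set in the `export` lines (any factor left unset is taken as 1), and assuming the file-name pattern is the one inferred from the names visible in this diff. Both helper functions are hypothetical and not part of BLIS.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; not part of BLIS.

# Total threads implied by the three loop-parallelism factors.
#   $1 = BLIS_JC_NT, $2 = BLIS_IC_NT, $3 = BLIS_JR_NT
blis_total_threads() {
    echo $(( $1 * $2 * $3 ))
}

# Graph base name, following the pattern seen under docs/graphs/large/
# in this commit (the pattern itself is inferred, not documented here).
a64fx_graph_name() {
    nt=$(blis_total_threads "$1" "$2" "$3")
    echo "l3_perf_a64fx_jc${1}ic${2}jr${3}_nt${nt}"
}

blis_total_threads 1 1 12    # new 12-core setting
blis_total_threads 1 4 12    # 48-core setting
a64fx_graph_name 1 1 12      # name of the new 12-core graph
a64fx_graph_name 1 4 12      # name of the 48-core graph
```

The old 12-core setting (`1 2 6`) yields the same thread count as the new one (`1 1 12`); only the split between the IC and JR loops changes, which is why the graph files were renamed rather than merely regenerated.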

Binary file not shown.
Binary file removed docs/graphs/large/l3_perf_a64fx_jc1ic2jr6_nt12.pdf
Binary file not shown.
Binary file removed docs/graphs/large/l3_perf_a64fx_jc1ic2jr6_nt12.png
Binary file not shown.
Binary file modified docs/graphs/large/l3_perf_a64fx_jc1ic4jr12_nt48.pdf
Binary file not shown.
Binary file modified docs/graphs/large/l3_perf_a64fx_jc1ic4jr12_nt48.png
Binary file modified docs/graphs/large/l3_perf_a64fx_nt1.pdf
Binary file not shown.
Binary file modified docs/graphs/large/l3_perf_a64fx_nt1.png
6 changes: 3 additions & 3 deletions test/3/octave/runthese.m
@@ -24,6 +24,6 @@
plot_panel_4x5(2.60,16,128,'2s','../results/zen2/20200929/jc8ic4jr4','zen2','MKL'); close all; clear all;

% a64fx
-plot_panel_4x5(2.20,32,1, 'st','../results/a64fx/20210405/st', 'a64fx','Fujitsu SSL2'); close all; clear all;
-plot_panel_4x5(2.20,32,12,'1s','../results/a64fx/20210405/jc1ic4jr3', 'a64fx','Fujitsu SSL2'); close all; clear all;
-plot_panel_4x5(2.20,32,48,'2s','../results/a64fx/20210405/jc1ic4jr12','a64fx','Fujitsu SSL2'); close all; clear all;
+plot_panel_4x5(2.20,32,1, 'st','../results/a64fx/20210520/st', 'a64fx','Fujitsu SSL2'); close all; clear all;
+plot_panel_4x5(2.20,32,12,'1s','../results/a64fx/20210520/jc1ic1jr12','a64fx','Fujitsu SSL2'); close all; clear all;
+plot_panel_4x5(2.20,32,48,'2s','../results/a64fx/20210520/jc1ic4jr12','a64fx','Fujitsu SSL2'); close all; clear all;
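
The first two arguments of these calls (`2.20` and `32`) are consistent with the peak figures quoted in Performance.md if they are read as the clock frequency in GHz and the double-precision flops per cycle per core. That reading is an assumption here; the diff itself does not state it. A short sketch of the arithmetic, using a hypothetical helper:

```shell
#!/bin/sh
# Hypothetical helper for illustration only; not part of the BLIS test suite.

# Theoretical peak in GFLOPS.
#   $1 = clock frequency in GHz (assumed meaning of the first argument)
#   $2 = flops per cycle per core (assumed meaning of the second argument)
#   $3 = number of cores
peak_gflops() {
    LC_ALL=C awk -v f="$1" -v w="$2" -v n="$3" \
        'BEGIN { printf "%.1f\n", f * w * n }'
}

peak_gflops 2.20 32 1     # per-core peak, double precision
peak_gflops 2.20 64 1     # per-core peak, single precision (64 inferred from the 140.8 figure)
peak_gflops 2.20 32 48    # all 48 cores, double precision
```

Under these assumptions the first two calls reproduce the 70.4 and 140.8 GFLOPS/core values listed in the experiment details.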
