This project is compatible with LLVM 10 and clang 10.
Configs/: The configuration files of PIMProf. The default is defaultconfig_32.ini.
LLVMAnalysis/: The tool for instrumenting the program. It is implemented as an LLVM pass and invoked by clang. This directory also contains hooks that can be used to annotate regions of interest.
PIMProfSolver/: The Pin tool for analyzing the instrumented program.
test/: The unit test.
Install llvm-10 and clang-10.
$ apt install clang-10 llvm-10
Set the path to your llvm-10 installation in CMakeLists.txt:
set(LLVM_HOME "/usr/lib/llvm-10")
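If you are unsure which directory to use for LLVM_HOME, the following minimal sketch may help. The helper name pick_llvm_home is hypothetical and not part of PIMProf; it simply prints the first candidate path that exists as a directory.

```sh
# pick_llvm_home DIR...: print the first argument that is an existing directory.
# Hypothetical helper for illustration; not shipped with PIMProf.
pick_llvm_home() {
    for dir in "$@"; do
        if [ -n "$dir" ] && [ -d "$dir" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    # No candidate exists.
    return 1
}
```

One possible call, assuming the Debian/Ubuntu llvm-10 package (which installs llvm-config-10), is pick_llvm_home "$(llvm-config-10 --prefix 2>/dev/null)" /usr/lib/llvm-10. The printed path is the value to put in set(LLVM_HOME ...).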
Then compile:
$ make -j
The PIMProf solver now depends entirely on the runtime performance data provided by simulators. As a proof of concept, we integrated our tool into Sniper in a separate repository:
https://github.com/Systems-ShiftLab/sniper_PIMProf
Clone the repository and check out the dev branch to see all changes made by PIMProf.
$ git checkout dev
To compile Sniper, see the Sniper website (https://snipersim.org) and follow its instructions. You need to install a few prerequisite libraries and download the Intel Pin tool before compiling Sniper. Note that the current version of Sniper only works with Pin <= 3.20.
We made minimal modifications to integrate PIMProf into Sniper. All the changes to the Sniper code base can be found by grepping for "Yizhou" in the repository, and the same approach can be applied when integrating PIMProf into other simulators. We found it easiest to directly modify the include path in common/system/simulator.h so that it points to Stats.h in your PIMProf checkout, for example:
#include "/home/warsier/Downloads/PIMProf/PIMProfSolver/Stats.h"
The sniper_PIMProf repository also comes with two test suites: a unit test and the GAP graph workload suite. They can be found in the folder sniper_PIMProf/PIMProf.
The unit test will provide a basic idea of how to use PIMProf to generate offloading decisions. The steps are listed as follows:
Let's take a look at the Makefile in the unit test:
To create test.inj, an annotated version of test in which the beginning and end of each basic block are marked, we invoke the LLVM pass libAnnotationInjection.so. The expanded command looks like this:
export PIMPROFINJECTMODE=SNIPER && clang++-10 $(CXXFLAGS) $(SNIPER_CFLAGS) -Xclang -load -Xclang $(PIMPROF_ROOT)/build/LLVMAnalysis/libAnnotationInjection.so -o test.inj test.cpp -pthread
On compilation, we need to set the environment variable PIMPROFINJECTMODE and then compile the program using the LLVM pass libAnnotationInjection.so.
There are two available values of PIMPROFINJECTMODE: SNIPER, which inserts annotations at the basic block level; and SNIPER2, which inserts annotations at the function level.
Note that PIMProf no longer requires any modification to the source code, so any annotations in the source code of the unit test or the GAP workloads are deprecated.
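The choice between the two injection modes can be sketched as a small shell helper. This is an illustration only: the function name pimprof_inject_cmd is hypothetical, and CXXFLAGS, SNIPER_CFLAGS, and PIMPROF_ROOT are assumed to be set in the environment the same way the unit test Makefile sets them. The helper prints the compile command instead of running it, so it can be inspected first.

```sh
# pimprof_inject_cmd MODE SRC OUT: print (do not run) the annotated compile command.
# Hypothetical helper; the flags mirror the expanded Makefile command above.
pimprof_inject_cmd() {
    mode=$1
    src=$2
    out=$3
    # Only the two documented injection modes are accepted.
    case "$mode" in
        SNIPER|SNIPER2) ;;
        *) echo "unknown PIMPROFINJECTMODE: $mode" >&2; return 1 ;;
    esac
    pass="${PIMPROF_ROOT-}/build/LLVMAnalysis/libAnnotationInjection.so"
    printf 'export PIMPROFINJECTMODE=%s && clang++-10 %s %s -Xclang -load -Xclang %s -o %s %s -pthread\n' \
        "$mode" "${CXXFLAGS-}" "${SNIPER_CFLAGS-}" "$pass" "$out" "$src"
}
```

For example, pimprof_inject_cmd SNIPER2 test.cpp test.inj prints the command for function-level annotation; the printed line can then be pasted into a shell or passed to eval.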
Now take a look at run_inj.sh in the unit test. This script can be used directly to generate offloading decisions for the unit test, provided that SOLVER points to the PIMProf solver located at build/PIMProfSolver/Solver.exe.
We need two Sniper runs to generate the CPU performance and the PIM performance separately. With the following commands, the corresponding PIMProf results are generated in the folders inj_cpu and inj_pim:
export OMP_NUM_THREADS=1 && run-sniper --roi -n 1 -c pimprof_cpu -d inj_cpu -- ./test.inj
export OMP_NUM_THREADS=4 && run-sniper --roi -n 4 -c pimprof_pim -d inj_pim -- ./test.inj
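The two runs differ only in the thread count, the Sniper configuration name, and the output directory. A minimal sketch that captures this pattern is shown below; the function name sniper_run_cmd is hypothetical, and the thread counts (1 for CPU, 4 for PIM) are taken from the two commands above. The helper prints each command rather than executing it.

```sh
# sniper_run_cmd KIND BINARY: print the run-sniper command for a "cpu" or "pim" run.
# Hypothetical helper; thread counts follow the two commands shown above.
sniper_run_cmd() {
    kind=$1
    binary=$2
    case "$kind" in
        cpu) threads=1 ;;
        pim) threads=4 ;;
        *) echo "kind must be cpu or pim, got: $kind" >&2; return 1 ;;
    esac
    printf 'export OMP_NUM_THREADS=%s && run-sniper --roi -n %s -c pimprof_%s -d inj_%s -- %s\n' \
        "$threads" "$threads" "$kind" "$kind" "$binary"
}
```

Calling sniper_run_cmd cpu ./test.inj and sniper_run_cmd pim ./test.inj prints the two commands listed above.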
As the last step, we feed the runtime profile to the PIMProf solver Solver.exe to generate offloading decisions. The usage of the solver is shown below:
Solver.exe <mode> -c <cpu_stats_file> -p <pim_stats_file> -r <reuse_file> -o <output_file>
Select mode from: mpki, para, reuse.
In the result folders inj_cpu and inj_pim, there are two files of concern: pimprofstats.out contains the runtime statistics of that run, and pimprofreuse.out contains the data reuse information.
The example that generates the reuse decision in run_inj.sh looks like this:
Solver.exe reuse -c inj_cpu/pimprofstats.out -p inj_pim/pimprofstats.out -r inj_cpu/pimprofreuse.out -o reusedecision.out
Here, pimprofstats.out from inj_cpu and from inj_pim provide the CPU and PIM stats, respectively, while pimprofreuse.out is taken only from the CPU run, because the reuse information from the PIM run would be the same as that from the CPU run.
The generated decision is stored in reusedecision.out.
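To generate decisions for more than one solver mode, the invocation can be parameterized. The sketch below is an illustration under stated assumptions: the function names solver_cmd and solver_cmds_all are hypothetical, SOLVER is assumed to hold the path to Solver.exe (as in run_inj.sh), and the output names mpkidecision.out and paradecision.out are assumed by analogy with reusedecision.out. The helpers only print the commands.

```sh
# solver_cmd MODE CPU_DIR PIM_DIR: print the Solver.exe command for one mode.
# Hypothetical helper; output is named <mode>decision.out (an assumption for mpki and para).
solver_cmd() {
    mode=$1
    cpu_dir=$2
    pim_dir=$3
    case "$mode" in
        mpki|para|reuse) ;;
        *) echo "mode must be mpki, para, or reuse, got: $mode" >&2; return 1 ;;
    esac
    # Reuse information is taken from the CPU run only, as in run_inj.sh.
    printf '%s %s -c %s/pimprofstats.out -p %s/pimprofstats.out -r %s/pimprofreuse.out -o %sdecision.out\n' \
        "${SOLVER:-Solver.exe}" "$mode" "$cpu_dir" "$pim_dir" "$cpu_dir" "$mode"
}

# solver_cmds_all CPU_DIR PIM_DIR: print one command per documented mode.
solver_cmds_all() {
    for m in mpki para reuse; do
        solver_cmd "$m" "$1" "$2" || return 1
    done
}
```

For instance, solver_cmds_all inj_cpu inj_pim prints three commands, the last of which matches the reuse example above.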
GAP graph workloads (https://github.com/sbeamer/gapbs)
We have modified the Makefile and provide a simple run_inj.sh to demonstrate how to generate offloading decisions for GAP.
Note that the repository does not come with any pre-generated graphs. To generate graphs for testing purposes, please refer to the GAP README.
Before compilation, you need to set CXX, PIMPROF_ROOT, and PIMPROF_MODE to the correct values.
Then you may run:
$ make inj
to generate the corresponding binary.