Building and Running Octotiger
Warning: If you're using jemalloc, there is currently a bug in its master branch and in version 5.0. Use 4.5 instead!
To build OctoTiger with all dependencies on a recent Ubuntu machine:
- Create a folder where you want to build OctoTiger with all dependencies
- Uncompress the scripts from this archive to the build folder
- Change the paths in `build_octotiger_with_dependencies.sh` so that they match your build folder
- Execute `build_octotiger_with_dependencies.sh`
This installs any missing packages, clones all git repositories, and compiles Boost, Vc, HPX, and OctoTiger.
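A minimal shell sketch of the steps above; the build folder `$HOME/octotiger-build` and the archive name `octotiger-build-scripts.tar.gz` are placeholders (the original does not name the archive):

```
# Placeholder paths/names -- adjust to your setup.
mkdir -p $HOME/octotiger-build && cd $HOME/octotiger-build
tar xf /path/to/octotiger-build-scripts.tar.gz   # the scripts archive mentioned above
$EDITOR build_octotiger_with_dependencies.sh     # point the path variables at this folder
./build_octotiger_with_dependencies.sh           # installs packages, clones repos, builds Boost, Vc, HPX, OctoTiger
```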
In order to use the Vc library with OctoTiger, HPX must be configured to use it. OctoTiger will pick up the appropriate settings from the HPX build system:
- Download, build, and install the Vc library as described here:
- Download, build, and install HPX as described here: http://stellar-group.github.io/hpx/docs/html/hpx/manual/build_system.html
- Note: You will additionally need to enable the `datapar` functionality in HPX: `cmake -DHPX_WITH_DATAPAR_VC=On -DVc_ROOT=<Vc_installation_dir> ...` (a fuller configuration sketch follows this list)
- Build OctoTiger as usual; it should pick up the `datapar` functionality from HPX's build system
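A hedged sketch of such an HPX configuration with Vc enabled; all paths are illustrative assumptions, not values from the original:

```
# Illustrative out-of-source HPX build with Vc-based datapar enabled.
mkdir -p hpx-build && cd hpx-build
cmake -DCMAKE_BUILD_TYPE=Release \
      -DHPX_WITH_DATAPAR_VC=On \
      -DVc_ROOT=$HOME/local/vc \
      -DBOOST_ROOT=$HOME/local/boost \
      /path/to/hpx-source
make -j8 install
```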
On NERSC resources, HPX, Vc, and OctoTiger should be built with the script provided by @khuck that can be found in the `/project/projectdirs/xpress/hpx-lsu-cori-II` directory. See the README there.
Warning: Due to a Cori bug, you have to manually `module unload darshan` in every new shell; putting the command in your `.bashrc.ext` does not resolve the issue. If you forget to do this, your applications will link against darshan, which is not available on Cori Phase 2.
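That is, in each fresh shell on Cori:

```
# Must be run manually in every new shell; .bashrc.ext does not work around the bug.
module unload darshan
```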
Instead of building OctoTiger with all dependencies, you can also use the prebuilt libraries in the `knl-build` subfolder. However, using them might require some minor changes to the build scripts.
(from an email by @dmarce1):
For test scaling runs, I have created three startup files of the same initial problem at three different maximum levels of refinement. (The problem setup is a q=0.20 double white dwarf system in near equilibrium)
- 7 level run with 1513 subgrids
- 8 level run with 4297 subgrids
- 9 level run with 11241 subgrids
(The workload increases by a factor of roughly 2.7 with each additional level.)
I have uploaded the restart files to Google Drive (also saved in `/project/projectdirs/xpress/hpx-lsu-cori-II/restart-files`): https://drive.google.com/drive/folders/0B_Hf1bEwvJEkS21VMzBSdUg1S0k?usp=sharing
The command line to start from these files is:

```
./octotiger --hpx:threads 20 -Problem=dwd -Max_level=$LEVEL -Xscale=4.0 -Eos=wd -Angcon=1 -Restart=restart$LEVEL.chk -Stoptime=0.01
```

where `$LEVEL` is 7, 8, or 9. Make sure to use the latest OctoTiger; a couple of the command line options above require the latest version. On the SuperMIC computer, with 32 20-core nodes, the 8 level problem takes 70 steps and runs in about a minute and a half. `Stoptime` can be adjusted for longer or shorter runs.
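A hypothetical convenience loop for running all three cases back to back; the loop itself is not from the original, but the flags are exactly those above:

```
for LEVEL in 7 8 9; do
    ./octotiger --hpx:threads 20 -Problem=dwd -Max_level=$LEVEL -Xscale=4.0 \
        -Eos=wd -Angcon=1 -Restart=restart$LEVEL.chk -Stoptime=0.01
done
```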
To run octotiger with APEX:
```
salloc -N 1 -p knl -C knl,quad,flat -t 30:00 -A xpress
# (wait for allocation...)
rm -rf *.chk
rm -rf step.dat
rm -rf OTF2_archive
export APEX_OTF2=1                 # enables OTF2 output
export APEX_PROC_STAT=0            # disables background checking of system counters (there's a bug Kevin needs to fix)
export APEX_PROCESS_ASYNC_STATE=0  # reduces the overhead of APEX from ~1.5% to even less than that
srun -n 1 -N 1 ./src/octotiger-build/octotiger -Disableoutput -Problem=moving_star -Max_level=4 -Xscale=32 -Odt=0.5 -Stopstep=0 --hpx:threads=68
```
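After the run, APEX writes the trace into the `OTF2_archive` directory (which is why it is removed beforehand). Assuming APEX's default archive name of `APEX` (an assumption, not stated above), the trace can be inspected with the `otf2-print` tool from the OTF2 distribution:

```
# Anchor-file name assumes APEX's default OTF2 archive name.
otf2-print OTF2_archive/APEX.otf2 | head
```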
If APEX isn't enabled:
```
salloc -N 1 -p knl -C knl,quad,flat -t 30:00 -A xpress
# (wait for allocation...)
srun -n 1 -N 1 ./src/octotiger-build/octotiger -Disableoutput -Problem=moving_star -Max_level=4 -Xscale=32 -Odt=0.5 -Stopstep=0 --hpx:threads=68
```
Notice that to control the MCDRAM/DRAM of the KNLs, you have to use `numactl`:
```
salloc -N 1 -p knl -C knl,quad,flat -t 30:00 -A xpress
# (wait for allocation...)
srun -n 1 -N 1 numactl -m 1 ./src/octotiger-build/octotiger -Disableoutput -Problem=moving_star -Max_level=4 -Xscale=32 -Odt=0.5 -Stopstep=0 --hpx:threads=68
```
`-m 1` indicates that the MCDRAM should be used. For more details, see the NERSC website.
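To verify which NUMA node corresponds to MCDRAM on a given allocation, `numactl -H` lists the available nodes and their sizes; in quad/flat mode the MCDRAM typically shows up as node 1, hence `-m 1` above:

```
# In quad/flat mode: node 0 = DRAM plus all cores, node 1 = 16 GB MCDRAM with no cores.
srun -n 1 numactl -H
```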
If the error message mentions insufficient resources, similar to this:

```
what(): mmap() failed to allocate thread stack due to insufficient resources, increase /proc/sys/vm/max_map_count or add -Ihpx.stacks.use_guard_pages=0 to the command line: HPX(unhandled_exception)
```

then add the flag `-Ihpx.stacks.use_guard_pages=0`.
On some machines this seems to lead to some performance degradation. Check whether the function `pageblock_pfn_to_page` shows up in a profiler.
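Both remedies named in the error message, sketched against the invocation used above; the sysctl value is an illustrative assumption and requires root:

```
# Option 1: append the flag from the error message to the normal invocation.
srun -n 1 -N 1 ./src/octotiger-build/octotiger -Disableoutput -Problem=moving_star -Max_level=4 -Xscale=32 -Odt=0.5 -Stopstep=0 --hpx:threads=68 -Ihpx.stacks.use_guard_pages=0
# Option 2: raise the kernel mmap limit instead (illustrative value; needs root).
sudo sysctl -w vm.max_map_count=1048576
```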
If OctoTiger crashes with a segfault, it is possible that the stack space is too small. Increase the stack size by adding the parameter `--hpx:ini=hpx.stacks.small_size=0xC0000` (from `0xC000` bytes).
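For example, appended to the usual invocation (other flags as before):

```
# Raises the small stack size from 0xC000 to 0xC0000 bytes (a factor of 16).
srun -n 1 -N 1 ./src/octotiger-build/octotiger -Disableoutput -Problem=moving_star -Max_level=4 -Xscale=32 -Odt=0.5 -Stopstep=0 --hpx:threads=68 --hpx:ini=hpx.stacks.small_size=0xC0000
```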
HPX can be integrated with the Intel VTune Amplifier and Intel Inspector tools through the open ITTNotify interface exposed by those tools. This makes the following information available to the Intel tools:
- Setting kernel thread names to be displayed in VTune
- HPX-threads, HPX performance counters, and HPX Parcel send/receive events
- HPX specific synchronization primitives
- HPX memory allocation tracking
This integration requires both compile-time and run-time settings to be enabled. Add `-DHPX_WITH_ITTNOTIFY=On` and `-DAMPLIFIER_ROOT=<amplifier base directory>` to the cmake invocation at configuration time to compile the functionality into the HPX core library. This can be done even if no runtime integration with the Intel tools is planned, and it creates no additional runtime overhead (unless you actually run using one of the Intel tools).
Note, however, that you can't integrate HPX with both the Intel tools and APEX at the same time; only one of the integrations is possible for a particular build of HPX. If you use both options (`-DHPX_WITH_ITTNOTIFY=On` and `-DHPX_WITH_APEX=On`), you will still get only the integration with APEX, as it integrates with HPX in part through the ITTNotify interface itself.
In order to activate the integration with the Intel tools at runtime, add the command line option `--hpx:ini=hpx.use_itt_notify=1`.
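Putting the compile-time and run-time halves together, a hedged sketch; the VTune installation path and the `amplxe-cl` collection type are illustrative assumptions, not taken from the text above:

```
# Compile-time: enable ITTNotify support in HPX (path is illustrative).
cmake -DHPX_WITH_ITTNOTIFY=On -DAMPLIFIER_ROOT=/opt/intel/vtune_amplifier /path/to/hpx-source
# Run-time: run under VTune with the ITTNotify integration switched on.
amplxe-cl -collect hotspots -- ./octotiger --hpx:ini=hpx.use_itt_notify=1 -Problem=moving_star -Max_level=4
```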