OpenMP Notes
Starting with FDS 6.1.0, the default version of FDS includes OpenMP parallelization. Unlike MPI parallelization, OpenMP does not require you to split the computational domain into individual meshes. However, because OpenMP is a shared-memory parallelization, it is limited to the resources of one machine, whereas MPI can take advantage of multiple machines connected over a network.
By default, the OpenMP version of FDS should use all of the available processors or cores on a given machine. FDS reports the number of available "threads" at the start of the run; to see this number without running a case, just type the name of the executable.
Most processors today offer virtual threads, so-called hyperthreading or simultaneous multithreading (SMT). So far, all benchmarks performed have shown that hyperthreading is detrimental to OpenMP performance in FDS.
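On Linux, one way to act on this advice is to set OMP_NUM_THREADS to the number of physical cores instead of the number of logical CPUs (which counts every hyperthread). The sketch below assumes bash, getconf, and lscpu from util-linux; it counts the unique (core, socket) pairs in lscpu's parseable output and falls back to the logical CPU count if lscpu is unavailable. The function names are hypothetical helpers, not part of FDS.

```bash
#!/usr/bin/env bash

# Number of online logical CPUs (counts every hyperthread/SMT sibling).
logical_cpus() {
  getconf _NPROCESSORS_ONLN
}

# Number of online physical cores: each unique (core, socket) pair
# printed by "lscpu -p" is one physical core, however many hardware
# threads it exposes. Falls back to the logical count if lscpu is
# missing or prints nothing usable.
physical_cores() {
  local n=0
  if command -v lscpu >/dev/null 2>&1; then
    n=$(lscpu -p=CORE,SOCKET 2>/dev/null | grep -v '^#' | sort -u | wc -l)
  fi
  if (( n < 1 )); then
    n=$(logical_cpus)
  fi
  echo $(( n ))
}

echo "logical CPUs:   $(logical_cpus)"
echo "physical cores: $(physical_cores)"
```

With these functions loaded, export OMP_NUM_THREADS=$(physical_cores) before starting FDS keeps one thread per physical core.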
The degree of parallelization increases with cell count, so larger simulations see a greater speedup. At some point, however, the performance tops out: on a dual-socket Xeon X5570 this occurred somewhere between 0.5 and 2 million cells. Depending on cache sizes, memory bandwidth, and so on, this point may be different on other machines.
The degree of parallelization (the fraction of the work that runs in parallel) lies somewhere between 40 and 80 percent. According to Amdahl's law, you will see a sharply diminishing return on investment as you add more threads. In most cases your computational efficiency (speedup/threads) will drop below 50 percent once you go beyond four threads. If you can run two simulations at the same time with four threads each instead of one with eight threads, you will be making better use of your power bill.
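The rule of thumb above follows from Amdahl's law. If a fraction p of the run time is parallelized, the speedup on T threads is S(T) = 1 / ((1 - p) + p/T) and the efficiency is S(T)/T. The sketch below, assuming bash and awk, tabulates both; reading the quoted "degree of parallelization" as p is an interpretation, and amdahl_table is a hypothetical helper name.

```bash
#!/usr/bin/env bash

# amdahl_table P MAX_THREADS
# For a parallel fraction P (between 0 and 1), prints one line per
# thread count T = 1..MAX_THREADS: "T speedup efficiency", where
#   speedup    = 1 / ((1 - P) + P / T)   (Amdahl's law)
#   efficiency = speedup / T
amdahl_table() {
  local p=$1 max=$2
  awk -v p="$p" -v max="$max" 'BEGIN {
    for (t = 1; t <= max; t++) {
      s = 1 / ((1 - p) + p / t)
      printf "%d %.2f %.2f\n", t, s, s / t
    }
  }'
}

# Example: an assumed parallel fraction of 70 percent, up to 8 threads.
amdahl_table 0.7 8
```

With p = 0.7 (inside the 40 to 80 percent range), the table shows a speedup of about 2.11 at four threads (efficiency 0.53) and only about 2.58 at eight threads (efficiency 0.32).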
You can also use OpenMP together with MPI parallelization. In that case, limit the number of threads used by each MPI process. With P as the number of MPI processes launched per machine, T as the number of threads per MPI process, and C as the number of physical cores of your machine, choose T such that P*T=C. For example, with 8 physical cores and 2 MPI processes per machine, use T=4 threads per process.
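A small sketch of the P*T=C rule, assuming bash; threads_per_process is a hypothetical helper, and the executable name fds in the comment is an assumption that depends on your installation. T is computed with integer division, so when C is not a multiple of P a few cores stay idle rather than being oversubscribed.

```bash
#!/usr/bin/env bash

# threads_per_process C P
# Prints T, the number of OpenMP threads per MPI process, such that
# P*T <= C, where C = physical cores per machine and
# P = MPI processes launched per machine.
threads_per_process() {
  local cores=$1 procs=$2
  if (( procs < 1 )); then
    echo "need at least one MPI process" >&2
    return 1
  fi
  local t=$(( cores / procs ))
  if (( t < 1 )); then
    t=1   # more processes than cores: one thread each
  fi
  echo "$t"
}

# Example (hypothetical machine: 8 physical cores, 2 MPI processes):
#   export OMP_NUM_THREADS=$(threads_per_process 8 2)   # 4 threads each
#   mpiexec -n 2 fds casename.fds
```

How the variable reaches each MPI process (especially on remote machines) depends on your MPI launcher, so check its documentation.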
Parallelization with MPI will always deliver greater speedups than OpenMP given the same number of cores to run on. So if you can safely use MPI (and still obtain valid results), you should do so. If you have additional computational resources, you can add OpenMP parallelization to speed things up further.
To summarize:
- MPI will usually give you a greater speedup
- expect a speedup of about two when using four threads
- beyond four threads you won't see much improvement
- don't use hyperthreading; it slows things down
To limit the number of threads, set the environment variable OMP_NUM_THREADS. See below for how this works on Linux and Windows.
For Linux, to limit the number of threads to, say, 2, enter
export OMP_NUM_THREADS=2
Note that this affects only the current shell session. To make it the default, add this command to your shell's startup script (for example, ~/.bashrc for bash).
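A minimal sketch of making the setting persistent, assuming bash reads ~/.bashrc (other shells and login setups use different startup files). persist_omp_threads is a hypothetical helper; it appends the export line only if an identical line is not already there, so running it twice does no harm.

```bash
#!/usr/bin/env bash

# persist_omp_threads N [RCFILE]
# Appends "export OMP_NUM_THREADS=N" to RCFILE (default: ~/.bashrc)
# unless that exact line is already present. If an older line with a
# different N exists, the later one wins because the file is read
# top to bottom.
persist_omp_threads() {
  local n=$1 rcfile=${2:-"$HOME/.bashrc"}
  local line="export OMP_NUM_THREADS=$n"
  if ! grep -qxF "$line" "$rcfile" 2>/dev/null; then
    printf '%s\n' "$line" >> "$rcfile"
  fi
}

# Example: persist_omp_threads 2
```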
On Windows, to limit the number of threads, create a new environment variable called OMP_NUM_THREADS. After saving the variable, restart your command-line environment (normally no reboot is necessary). For a single session, you can just enter
set OMP_NUM_THREADS=2
To run the OpenMP version of FDS, you usually have to allocate a certain amount of stack memory (RAM) to each OpenMP thread. On a Windows computer, go to "System Properties", then "Advanced", then "Environment Variables". Add the new system variable OMP_STACKSIZE with the value 16M. If FDS-OpenMP does not work, use a higher value for OMP_STACKSIZE (200M seems to be a good value). You can also adjust OMP_STACKSIZE for the current session by typing
set OMP_STACKSIZE=16M
(here for 16M) on the Windows command line before you start FDS.
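On Linux, the equivalent of the Windows command is export OMP_STACKSIZE=16M. The sketch below, assuming bash (run_with_stack_retry is a hypothetical helper), encodes the advice above: it runs a command with OMP_STACKSIZE=16M and, if that command fails, retries once with 200M. It reruns the whole command on any failure, not only on stack problems, so it suits quick test cases better than long runs.

```bash
#!/usr/bin/env bash

# run_with_stack_retry COMMAND [ARGS...]
# Runs COMMAND with OMP_STACKSIZE=16M; if it exits non-zero, runs it
# once more with OMP_STACKSIZE=200M. Returns 0 on the first success,
# 1 if both attempts fail.
run_with_stack_retry() {
  local size
  for size in 16M 200M; do
    echo "running with OMP_STACKSIZE=$size" >&2
    if OMP_STACKSIZE=$size "$@"; then
      return 0
    fi
  done
  return 1
}

# Example (executable name is an assumption):
#   run_with_stack_retry fds casename.fds
```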
If 64-bit Windows reports error messages such as
- OMP: Error #136: Cannot create thread.
- OMP: System error #8: Not enough storage is available to process this command.
try reducing your OMP_STACKSIZE value if it is "large" (e.g., 1G). This has solved the problem in some tests.
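One plausible reason a large value causes these errors is that each thread created by the OpenMP runtime gets a stack of OMP_STACKSIZE bytes, so the memory requested grows with the thread count; 1G with eight threads asks for several gigabytes. The sketch below, assuming bash 4 or later (both function names are hypothetical), converts an OMP_STACKSIZE value to bytes using the OpenMP conventions (optional B, K, M or G suffix; kilobytes when no suffix is given) and gives a rough upper bound for T threads. The initial thread's stack is set by the operating system, so the real figure is somewhat lower.

```bash
#!/usr/bin/env bash

# stacksize_bytes SIZE
# Converts an OMP_STACKSIZE value such as 16M, 200M, 1G, 512K or 4096B
# to bytes. Suffixes are case-insensitive; no suffix means kilobytes.
# Embedded whitespace is not handled in this sketch.
stacksize_bytes() {
  local v=${1^^}
  if [[ ! $v =~ ^([0-9]+)([BKMG]?)$ ]]; then
    echo "invalid OMP_STACKSIZE value: $1" >&2
    return 1
  fi
  local n=$(( 10#${BASH_REMATCH[1]} ))   # force base 10 (no octal)
  case ${BASH_REMATCH[2]} in
    B)    echo $(( n )) ;;
    K|"") echo $(( n * 1024 )) ;;
    M)    echo $(( n * 1024 * 1024 )) ;;
    G)    echo $(( n * 1024 * 1024 * 1024 )) ;;
  esac
}

# stack_total_mib THREADS SIZE
# Rough upper bound, in MiB, on stack memory for THREADS threads.
stack_total_mib() {
  local bytes
  bytes=$(stacksize_bytes "$2") || return 1
  echo $(( $1 * bytes / 1024 / 1024 ))
}
```

For example, stack_total_mib 8 1G prints 8192, while stack_total_mib 8 16M prints 128.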
For those who have purchased the Intel Thread Checker, scripts to perform OpenMP thread checking (inspect_openmp.sh) and to report results (inspect_report.sh) are located in the FDS-SMV repository in the Utilities/Scripts directory.
On the blaze and burn Linux clusters at NIST, these commands are also in your user path. To perform thread checking on the input file casename.fds, type:
inspect_openmp.sh casename.fds
Run this command only on cases that run for a VERY short time, because the inspection process takes a long time. To output results from the thread checker, type:
inspect_report.sh
Type inspect_openmp.sh -h or inspect_report.sh -h to output usage information.