Investigate if scripts/consequences-v3.10.0.py could be optimized #57

Open

anthonyfok opened this issue May 2, 2022 · 0 comments · May be fixed by #61
anthonyfok (Member) commented May 2, 2022

While casually observing a scripts/run_OQStandard.sh run, I noticed that OpenQuake itself would happily use all available CPU cores to do calculations in parallel (which is awesome), but some of the other processing steps are single-threaded and can take over 12 hours. For example:

From ps auxwww, near the end of a python3 scripts/consequences-v3.10.0.py -2 run:

user    2151  0.0  0.0   8756  3792 pts/0    S+   07:51   0:00 bash scripts/run_OQStandard.sh SCM6p5_Montreal_conv -h -r -d -o
user    2225  0.0  0.0 3065888 101008 ?      Sl   07:51   0:01 oq-dbserver
user    6603  100  0.0 2836080 263132 pts/0  Rl+  09:53 759:05 python3 scripts/consequences-v3.10.0.py -2

From top:

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
   6603 user      20   0 2836080 263132  50196 R 100.0   0.0 286:58.05 python3

From free -h:

              total        used        free      shared  buff/cache   available
Mem:          749Gi       1.4Gi       739Gi       1.0Mi       8.6Gi       744Gi
Swap:            0B          0B          0B

So, in this particular case, the calculations before python3 scripts/consequences-v3.10.0.py -2 took just over 2 hours, but python3 scripts/consequences-v3.10.0.py -2 alone took 12.65 hours (759 minutes), running single-threaded (and not using much RAM) while writing to CSV files at about 200 lines/second (487,211 lines per CSV file in this scenario):

-rw-rw-r-- 1 user group 96764235 May  2 10:40 consequences-rlz-000_-2.csv
-rw-rw-r-- 1 user group 96262336 May  2 11:28 consequences-rlz-001_-2.csv
-rw-rw-r-- 1 user group 96978159 May  2 12:15 consequences-rlz-002_-2.csv
-rw-rw-r-- 1 user group 97646335 May  2 13:03 consequences-rlz-003_-2.csv
-rw-rw-r-- 1 user group 98016335 May  2 13:50 consequences-rlz-004_-2.csv
-rw-rw-r-- 1 user group 83709311 May  2 14:31 consequences-rlz-005_-2.csv

Ditto for the python3 scripts/consequences-v3.10.0.py -1 command, which is expected to take another 12 hours.

It would be an interesting exercise to profile this script to see where it spends most of its time, and to find ways to make it speedier; a rough profiling sketch is included below.

(Low priority, could have)
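For a first pass, something like the following could work. This is only a minimal sketch: cProfile and pstats are standard-library tools, and the consequences-2.prof file name is an arbitrary choice, not anything the current scripts produce.

# Profile one run of the script with cProfile, writing the stats to a file.
python3 -m cProfile -o consequences-2.prof scripts/consequences-v3.10.0.py -2

# Print the 20 call sites with the highest cumulative time.
python3 -c "import pstats; pstats.Stats('consequences-2.prof').sort_stats('cumulative').print_stats(20)"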

P.S. A quick-and-dirty script that I am using to record basic metrics:

#!/bin/bash
# Append a timestamped snapshot of load, memory and process info every 15 seconds.
LOGFILE=~/logs/log_2022-05-02_cpu-ram-process.log
while true; do
  ( date; uptime; free -h; ps auxwww | grep ^user ; echo) | tee -a "${LOGFILE}"
  sleep 15
done
anthonyfok added a commit to anthonyfok/earthquake-scenarios that referenced this issue May 5, 2022
Use Python multiprocessing package to take advantage of multiple CPU cores
for processing multiple realizations simultaneously.

This would reduce the total run time of, for example,

    bash scripts/run_OQStandard.sh SCM5p8_Montreal_conv -h -r -d -o

from 23 hours down to 6 hours on a c5a.24xlarge EC2 instance.

Fixes OpenDRR#57
anthonyfok added a commit to anthonyfok/earthquake-scenarios that referenced this issue May 11, 2022
To take advantage of multiple CPU cores, run_OQStandard.sh now uses "GNU parallel"
to dispatch multiple python3 instances simultaneously for the consequences
calculations.

Using "bash scripts/run_OQStandard.sh SCM5p8_Montreal_conv -h -r -d -o"
as example, with each realization taking 82 minutes, doing 16 realizations
in parallel instead of in series would save 20.5 hours.  As consequences
calculations are done twice, the total run time is reduced by 41 hours,
from 56 hours down to 15 hours on a c5a.24xlarge EC2 instance.

Unlike the approach with Python’s own multiprocessing module, GNU parallel
launches multiple fully independent Python processes with no memory sharing
at all, which avoids the mysterious calculation discrepancies with NumPy’s
OpenBLAS dot multiplications seen in the superseded Pull Request OpenDRR#58.

Fixes OpenDRR#57
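For reference, a minimal sketch of what such a GNU parallel dispatch could look like. The realization index passed via {} is a hypothetical argument added for illustration; the actual invocation in run_OQStandard.sh may differ.

# By default GNU parallel runs one job per CPU core; each job here is an
# independent python3 process for one realization.  The trailing "{}"
# realization index is a hypothetical argument for illustration only.
seq 0 15 | parallel python3 scripts/consequences-v3.10.0.py -2 {}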
anthonyfok self-assigned this Nov 2, 2023
anthonyfok added this to Planned in Data via automation Nov 2, 2023
anthonyfok added the Enhancement (New feature or request) label Nov 2, 2023