Investigate if scripts/consequences-v3.10.0.py could be optimized #57

Open

anthonyfok opened this issue May 2, 2022 · 0 comments · May be fixed by #61
anthonyfok (Member) commented May 2, 2022

While casually observing a scripts/run_OQStandard.sh run, I noticed that OpenQuake itself would happily use all available CPU cores to do calculations in parallel (which is awesome), but some of the other processing steps are single-threaded and can take over 12 hours. For example:

From ps auxwww, near the end of a python3 scripts/consequences-v3.10.0.py -2 run:

user    2151  0.0  0.0   8756  3792 pts/0    S+   07:51   0:00 bash scripts/run_OQStandard.sh SCM6p5_Montreal_conv -h -r -d -o
user    2225  0.0  0.0 3065888 101008 ?      Sl   07:51   0:01 oq-dbserver
user    6603  100  0.0 2836080 263132 pts/0  Rl+  09:53 759:05 python3 scripts/consequences-v3.10.0.py -2

From top:

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
   6603 user      20   0 2836080 263132  50196 R 100.0   0.0 286:58.05 python3

From free -h:

              total        used        free      shared  buff/cache   available
Mem:          749Gi       1.4Gi       739Gi       1.0Mi       8.6Gi       744Gi
Swap:            0B          0B          0B

So, in this particular case, the calculations before python3 scripts/consequences-v3.10.0.py -2 took just over 2 hours, but python3 scripts/consequences-v3.10.0.py -2 alone took 12.65 hours (759 minutes), running single-threaded (and not using much RAM) while writing to CSV files at about 200 lines/second (487,211 lines per CSV file in this scenario):

-rw-rw-r-- 1 user group 96764235 May  2 10:40 consequences-rlz-000_-2.csv
-rw-rw-r-- 1 user group 96262336 May  2 11:28 consequences-rlz-001_-2.csv
-rw-rw-r-- 1 user group 96978159 May  2 12:15 consequences-rlz-002_-2.csv
-rw-rw-r-- 1 user group 97646335 May  2 13:03 consequences-rlz-003_-2.csv
-rw-rw-r-- 1 user group 98016335 May  2 13:50 consequences-rlz-004_-2.csv
-rw-rw-r-- 1 user group 83709311 May  2 14:31 consequences-rlz-005_-2.csv

Ditto for the python3 scripts/consequences-v3.10.0.py -1 command, which is expected to take another 12 hours.

It would be an interesting exercise to profile this script to see where it spends most of its time, and to find ways to make it speedier; a rough profiling sketch is included below.

(Low priority, could have)
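For a first pass, something like the following could work. This is only a minimal sketch: cProfile and pstats are standard-library tools, and the consequences-2.prof file name is an arbitrary choice, not anything the current scripts produce.

# Profile one run of the script with cProfile, writing the stats to a file.
python3 -m cProfile -o consequences-2.prof scripts/consequences-v3.10.0.py -2

# Print the 20 call sites with the highest cumulative time.
python3 -c "import pstats; pstats.Stats('consequences-2.prof').sort_stats('cumulative').print_stats(20)"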

P.S. A quick-and-dirty script that I am using to record basic metrics:

#!/bin/bash
# Append a timestamped snapshot of load, memory and process info every 15 seconds.
LOGFILE=~/logs/log_2022-05-02_cpu-ram-process.log
while true; do
  ( date; uptime; free -h; ps auxwww | grep ^user ; echo) | tee -a "${LOGFILE}"
  sleep 15
done
anthonyfok added a commit to anthonyfok/earthquake-scenarios that referenced this issue May 5, 2022
Use Python multiprocessing package to take advantage of multiple CPU cores
for processing multiple realizations simultaneously.

This would reduce the total run time of, for example,

    bash scripts/run_OQStandard.sh SCM5p8_Montreal_conv -h -r -d -o

from 23 hours down to 6 hours on a c5a.24xlarge EC2 instance.

Fixes OpenDRR#57
anthonyfok added a commit to anthonyfok/earthquake-scenarios that referenced this issue May 11, 2022
To take advantage of multiple CPU cores, run_OQStandard.sh now uses "GNU parallel"
to dispatch multiple python3 instances simultaneously for the consequences
calculations.

Using "bash scripts/run_OQStandard.sh SCM5p8_Montreal_conv -h -r -d -o"
as example, with each realization taking 82 minutes, doing 16 realizations
in parallel instead of in series would save 20.5 hours.  As consequences
calculations are done twice, the total run time is reduced by 41 hours,
from 56 hours down to 15 hours on a c5a.24xlarge EC2 instance.

Unlike the approach with Python’s own multiprocessing module, GNU parallel
launches multiple fully independent Python processes with no memory sharing
at all, which avoids the mysterious calculation discrepancies with NumPy’s
OpenBLAS dot multiplications seen in the superseded Pull Request OpenDRR#58.

Fixes OpenDRR#57
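For reference, a minimal sketch of what such a GNU parallel dispatch could look like. The realization index passed via {} is a hypothetical argument added for illustration; the actual invocation in run_OQStandard.sh may differ.

# By default GNU parallel runs one job per CPU core; each job here is an
# independent python3 process for one realization.  The trailing "{}"
# realization index is a hypothetical argument for illustration only.
seq 0 15 | parallel python3 scripts/consequences-v3.10.0.py -2 {}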
anthonyfok self-assigned this Nov 2, 2023
anthonyfok added this to Planned in Data via automation Nov 2, 2023
anthonyfok added the Enhancement (New feature or request) label Nov 2, 2023