
Remove memory spike in Sampling #1235

Draft · wants to merge 1 commit into main from sampling-spike
Conversation

@marchdf (Contributor) commented Sep 5, 2024

Summary

This removes the memory spike on the ioproc when doing sampling by initializing the sampling particles in parallel.
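For context, a minimal conceptual sketch of the change (plain C++ stand-ins, not the actual AMR-Wind code): the spike comes from one rank building every sampling particle before they are redistributed, while the fix has each rank build only its own share.

```cpp
// Conceptual sketch only (plain C++ stand-ins, not AMR-Wind code).
#include <vector>

struct Particle {
    double x, y, z;
};

// Before: the I/O rank allocates all particles at once, so its peak memory
// scales with the total number of probes.
std::vector<Particle> init_on_ioproc(int rank, int total_probes)
{
    std::vector<Particle> particles;
    if (rank == 0) {
        particles.resize(total_probes); // the spike lives here
    }
    return particles; // followed by a redistribute to the owning ranks
}

// After: every rank allocates only the probes it will own, so peak memory
// per rank stays roughly total_probes / nranks.
std::vector<Particle> init_in_parallel(int my_probe_count)
{
    return std::vector<Particle>(my_probe_count);
}
```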

Pull request type

Please check the type of change introduced:

  • Bugfix
  • Feature
  • Code style update (formatting, renaming)
  • Refactoring (no functional changes, no api changes)
  • Build related changes
  • Documentation content changes
  • Other (please describe):

Checklist

The following is included:

  • new unit-test(s)
  • new regression test(s)
  • documentation for new capability

This PR was tested by running:

  • the unit tests
    • on GPU
    • on CPU
  • the regression tests
    • on GPU
    • on CPU

Additional background

Issue Number: #1186

@marchdf (Contributor Author) commented Sep 5, 2024

This is great except that now all ranks need to know about the probe locations, so it ends up needing more memory in total even though there is no spike. The next step I want to take before this PR is merged is to have each rank only know about the probe locations that are relevant to it. Basically I want to pass a box to sampling_locations and only get the locations that are inside that box. But I need to switch to something else for a bit.
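
For illustration, a minimal sketch of that direction, assuming a simple stand-in Box type and a hypothetical sampling_locations_in_box helper (not the actual AMR-Wind interfaces):

```cpp
// Hedged sketch of "pass a box to sampling_locations and only get the
// locations inside it". The Box type and function name are illustrative
// stand-ins for the real AMR-Wind classes.
#include <array>
#include <vector>

using Location = std::array<double, 3>;

struct Box {
    Location lo, hi;
    bool contains(const Location& p) const
    {
        for (int d = 0; d < 3; ++d) {
            if (p[d] < lo[d] || p[d] > hi[d]) { return false; }
        }
        return true;
    }
};

// Return only the probe locations that fall inside the given box, so a rank
// only ever materializes the probes it will own.
std::vector<Location> sampling_locations_in_box(
    const std::vector<Location>& all_locations, const Box& box)
{
    std::vector<Location> local;
    for (const auto& loc : all_locations) {
        if (box.contains(loc)) { local.push_back(loc); }
    }
    return local;
}
```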

@marchdf force-pushed the sampling-spike branch 5 times, most recently from ac8f6e4 to fc43780 on September 11, 2024 at 20:41
@marchdf (Contributor Author) commented Sep 11, 2024

Status update before I leave this aside for a bit:

  • there is currently a hard check that the number of particles added matches what was expected to be added (a rough sketch of such a check follows this list). This is a good check, but if a sampling location is not in the domain, it will fail. Do I silently filter out sampling locations outside the domain, or abort? This shows up in the dam_break_godunov test case.
  • Now that I can dynamically add sampling locations, I am going to have each rank only do the sampling locations that are in the boxes that it owns. That should further lower the memory consumption.
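
Here is the rough sketch referenced in the first bullet: a hedged illustration of a global particle-count check, using plain MPI rather than the actual AMR-Wind/AMReX utilities.

```cpp
// Hedged illustration of the "hard check" idea: sum the locally added
// particles across ranks and compare against the number of requested
// sampling locations that lie inside the domain. Plain MPI is used here
// instead of the actual AMR-Wind/AMReX helpers.
#include <mpi.h>

#include <iostream>

void check_particle_count(int num_added_locally, int num_expected_in_domain)
{
    int num_added_globally = 0;
    MPI_Allreduce(
        &num_added_locally, &num_added_globally, 1, MPI_INT, MPI_SUM,
        MPI_COMM_WORLD);

    if (num_added_globally != num_expected_in_domain) {
        std::cerr << "Sampling particle count mismatch: added "
                  << num_added_globally << ", expected "
                  << num_expected_in_domain << "\n";
        // Either abort here or silently drop the out-of-domain probes;
        // that is the open question in the first bullet above.
        MPI_Abort(MPI_COMM_WORLD, 1);
    }
}
```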

@marchdf force-pushed the sampling-spike branch 2 times, most recently from 5a54ba1 to c99cba2 on September 20, 2024 at 18:51
@marchdf (Contributor Author) commented Sep 20, 2024

Progress is being made... I need to fix the m_fill_val in the radar sampler to be the min of the domain (checking to see if that's allowed). Then the netcdf builds need to be fixed with the realvect change. And then finally I can call sampling_locations with the locally owned boxes. Phew.

@marchdf (Contributor Author) commented Oct 9, 2024

We are close. I added the box check, which is the thing that should help the memory pressure. I need to check it on a large case and see if it actually works and causes no diffs. The only other thing I am "worried" about is the radar sampler, since I don't have an input file for that.

@marchdf (Contributor Author) commented Oct 10, 2024

Ran the new reg test with MPI and everything on and I don't see any diffs in the particle files. I would like to double check netcdf. But here's the upshot:

  • successfully removes the memory spike: 10GB spike is gone and the max we see is about 250MB on each rank.
  • however the initialization of the particles is much slower with this PR. This is a bummer. Basically we are trading memory usage for computation. TANSTAAFL. One thing I want to think about is whether there's a way to give it a bounding box...

This PR

Time spent in InitData():    106.9760745
Time spent in Evolve():      4.634056547

[memory usage plot: memory_30plane_dev]

main branch

Time spent in InitData():    10.57266587
Time spent in Evolve():      4.56230204

[memory usage plot: memory_30plane_main-2]

@marchdf (Contributor Author) commented Oct 10, 2024

This is the diff with the big case I am using to benchmark (40M probes):

           absolute_error  relative_error
uid          0.000000e+00    0.000000e+00
set_id       0.000000e+00    0.000000e+00
probe_id     0.000000e+00    0.000000e+00
xco          1.104635e-08    1.092440e-16
yco          0.000000e+00    0.000000e+00
zco          0.000000e+00    0.000000e+00
velocityx    4.813321e-13    9.463192e-18
velocityy    7.431559e-13    1.052051e-15
velocityz    0.000000e+00    0.000000e+00
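
For reference, a generic sketch of how a per-column comparison like the one above can be computed (here, max absolute and max relative error between a reference column and a test column); this is illustrative post-processing code, not the script that produced the table.

```cpp
// Generic per-column comparison between a reference column and a test
// column. Illustrative only; not the script that produced the table above.
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

std::pair<double, double> column_errors(
    const std::vector<double>& ref, const std::vector<double>& test)
{
    double abs_err = 0.0;
    double rel_err = 0.0;
    for (std::size_t i = 0; i < ref.size(); ++i) {
        const double diff = std::abs(ref[i] - test[i]);
        abs_err = std::max(abs_err, diff);
        if (ref[i] != 0.0) {
            rel_err = std::max(rel_err, diff / std::abs(ref[i]));
        }
    }
    return {abs_err, rel_err};
}
```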

@marchdf (Contributor Author) commented Oct 10, 2024

Ok, here are the updated plots with the improvements I made:
[memory usage plot: memory_30plane_dev-2]

And now we are back to something reasonable:

Time spent in InitData():    16.04659739
Time spent in Evolve():      4.62255488

@marchdf (Contributor Author) commented Oct 10, 2024

Checklist:

  • check netcdf
  • rerun all the tests
  • check abl_sampling test and large test files from a GPU run
  • check free surface
  • I think I can actually simplify now that I did all that extra work

@marchdf (Contributor Author) commented Oct 11, 2024

Because I can't leave well enough alone... That last simplification means we are now even faster than the main branch in the init (a 2X speedup). Turns out TANSTAAFL is not a thing.

Time spent in InitData():    5.056825252
Time spent in Evolve():      5.084482466

[memory usage plot: memory_30plane_dev-3]

@mbkuhn (Contributor) left a comment


I'll approve now, though it looks like you have a few things you still want to do (and a good system for checking those). Excellent job on this!

@mbkuhn (Contributor) commented Oct 11, 2024

Not sure if you've incorporated my additions to the probe sampler, but hopefully those won't be too bad to address.

@marchdf (Contributor Author) commented Oct 11, 2024

> Not sure if you've incorporated my additions to the probe sampler, but hopefully those won't be too bad to address.

Yup I merged those in when you made the changes. You had good tests in there that made it easy.

@marchdf (Contributor Author) commented Oct 11, 2024

That last one made the init 10% faster than the one before.

Time spent in InitData():    4.615414819
Time spent in Evolve():      5.122051878

Not enough to claim that the 80s were right about greed. I will keep it though.

@marchdf (Contributor Author) commented Oct 11, 2024

Free surface sampler looks fine.

[attachment: plot_sampling_native.pdf]

@marchdf (Contributor Author) commented Oct 11, 2024

abl_sampling_netcdf with 10 ranks on Kestrel:
This PR:

Time spent in InitData():    0.38666686
Time spent in Evolve():      1.077089947

main branch:

Time spent in InitData():    0.483688814
Time spent in Evolve():      1.074861178

Things "look" right:

❯ cd post_processing_dev
❯ du -shc *
28K     abl_statistics00000.nc
32K     line_sampling00000.nc
136K    plane_sampling00000.nc
32K     probe_sampling00000.nc
5.8M    volume_sampling00000.nc
88K     volume_sampling200000.nc
6.1M    total
❯ ..
❯ cd post_processing_main
❯ du -shc *
28K     abl_statistics00000.nc
32K     line_sampling00000.nc
136K    plane_sampling00000.nc
32K     probe_sampling00000.nc
5.8M    volume_sampling00000.nc
88K     volume_sampling200000.nc
6.1M    total

But I would like to have nccmp output... and nccmp isn't available on any machine I have access to yet.

@marchdf (Contributor Author) commented Oct 11, 2024

Got nccmp added to Kestrel:

❯ nccmp -d post_processing_dev/volume_sampling00000.nc post_processing_main/volume_sampling00000.nc
❯ nccmp -d post_processing_dev/volume_sampling200000.nc post_processing_main/volume_sampling200000.nc

Good to go for netcdf.

@hgopalan (Contributor)

I checked one of our cases, which previously crashed on CPU in the first step. I changed the output format to native and it has already run fine for 100 seconds. Will keep you updated.

@marchdf force-pushed the sampling-spike branch 3 times, most recently from ebf5ccd to 0d55040 on October 16, 2024 at 15:38