Performance outline #635
Comments
821: make GC deterministic in distributed r=simonbyrne a=simonbyrne

# PULL REQUEST

## Purpose and Content
This should reduce MPI Waitall time by manually triggering the GC across all processes at the same time.

## Benefits and Risks
The number of steps will require some tuning to avoid out-of-memory errors

## Linked Issues
- Item 3 of #635
- Mentioned in #686
- Supersedes #687

## PR Checklist
- [x] This PR has a corresponding issue OR is linked to an SDI.
- [x] I have followed CliMA's codebase [contribution](https://clima.github.io/ClimateMachine.jl/latest/Contributing/) and [style](https://clima.github.io/ClimateMachine.jl/latest/DevDocs/CodeStyle/) guidelines OR N/A.
- [x] I have followed CliMA's [documentation policy](https://github.com/CliMA/policies/wiki/Documentation-Policy).
- [x] I have checked all issues and PRs and I certify that this PR does not duplicate an open PR.
- [x] I linted my code on my local machine prior to submission OR N/A.
- [x] Unit tests are included OR N/A.
- [x] Code used in an integration test OR N/A.
- [x] All tests ran successfully on my local machine OR N/A.
- [x] All classes, modules, and function contain docstrings OR N/A.
- [x] Documentation has been added/updated OR N/A.

Co-authored-by: Simon Byrne <[email protected]>
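The idea of triggering the GC manually at a fixed step interval can be sketched in a single process as below. This is only an illustration, not the code from the PR: `run_steps!`, `gc_interval`, and the loop body are hypothetical names, and the MPI side is left out. The sketch re-enables collection briefly around `GC.gc()`, on the assumption that a collection requested while the GC is disabled may be deferred rather than run.

```julia
# Sketch only: `run_steps!`, `gc_interval` and the loop body are hypothetical.
function run_steps!(state::Vector{Float64}, nsteps::Int; gc_interval::Int = 100)
    ncollections = 0
    was_enabled = GC.enable(false)   # disable automatic GC; returns the previous state
    try
        for n in 1:nsteps
            state .+= 1.0            # stand-in for one model timestep
            if n % gc_interval == 0
                # In a distributed run, every process would reach this point
                # at the same step count, so all of them collect together.
                GC.enable(true)
                GC.gc()
                GC.enable(false)
                ncollections += 1
            end
        end
    finally
        GC.enable(was_enabled)       # restore the original GC state
    end
    return ncollections
end

state = zeros(4)
ncollections = run_steps!(state, 250; gc_interval = 100)
```

As the PR notes, the interval needs tuning: with automatic collection off, too large a `gc_interval` lets allocations accumulate until memory runs out.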
Superseded by #2632
Key items to tackle
- `bycolumn`: this is fairly low-hanging fruit and also makes threading more efficient. Most of this is complete; however, `non_orographic_gravity_wave_tendency!` and `orographic_gravity_wave_tendency!` need to be reworked. Opened "Improve design of nonorographic gravity wave parameterization" (#897) to track.
- `GC.enable(false)` and `GC.gc()` to trigger manually
- `*` over `/` if not being optimized
- Move the factorization of `W = -I + dtγ * J` from the `linsolve!` function (which solves the equation `W * newton_residual = ΔY`) to the `Wfact!` function (which computes `W`). This will speed things up if we take multiple Newton iterations during the implicit step of a Runge-Kutta stage without recomputing `W` for each iteration, or if we only compute `W` once per timestep and hold it fixed for all the Runge-Kutta stages. Factorization may be the most expensive part of either `Wfact!` or `linsolve!`, so minimizing the number of factorizations could give us a significant performance improvement. This is not a pure optimization, though, since it changes behavior, has built-in assumptions about source terms, and can impact stability.

Performance notes / specs
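As a rough illustration of the `W` item above, the sketch below factorizes `W = -I + dtγ * J` once and reuses the factorization across several linear solves. It is a minimal sketch under stated assumptions: `WCache`, `Wfact_sketch!`, and `linsolve_sketch!` are hypothetical stand-ins rather than the actual `Wfact!`/`linsolve!` signatures, `W` is a small dense matrix, and a generic system `W * x = b` stands in for the Newton equation.

```julia
using LinearAlgebra

# Hypothetical cache holding the factorization of W between linear solves.
mutable struct WCache{F}
    W_lu::F
    nfactorizations::Int
end

# Stand-in for Wfact!: build W and factorize it here, once per call.
function Wfact_sketch!(cache::WCache, J::AbstractMatrix, dtγ::Real)
    cache.W_lu = lu(-I + dtγ * J)
    cache.nfactorizations += 1
    return cache
end

# Stand-in for linsolve!: reuses the stored factorization, no refactorization.
function linsolve_sketch!(x::AbstractVector, cache::WCache, b::AbstractVector)
    ldiv!(x, cache.W_lu, b)
    return x
end

function demo()
    J = [4.0 1.0; 2.0 6.0]
    dtγ = 0.5
    W = -I + dtγ * J
    cache = WCache(lu(W), 1)        # initial factorization
    x = zeros(2)
    max_error = 0.0
    # Three solves standing in for three Newton iterations with W held fixed.
    for b in ([1.0, 2.0], [0.5, -1.0], [3.0, 0.0])
        linsolve_sketch!(x, cache, b)
        max_error = max(max_error, norm(W * x - b))
    end
    nfact_fixed = cache.nfactorizations
    # A new Jacobian (e.g. at the next timestep) triggers one refactorization.
    Wfact_sketch!(cache, 2 .* J, dtγ)
    return nfact_fixed, cache.nfactorizations, max_error
end

nfact_fixed, nfact_after_update, max_error = demo()
```

The design choice here is that the cost of `lu` is paid only when the Jacobian is refreshed, while each solve is reduced to triangular substitutions. Holding `W` fixed across iterations or stages is the behavior change the item warns about.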