add GPU AMIP scaling runs #673

Closed
wants to merge 28 commits into from

juliasloan25 (Member) commented Mar 6, 2024

Purpose

closes #663

Note: we want to use the slurm_exclusive: flag so that the GPU is used only by the performance job, but this does not seem to work when used in Atmos.

We need to use the ClimaAtmos#fdf1df4 commit to include the PR containing the Atmos config files. We can't use main because of dependency incompatibilities.

Content

  • add GPU performance buildkite pipeline
  • add SYPD calculation to driver (use calculation from atmos so we're consistent)
  • add configs for 1-, 2-, and 4-GPU AMIP runs based on ClimaAtmos chap runs
    • do for both strong and weak scaling
  • add runs in buildkite/gpu/pipeline.yml
  • manually trigger pipeline to check that these runs succeed on clima
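
For reference, the SYPD (simulated years per day) metric and the strong/weak scaling comparison listed above can be sketched as follows. This is a minimal illustration, not the driver's or ClimaAtmos's actual code: the function names, the 365-day year, and the assumption that all runs use the same time step are hypothetical choices made for this sketch.

```julia
# Sketch only: names and constants are assumptions, not taken from
# the coupler driver or from ClimaAtmos.
const SECONDS_PER_DAY = 86_400.0
const DAYS_PER_YEAR = 365.0  # assumption: 365-day year

"""
    sypd(simulated_seconds, walltime_seconds)

Simulated years per wall-clock day: how many model years would be
completed if the run continued at this pace for 24 hours.
"""
function sypd(simulated_seconds::Real, walltime_seconds::Real)
    walltime_seconds > 0 || throw(ArgumentError("walltime must be positive"))
    simulated_years = simulated_seconds / (SECONDS_PER_DAY * DAYS_PER_YEAR)
    walltime_days = walltime_seconds / SECONDS_PER_DAY
    return simulated_years / walltime_days
end

# Strong scaling: the problem size is fixed, so ideal throughput
# grows linearly with the number of GPUs.
strong_scaling_efficiency(sypd_1, sypd_n, n_gpus) = sypd_n / (n_gpus * sypd_1)

# Weak scaling: the problem size grows with the number of GPUs, so ideal
# throughput stays constant (assuming the same time step in every run).
weak_scaling_efficiency(sypd_1, sypd_n) = sypd_n / sypd_1
```

An efficiency of 1 corresponds to ideal scaling in both cases; values below 1 indicate communication or other overhead as GPUs are added.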

Status

A 2-GPU error is preventing us from getting scaling results (see #687). This needs to be addressed before we can set up reliable scaling runs.


  • I have read and checked the items on the review checklist.

Sbozzolo (Member) commented Mar 7, 2024

See here:

CliMA/ClimaAtmos.jl@448d22c

-  SLURM_GPU_BIND: none # https://github.com/open-mpi/ompi/issues/11949#issuecomment-1737712291
+  SLURM_GRES_FLAGS: "allow-task-sharing"

juliasloan25 force-pushed the js/gpu-scaling branch 2 times, most recently from e8da583 to c357da3, on March 8, 2024 23:39
@@ -507,6 +507,9 @@ function update_surface_fractions!(cs::CoupledSimulation)
cs.surface_fractions.ice .= max.(min.(ice_d, FT(1) .- land_s), FT(0))
cs.surface_fractions.ocean .= max.(FT(1) .- (cs.surface_fractions.ice .+ land_s), FT(0))

comms_ctx = axes(land_s).grid.topology.context

Suggested change
-  comms_ctx = axes(land_s).grid.topology.context
+  comms_ctx = ClimaComms.context(land_s)

I think we have a helper for this, so that you don't need to use internals.
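
For context, the surface-fraction update shown in the hunk above can be illustrated with plain arrays. This is a hedged sketch: `surface_fractions` is a hypothetical standalone function that mirrors the two broadcast expressions in the diff, and it uses ordinary Julia arrays rather than the fields stored in the coupled simulation.

```julia
# Hypothetical standalone version of the clamping logic from the diff,
# using plain arrays instead of the simulation's fields.
function surface_fractions(
    ice_d::AbstractArray{FT},
    land_s::AbstractArray{FT},
) where {FT <: AbstractFloat}
    # Ice may only occupy the part of each cell not covered by land.
    ice = max.(min.(ice_d, FT(1) .- land_s), FT(0))
    # Ocean takes whatever remains, floored at zero.
    ocean = max.(FT(1) .- (ice .+ land_s), FT(0))
    return (; ice, ocean)
end

# Illustrative inputs only (not data from any run).
land = Float32[0.0, 0.3, 1.0]
ice_data = Float32[0.5, 0.9, 0.2]
fr = surface_fractions(ice_data, land)
```

As long as the land fraction lies in [0, 1], the ice, ocean, and land fractions of each cell sum to one (up to floating-point rounding), because ice is capped at the non-land part of the cell and ocean is the remainder.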

key: "gpu_amip_dyamond_ws"
command:
  - >
    julia --threads=3 --color=yes --project=experiments/AMIP experiments/AMIP/coupler_driver.jl

Can we add an nsys profile to the non-MPI jobs?

juliasloan25 mentioned this pull request on Mar 20, 2024
juliasloan25 (Member, Author) commented

CHAP is no longer our target. In the future, we want to set up scaling runs using the benchmarks pipeline and setups.
