add GPU AMIP scaling runs #673
Closed
28 commits
41dff6f add strong scaling GPU AMIP (juliasloan25)
3d8f4bb add weak scaling (juliasloan25)
e9b769c use atmos branch (juliasloan25)
9fec265 don't use vert_diff: true [skip ci] (juliasloan25)
416b053 use correct driver [skip ci] (juliasloan25)
9d69913 wait after each job [skip ci] (juliasloan25)
50c45cc weak scaling only [skip ci] (juliasloan25)
9e0df96 add barrier (juliasloan25)
9d994e4 strong scaling only [skip ci] (juliasloan25)
686f44c no waits; update ws resolutions [skip ci] (juliasloan25)
33811a6 decrease ws 1 GPU dt [skip ci] (juliasloan25)
91be5d3 show surface fractions [skip ci] (juliasloan25)
7b521db show surface fraction sums [skip ci] (juliasloan25)
ed2768d more barrier [skip ci] (juliasloan25)
0b22c9b add scaling plot [skip ci] (juliasloan25)
1f675c1 ws 1 GPU h_elem 30 [skip ci] (juliasloan25)
9031045 2 gpu ss sum before max [skip ci] (juliasloan25)
1d9206f strong scaling only h_elem 30 [skip ci] (juliasloan25)
885737d strong scaling h_elem 60, dt 50 (juliasloan25)
f77dfab weak scaling h_elem 84 dt 50 (juliasloan25)
dbbdfee ss 1 gpu @ 60, 4 gpu @ 42 (juliasloan25)
c64b442 dyamond strong scaling [skip ci] (juliasloan25)
6c22046 DYAMOND ws [skip ci] (juliasloan25)
4a21815 dyamond ws higher res [skip ci] (juliasloan25)
f963c31 fix pipeline [skip ci] (juliasloan25)
55c6dc1 include 1 GPU [skip ci] (juliasloan25)
8e68e7f dyamond ws helem 30,42,60 [skip ci] (juliasloan25)
10ac98e run for 1 day [skip ci] (juliasloan25)
@@ -0,0 +1,222 @@
agents:
  queue: clima
  slurm_mem: 8G
  modules: common nsight-systems/2023.4.1

env:
  JULIA_CUDA_MEMORY_POOL: none
  JULIA_MPI_HAS_CUDA: "true"
  JULIA_NVTX_CALLBACKS: gc
  JULIA_MAX_NUM_PRECOMPILE_FILES: 100
  OPENBLAS_NUM_THREADS: 1
  OMPI_MCA_opal_warn_on_missing_libcuda: 0
  SLURM_KILL_BAD_EXIT: 1
  SLURM_GRES_FLAGS: "allow-task-sharing"
  GPU_CONFIG_PATH: "config/gpu_configs"
  GPU_DYAMOND_CONFIG_PATH: "config/gpu_configs/gpu_dyamond"
  GPU_DYAMOND_WS_CONFIG_PATH: "config/gpu_configs/gpu_dyamond_ws"
  CLIMAATMOS_GC_NSTEPS: 10

steps:
  - label: "init :GPU:"
    key: "init_gpu_env"
    command:
      - echo "--- Instantiate experiments/AMIP"
      - julia --project=experiments/AMIP -e 'using Pkg; Pkg.instantiate(;verbose=true)'
      - julia --project=experiments/AMIP -e 'using Pkg; Pkg.precompile()'
      - julia --project=experiments/AMIP -e 'using Pkg; Pkg.status()'

      - echo "--- Download artifacts"
      - "julia --project=artifacts -e 'using Pkg; Pkg.instantiate(;verbose=true)'"
      - "julia --project=artifacts -e 'using Pkg; Pkg.precompile()'"
      - "julia --project=artifacts -e 'using Pkg; Pkg.status()'"
      - "julia --project=artifacts artifacts/download_artifacts.jl"

    agents:
      slurm_gpus: 1
      slurm_cpus_per_task: 8
    env:
      JULIA_NUM_PRECOMPILE_TASKS: 8
      JULIA_MAX_NUM_PRECOMPILE_FILES: 50

  - wait

  # - group: "DYAMOND GPU strong scaling"
  #   steps:

  #     - label: "GPU AMIP DYAMOND - strong scaling - 1 GPU"
  #       key: "gpu_amip_dyamond"
  #       command:
  #         - >
  #           julia --threads=3 --color=yes --project=experiments/AMIP experiments/AMIP/coupler_driver.jl
  #           --config_file $GPU_DYAMOND_CONFIG_PATH/gpu_amip_dyamond.yml
  #       artifact_paths: "gpu_amip_dyamond/*"
  #       agents:
  #         slurm_gpus_per_task: 1
  #         slurm_cpus_per_task: 4
  #         slurm_ntasks: 1
  #         slurm_mem: 32G

  #     - label: "GPU AMIP DYAMOND - strong scaling - 2 GPUs"
  #       key: "gpu_amip_dyamond_2process"
  #       command:
  #         - >
  #           srun --cpu-bind=threads --cpus-per-task=4
  #           julia --threads=3 --color=yes --project=experiments/AMIP experiments/AMIP/coupler_driver.jl
  #           --config_file $GPU_DYAMOND_CONFIG_PATH/gpu_amip_dyamond_2process.yml
  #       artifact_paths: "gpu_amip_dyamond_2process/*"
  #       agents:
  #         slurm_gpus_per_task: 1
  #         slurm_cpus_per_task: 4
  #         slurm_ntasks: 2
  #         slurm_mem: 32G

  #     - label: "GPU AMIP DYAMOND - strong scaling - 4 GPUs"
  #       key: "gpu_amip_dyamond_4process"
  #       command:
  #         - >
  #           srun --cpu-bind=threads --cpus-per-task=4
  #           julia --threads=3 --color=yes --project=experiments/AMIP experiments/AMIP/coupler_driver.jl
  #           --config_file $GPU_DYAMOND_CONFIG_PATH/gpu_amip_dyamond_4process.yml
  #       artifact_paths: "gpu_amip_dyamond_4process/*"
  #       agents:
  #         slurm_gpus_per_task: 1
  #         slurm_cpus_per_task: 4
  #         slurm_ntasks: 4
  #         slurm_mem: 32G

  - group: "DYAMOND GPU weak scaling"
    steps:

      - label: "GPU AMIP DYAMOND - weak scaling - 1 GPU"
        key: "gpu_amip_dyamond_ws"
        command:
          - >
            julia --threads=3 --color=yes --project=experiments/AMIP experiments/AMIP/coupler_driver.jl
            --config_file $GPU_DYAMOND_WS_CONFIG_PATH/gpu_amip_dyamond_ws.yml
        artifact_paths: "gpu_amip_dyamond_ws/*"
        agents:
          slurm_gpus_per_task: 1
          slurm_cpus_per_task: 4
          slurm_ntasks: 1
          slurm_mem: 32G

      - label: "GPU AMIP DYAMOND - weak scaling - 2 GPUs"
        key: "gpu_amip_dyamond_ws_2process"
        command:
          - >
            srun --cpu-bind=threads --cpus-per-task=4
            julia --threads=3 --color=yes --project=experiments/AMIP experiments/AMIP/coupler_driver.jl
            --config_file $GPU_DYAMOND_WS_CONFIG_PATH/gpu_amip_dyamond_ws_2process.yml
        artifact_paths: "gpu_amip_dyamond_ws_2process/*"
        agents:
          slurm_gpus_per_task: 1
          slurm_cpus_per_task: 4
          slurm_ntasks: 2
          slurm_mem: 32G

      - label: "GPU AMIP DYAMOND - weak scaling - 4 GPUs"
        key: "gpu_amip_dyamond_ws_4process"
        command:
          - >
            srun --cpu-bind=threads --cpus-per-task=4
            julia --threads=3 --color=yes --project=experiments/AMIP experiments/AMIP/coupler_driver.jl
            --config_file $GPU_DYAMOND_WS_CONFIG_PATH/gpu_amip_dyamond_ws_4process.yml
        artifact_paths: "gpu_amip_dyamond_ws_4process/*"
        agents:
          slurm_gpus_per_task: 1
          slurm_cpus_per_task: 4
          slurm_ntasks: 4
          slurm_mem: 32G

  # - group: "CHAP GPU strong scaling"
  #   steps:

  #     - label: "GPU AMIP CHAP - strong scaling - 1 GPU"
  #       key: "gpu_amip_chap"
  #       command:
  #         - >
  #           julia --threads=3 --color=yes --project=experiments/AMIP experiments/AMIP/coupler_driver.jl
  #           --config_file $GPU_CONFIG_PATH/gpu_amip_chap.yml
  #       artifact_paths: "gpu_amip_chap/*"
  #       agents:
  #         slurm_gpus_per_task: 1
  #         slurm_cpus_per_task: 4
  #         slurm_ntasks: 1
  #         slurm_mem: 32G

  #     - label: "GPU AMIP CHAP - strong scaling - 2 GPUs"
  #       key: "gpu_amip_chap_2process"
  #       command:
  #         - >
  #           srun --cpu-bind=threads --cpus-per-task=4
  #           julia --threads=3 --color=yes --project=experiments/AMIP experiments/AMIP/coupler_driver.jl
  #           --config_file $GPU_CONFIG_PATH/gpu_amip_chap_2process.yml
  #       artifact_paths: "gpu_amip_chap_2process/*"
  #       agents:
  #         slurm_gpus_per_task: 1
  #         slurm_cpus_per_task: 4
  #         slurm_ntasks: 2
  #         slurm_mem: 32G

  #     - label: "GPU AMIP CHAP - strong scaling - 4 GPUs"
  #       key: "gpu_amip_chap_4process"
  #       command:
  #         - >
  #           srun --cpu-bind=threads --cpus-per-task=4
  #           julia --threads=3 --color=yes --project=experiments/AMIP experiments/AMIP/coupler_driver.jl
  #           --config_file $GPU_CONFIG_PATH/gpu_amip_chap_4process.yml
  #       artifact_paths: "gpu_amip_chap_4process/*"
  #       agents:
  #         slurm_gpus_per_task: 1
  #         slurm_cpus_per_task: 4
  #         slurm_ntasks: 4
  #         slurm_mem: 32G

  # - group: "CHAP GPU weak scaling"
  #   steps:

  #     - label: "GPU AMIP CHAP - weak scaling - 1 GPU"
  #       key: "gpu_amip_chap_ws"
  #       command:
  #         - >
  #           julia --threads=3 --color=yes --project=experiments/AMIP experiments/AMIP/coupler_driver.jl --config_file $GPU_CONFIG_PATH/gpu_amip_chap_ws.yml
  #       artifact_paths: "gpu_amip_chap_ws/*"
  #       agents:
  #         slurm_gpus_per_task: 1
  #         slurm_cpus_per_task: 4
  #         slurm_ntasks: 1
  #         slurm_mem: 32G
  #         slurm_exclusive:

  #     - label: "GPU AMIP CHAP - weak scaling - 2 GPUs"
  #       key: "gpu_amip_chap_ws_2process"
  #       command:
  #         - >
  #           srun --cpu-bind=threads --cpus-per-task=4
  #           julia --threads=3 --color=yes --project=experiments/AMIP experiments/AMIP/coupler_driver.jl
  #           --config_file $GPU_CONFIG_PATH/gpu_amip_chap_ws_2process.yml
  #       artifact_paths: "gpu_amip_chap_ws_2process/*"
  #       agents:
  #         slurm_gpus_per_task: 1
  #         slurm_cpus_per_task: 4
  #         slurm_ntasks: 2
  #         slurm_mem: 32G
  #         slurm_time: 8:00:00
  #         slurm_exclusive:

  #     - label: "GPU AMIP CHAP - weak scaling - 4 GPUs"
  #       key: "gpu_amip_chap_ws_4process"
  #       command:
  #         - >
  #           srun --cpu-bind=threads --cpus-per-task=4
  #           julia --threads=3 --color=yes --project=experiments/AMIP experiments/AMIP/coupler_driver.jl
  #           --config_file $GPU_CONFIG_PATH/gpu_amip_chap_ws_4process.yml
  #       artifact_paths: "gpu_amip_chap_ws_4process/*"
  #       agents:
  #         slurm_gpus_per_task: 1
  #         slurm_cpus_per_task: 4
  #         slurm_ntasks: 4
  #         slurm_mem: 32G
  #         slurm_time: 8:00:00
  #         slurm_exclusive:
@@ -0,0 +1,22 @@
anim: false
apply_limiter: false
atmos_config_file: "config/gpu_configs/gpu_aquaplanet_chap.yml"
dt: "50secs"
dt_cloud_fraction: "1hours"
dt_cpl: 50
dt_rad: "1hours"
dt_save_state_to_disk: "Inf"
dt_save_to_sol: "Inf"
energy_check: false
evolving_ocean: false
h_elem: 60
hourly_checkpoint: false
job_id: "gpu_amip_chap"
land_albedo_type: "map_static"
mode_name: "amip"
mono_surface: false
run_name: "gpu_amip_chap"
start_date: "19790301"
surface_setup: "PrescribedSurface"
t_end: "1days"
turb_flux_partition: "CombinedStateFluxes"
@@ -0,0 +1,22 @@
anim: false
apply_limiter: false
atmos_config_file: "config/gpu_configs/gpu_aquaplanet_chap_2process.yml"
dt: "50secs"
dt_cloud_fraction: "1hours"
dt_cpl: 50
dt_rad: "1hours"
dt_save_state_to_disk: "Inf"
dt_save_to_sol: "Inf"
energy_check: false
evolving_ocean: false
h_elem: 60
hourly_checkpoint: false
job_id: "gpu_amip_chap_2process"
land_albedo_type: "map_static"
mode_name: "amip"
mono_surface: false
run_name: "gpu_amip_chap_2process"
start_date: "19790301"
surface_setup: "PrescribedSurface"
t_end: "1days"
turb_flux_partition: "CombinedStateFluxes"
@@ -0,0 +1,22 @@
anim: false
apply_limiter: false
atmos_config_file: "config/gpu_configs/gpu_aquaplanet_chap_4process.yml"
dt: "50secs"
dt_cloud_fraction: "1hours"
dt_cpl: 50
dt_rad: "1hours"
dt_save_state_to_disk: "Inf"
dt_save_to_sol: "Inf"
energy_check: false
evolving_ocean: false
h_elem: 42
hourly_checkpoint: false
job_id: "gpu_amip_chap_4process"
land_albedo_type: "map_static"
mode_name: "amip"
mono_surface: false
run_name: "gpu_amip_chap_4process"
start_date: "19790301"
surface_setup: "PrescribedSurface"
t_end: "1days"
turb_flux_partition: "CombinedStateFluxes"
@@ -0,0 +1,22 @@
anim: false
apply_limiter: false
atmos_config_file: "config/gpu_configs/gpu_aquaplanet_chap_ws_1process.yml"
dt: "100secs"
dt_cloud_fraction: "1hours"
dt_cpl: 100
dt_rad: "1hours"
dt_save_state_to_disk: "Inf"
dt_save_to_sol: "Inf"
energy_check: false
evolving_ocean: false
h_elem: 30
hourly_checkpoint: false
job_id: "gpu_amip_chap_ws"
land_albedo_type: "map_static"
mode_name: "amip"
mono_surface: false
run_name: "gpu_amip_chap_ws"
start_date: "19790301"
surface_setup: "PrescribedSurface"
t_end: "1days"
turb_flux_partition: "CombinedStateFluxes"
@@ -0,0 +1,22 @@
anim: false
apply_limiter: false
atmos_config_file: "config/gpu_configs/gpu_aquaplanet_chap_ws_2process.yml"
dt: "50secs"
dt_cloud_fraction: "1hours"
dt_cpl: 50
dt_rad: "1hours"
dt_save_state_to_disk: "Inf"
dt_save_to_sol: "Inf"
energy_check: false
evolving_ocean: false
h_elem: 60
hourly_checkpoint: false
job_id: "gpu_amip_chap_ws_2process"
land_albedo_type: "map_static"
mode_name: "amip"
mono_surface: false
run_name: "gpu_amip_chap_ws_2process"
start_date: "19790301"
surface_setup: "PrescribedSurface"
t_end: "1days"
turb_flux_partition: "CombinedStateFluxes"
@@ -0,0 +1,22 @@
anim: false
apply_limiter: false
atmos_config_file: "config/gpu_configs/gpu_aquaplanet_chap_ws_4process.yml"
dt: "50secs"
dt_cloud_fraction: "1hours"
dt_cpl: 50
dt_rad: "1hours"
dt_save_state_to_disk: "Inf"
dt_save_to_sol: "Inf"
energy_check: false
evolving_ocean: false
h_elem: 84
hourly_checkpoint: false
job_id: "gpu_amip_chap_ws_4process"
land_albedo_type: "map_static"
mode_name: "amip"
mono_surface: false
run_name: "gpu_amip_chap_ws_4process"
start_date: "19790301"
surface_setup: "PrescribedSurface"
t_end: "1days"
turb_flux_partition: "CombinedStateFluxes"
@@ -0,0 +1,21 @@
anim: false
apply_limiter: false
atmos_config_file: "config/gpu_configs/gpu_aquaplanet_dyamond.yml"
dt: "100secs"
dt_cpl: 100
dt_rad: "1hours"
dt_save_state_to_disk: "Inf"
dt_save_to_sol: "Inf"
energy_check: false
evolving_ocean: false
h_elem: 30
hourly_checkpoint: false
job_id: "gpu_amip_dyamond"
land_albedo_type: "map_static"
mode_name: "amip"
mono_surface: false
run_name: "gpu_amip_dyamond"
start_date: "19790301"
surface_setup: "PrescribedSurface"
t_end: "12hours"
turb_flux_partition: "CombinedStateFluxes"
Can we add an `nsys profile` to the non-MPI jobs?
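For reference, one way the single-GPU (non-MPI) weak-scaling step might be wrapped in `nsys profile` is sketched below. This is an illustration under assumptions, not the change made in this PR: the trace selection (`nvtx,cuda`), the report name `report`, and the extra `mkdir -p` step are all hypothetical choices. It relies on the `nsight-systems/2023.4.1` module that the pipeline already loads, and on `JULIA_NVTX_CALLBACKS: gc`, which is already set in `env`.

```yaml
# Sketch only: the 1-GPU weak-scaling step with the driver run under nsys.
# The trace list and report name are assumptions, not taken from the PR.
- label: "GPU AMIP DYAMOND - weak scaling - 1 GPU"
  key: "gpu_amip_dyamond_ws"
  command:
    # Make sure the output directory exists before nsys writes its report
    # (assumed to be needed; the driver may also create it).
    - mkdir -p gpu_amip_dyamond_ws
    - >
      nsys profile --trace=nvtx,cuda --output=gpu_amip_dyamond_ws/report
      julia --threads=3 --color=yes --project=experiments/AMIP experiments/AMIP/coupler_driver.jl
      --config_file $GPU_DYAMOND_WS_CONFIG_PATH/gpu_amip_dyamond_ws.yml
  artifact_paths: "gpu_amip_dyamond_ws/*"
  agents:
    slurm_gpus_per_task: 1
    slurm_cpus_per_task: 4
    slurm_ntasks: 1
    slurm_mem: 32G
```

Writing the report into the job's existing artifact directory means the current `artifact_paths` glob would pick it up without further changes.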