surface_fraction error with 2 GPUs #687
Comments
It seems like this is related to CliMA/ClimaAtmos.jl#2993
consider #807 - for a coupler solution we could consider using the edit: we still have the same issue when using
FYI I got the same error with 8 or 12 GPUs (on derecho), but not with 16 GPUs.
Replacing the assert statements with
@assert minimum((ice_fraction .+ land_fraction .+ ocean_fraction) |> parent) ≈ FT(1)
@assert maximum((ice_fraction .+ land_fraction .+ ocean_fraction)) ≈ FT(1)
fixes the problem. Related issue: here. The problem should be fixed once this issue in ClimaCore is fixed.
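For illustration, the tolerance-based check above can be sketched on plain Julia arrays. This is only a minimal sketch: the arrays here are hypothetical stand-ins for the coupler's fraction fields (not ClimaCore Fields), and `FT = Float32` is an assumed float type.

```julia
# Hypothetical stand-ins for the fraction fields: plain arrays, not ClimaCore Fields.
FT = Float32
ice_fraction = FT[0.1, 0.0, 0.3]
land_fraction = FT[0.2, 0.7, 0.3]
# Define ocean as the remainder so the three fractions sum to 1 up to rounding.
ocean_fraction = FT(1) .- ice_fraction .- land_fraction

total = ice_fraction .+ land_fraction .+ ocean_fraction

# Compare with a tolerance (≈ is isapprox) rather than exact equality,
# since floating-point rounding can make the sum differ slightly from 1.
@assert minimum(total) ≈ FT(1)
@assert maximum(total) ≈ FT(1)
```

Checking both the minimum and the maximum bounds the sum from both sides, so every point is covered by two scalar comparisons.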
When we run simulations on 2 GPUs, we get an error that the surface fractions don't sum to 1. This doesn't happen when running on 1 or 4 GPUs. See examples on buildkite here or here. We have seen this both when running on buildkite and when running via slurm directly on clima.
stacktrace:
When run interactively, we see that
minimum((cs.surface_fractions.ice .+ cs.surface_fractions.land) .+ cs.surface_fractions.ocean) == 0
We expect this sum to always be exactly 1 at each point on the sphere.
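To narrow down where the sum deviates from 1, a small diagnostic along these lines might help. It is a sketch only: `bad_fraction_points` is a hypothetical helper (not part of ClimaCoupler), the inputs are plain arrays rather than ClimaCore Fields, and the tolerance `atol = 1e-6` is an assumed value.

```julia
# Hypothetical helper (not part of ClimaCoupler): return the indices at which
# the three fractions fail to sum to 1 within an absolute tolerance.
function bad_fraction_points(ice, land, ocean; atol = 1e-6)
    total = ice .+ land .+ ocean
    return findall(x -> !isapprox(x, one(x); atol = atol), total)
end

# Made-up data: the second point has all three fractions equal to zero,
# mimicking a point where the sum is 0 instead of 1.
ice = Float64[0.1, 0.0, 0.0]
land = Float64[0.2, 0.0, 0.5]
ocean = Float64[0.7, 0.0, 0.5]

bad = bad_fraction_points(ice, land, ocean)
```

With real fields, the same idea could be applied to the underlying array data to see whether the offending points cluster in a particular part of the domain.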