Update CI to Julia version to 1.12.0 #4836
|
I am very interested in this. Let's hope it works and we can move on from Julia 1.10. |
|
I am disabling the reactant tests for the moment to check if the rest works. |
|
If docs still break on the |
|
Seems that we are hitting the same NaN issue on the internal tide example |
the ghosts of the past still haunt us.... |
|
Apparently also |
|
If I run the example locally, it works. Why would it error on CI? Do we have a way to reproduce this error locally? |
One thing to try might be to run the example locally and on CI using the exact same Manifest.toml if possible. We can commit a Manifest.toml to this branch for debugging. I can't think of which dependency would lead to such a big difference but it's one thing we can control for. |
|
From the Julia v1.11 chat I recall that the error was showing up only on Linux, not on macOS? |
|
With this environment (manifests for v1.11 and v1.12 both included) |
|
I can make the simulation error early with

diff --git a/src/Diagnostics/nan_checker.jl b/src/Diagnostics/nan_checker.jl
index 57945c5dc..893a9e283 100644
--- a/src/Diagnostics/nan_checker.jl
+++ b/src/Diagnostics/nan_checker.jl
@@ -5,7 +5,7 @@ mutable struct NaNChecker{F}
     erroring :: Bool
 end
-NaNChecker(fields) = NaNChecker(fields, false) # default
+NaNChecker(fields) = NaNChecker(fields, true) # default
 default_nan_checker(model) = nothing
 function Base.summary(nc::NaNChecker)
@@ -28,7 +28,7 @@ a container with key-value pairs like a dictionary or `NamedTuple`.
 If `erroring=true`, the `NaNChecker` will throw an error on NaN detection.
 """
-NaNChecker(; fields, erroring=false) = NaNChecker(fields, erroring)
+NaNChecker(; fields, erroring=true) = NaNChecker(fields, erroring)
 hasnan(field::AbstractArray) = any(isnan, parent(field))
 hasnan(model) = hasnan(first(fields(model)))

I presume there's also a way to set the
Can we use a callback to print out to file all the steps, so that we can compare 1:1 the progress on different machines? Presumably we're initially interested in the field |
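A per-step log is only useful for a 1:1 comparison if it records more than the six digits that the field display prints. The following is a minimal sketch, using only Julia's Base, of a bit-exact "fingerprint" that could be written once per iteration. The names `hexbits`, `fingerprint`, and `log_fingerprint` are hypothetical helpers, not part of Oceananigans, and the callback hook-up shown in the trailing comment is an untested assumption.

```julia
# Hypothetical helpers (not part of Oceananigans): summarize a Float64 array
# by exact bit patterns so that logs from two machines can be diffed as text.

# Hexadecimal bit pattern of a Float64, e.g. 1.0 -> "3ff0000000000000".
hexbits(x::Float64) = string(reinterpret(UInt64, x); base = 16, pad = 16)

function fingerprint(a::AbstractArray{Float64})
    # XOR of all bit patterns: order-independent and sensitive to
    # single-bit changes in any one entry.
    checksum = mapreduce(x -> reinterpret(UInt64, x), xor, a; init = UInt64(0))
    return (min = hexbits(minimum(a)),
            max = hexbits(maximum(a)),
            xor = string(checksum; base = 16, pad = 16))
end

# Write one line per (iteration, field) so two log files can be compared with `diff`.
function log_fingerprint(io::IO, iteration::Integer, name, a::AbstractArray{Float64})
    f = fingerprint(a)
    println(io, iteration, ' ', name, " min=", f.min, " max=", f.max, " xor=", f.xor)
end

# Two arrays that print identically to six digits but differ in the last bit.
a = [0.1 + 0.2, 0.3]
b = [0.3, 0.3]
log_fingerprint(stdout, 0, :a, a)
log_fingerprint(stdout, 0, :b, b)

# Assumed hook-up inside the example script (untested here):
# simulation.callbacks[:fingerprint] = Callback(IterationInterval(1)) do sim
#     log_fingerprint(stdout, sim.model.clock.iteration, :u, interior(sim.model.velocities.u))
# end
```

The XOR checksum is deliberately simple. It avoids `hash`, whose value is not guaranteed to be stable across Julia versions, which matters here because v1.10, v1.11 and v1.12 are being compared.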
|
Before Oceananigans.jl/examples/internal_tide.jl Line 175 in ea25179
julia> simulation.model.velocities.u
256×1×128 Field{Face, Center, Center} on ImmersedBoundaryGrid on CPU
├── grid: 256×1×128 ImmersedBoundaryGrid{Float64, Periodic, Flat, Bounded} on CPU with 4×0×4 halo
├── boundary conditions: FieldBoundaryConditions
│ └── west: Periodic, east: Periodic, south: Nothing, north: Nothing, bottom: ZeroFlux, top: ZeroFlux, immersed: ZeroFlux
└── data: 264×1×136 OffsetArray(::Array{Float64, 3}, -3:260, 1:1, -3:132) with eltype Float64 with indices -3:260×1:1×-3:132
└── max=0.281029, min=0.281029, mean=0.281029
on both machines, if I'm looking at the right field and this display says enough about it, then they're the same at the beginning, but then on macOS I have
julia> time_step!(simulation); simulation.model.velocities.u
[ Info: Initializing simulation...
[ Info: Iter: 0, time: 0 seconds, wall time: 2.256 minutes, max|w|: 2.089e-03, m s⁻¹
[ Info: ... simulation initialization complete (887.307 ms)
[ Info: Executing initial time step...
[ Info: ... initial time step complete (128.489 ms).
256×1×128 Field{Face, Center, Center} on ImmersedBoundaryGrid on CPU
├── grid: 256×1×128 ImmersedBoundaryGrid{Float64, Periodic, Flat, Bounded} on CPU with 4×0×4 halo
├── boundary conditions: FieldBoundaryConditions
│ └── west: Periodic, east: Periodic, south: Nothing, north: Nothing, bottom: ZeroFlux, top: ZeroFlux, immersed: ZeroFlux
└── data: 264×1×136 OffsetArray(::Array{Float64, 3}, -3:260, 1:1, -3:132) with eltype Float64 with indices -3:260×1:1×-3:132
└── max=0.31715, min=0.265116, mean=0.280967
julia> time_step!(simulation); simulation.model.velocities.u
256×1×128 Field{Face, Center, Center} on ImmersedBoundaryGrid on CPU
├── grid: 256×1×128 ImmersedBoundaryGrid{Float64, Periodic, Flat, Bounded} on CPU with 4×0×4 halo
├── boundary conditions: FieldBoundaryConditions
│ └── west: Periodic, east: Periodic, south: Nothing, north: Nothing, bottom: ZeroFlux, top: ZeroFlux, immersed: ZeroFlux
└── data: 264×1×136 OffsetArray(::Array{Float64, 3}, -3:260, 1:1, -3:132) with eltype Float64 with indices -3:260×1:1×-3:132
└── max=0.335864, min=0.264486, mean=0.280859
and on Ubuntu
julia> time_step!(simulation); simulation.model.velocities.u
[ Info: Initializing simulation...
[ Info: Iter: 0, time: 0 seconds, wall time: 2.391 minutes, max|w|: 2.089e-03, m s⁻¹
[ Info: ... simulation initialization complete (1.130 seconds)
[ Info: Executing initial time step...
[ Info: ... initial time step complete (20.645 ms).
256×1×128 Field{Face, Center, Center} on ImmersedBoundaryGrid on CPU
├── grid: 256×1×128 ImmersedBoundaryGrid{Float64, Periodic, Flat, Bounded} on CPU with 4×0×4 halo
├── boundary conditions: FieldBoundaryConditions
│ └── west: Periodic, east: Periodic, south: Nothing, north: Nothing, bottom: ZeroFlux, top: ZeroFlux, immersed: ZeroFlux
└── data: 264×1×136 OffsetArray(::Array{Float64, 3}, -3:260, 1:1, -3:132) with eltype Float64 with indices -3:260×1:1×-3:132
└── max=0.31715, min=0.265116, mean=0.280967
julia> time_step!(simulation); simulation.model.velocities.u
256×1×128 Field{Face, Center, Center} on ImmersedBoundaryGrid on CPU
├── grid: 256×1×128 ImmersedBoundaryGrid{Float64, Periodic, Flat, Bounded} on CPU with 4×0×4 halo
├── boundary conditions: FieldBoundaryConditions
│ └── west: Periodic, east: Periodic, south: Nothing, north: Nothing, bottom: ZeroFlux, top: ZeroFlux, immersed: ZeroFlux
└── data: 264×1×136 OffsetArray(::Array{Float64, 3}, -3:260, 1:1, -3:132) with eltype Float64 with indices -3:260×1:1×-3:132
└── max=0.333478, min=0.264645, mean=0.280863
so there's a significant divergence already after two timesteps. Update:
julia> time_step!(simulation); simulation.model.velocities.u
[ Info: Initializing simulation...
[ Info: Iter: 0, time: 0 seconds, wall time: 2.269 minutes, max|w|: 2.089e-03, m s⁻¹
[ Info: ... simulation initialization complete (11.788 seconds)
[ Info: Executing initial time step...
[ Info: ... initial time step complete (12.640 seconds).
256×1×128 Field{Face, Center, Center} on ImmersedBoundaryGrid on CPU
├── grid: 256×1×128 ImmersedBoundaryGrid{Float64, Periodic, Flat, Bounded} on CPU with 4×0×4 halo
├── boundary conditions: FieldBoundaryConditions
│ └── west: Periodic, east: Periodic, south: Nothing, north: Nothing, bottom: ZeroFlux, top: ZeroFlux, immersed: ZeroFlux
└── data: 264×1×136 OffsetArray(::Array{Float64, 3}, -3:260, 1:1, -3:132) with eltype Float64 with indices -3:260×1:1×-3:132
└── max=0.31715, min=0.265116, mean=0.280967
julia> time_step!(simulation); simulation.model.velocities.u
256×1×128 Field{Face, Center, Center} on ImmersedBoundaryGrid on CPU
├── grid: 256×1×128 ImmersedBoundaryGrid{Float64, Periodic, Flat, Bounded} on CPU with 4×0×4 halo
├── boundary conditions: FieldBoundaryConditions
│ └── west: Periodic, east: Periodic, south: Nothing, north: Nothing, bottom: ZeroFlux, top: ZeroFlux, immersed: ZeroFlux
└── data: 264×1×136 OffsetArray(::Array{Float64, 3}, -3:260, 1:1, -3:132) with eltype Float64 with indices -3:260×1:1×-3:132
└── max=0.335864, min=0.264486, mean=0.280859
is also what I see on Ubuntu with Julia v1.10, which is consistent with all versions of Julia on macOS. |
|
The plot thickens: it works correctly in Julia v1.12 on Ampere eMAG (aarch64) with AlmaLinux 8.10 as operating system, which rules out an operating system difference. aarch64 is also the architecture on macOS, so I'm starting to suspect there's an architecture dependence. Can someone point me to the operation performed on the |
Nice work so far though!! The entire time-step is a complex chain of operations. I do think a good start is to save all fields at every time-step. We may find that differences arise in one field versus another. Note that the NaNChecker checks |
|
To save every iteration, change this line Oceananigans.jl/examples/internal_tide.jl Line 170 in ea25179
to
The difference should arise in the very first time-step? We could compare those. It seems annoyingly laborious to do this across architectures, but maybe @giordano you have good ideas how to do this efficiently
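Once every iteration has been saved on both machines, the comparison itself can be automated. Below is a rough sketch, again using only Base Julia, that walks two sequences of snapshots and reports the first step and field whose values differ at the bit level. The snapshot layout (one `Dict` of field name to array per iteration) and the names `count_bit_differences` and `first_divergence` are assumptions made for illustration; how the saved output is loaded into that layout is left open.

```julia
# Hypothetical helpers: locate the first step and field where two runs diverge.

# Number of entries whose values are not bitwise identical.
# `!==` on Float64 compares bit patterns, so 0.0 and -0.0 count as different.
function count_bit_differences(a::AbstractArray, b::AbstractArray)
    size(a) == size(b) || throw(DimensionMismatch("snapshots have different sizes"))
    return count(((x, y),) -> x !== y, zip(a, b))
end

# `run_a` and `run_b` are vectors with one Dict(field name => array) per saved step.
function first_divergence(run_a::AbstractVector, run_b::AbstractVector)
    for (step, (snap_a, snap_b)) in enumerate(zip(run_a, run_b))
        for name in sort!(collect(keys(snap_a)))   # fixed order for reproducible reports
            haskey(snap_b, name) || continue
            n = count_bit_differences(snap_a[name], snap_b[name])
            n > 0 && return (step = step, field = name, ndiff = n)
        end
    end
    return nothing   # no difference found in the steps compared
end

# Synthetic stand-in data: the runs agree at step 1 and differ by one ulp in `u` at step 2.
run_a = [Dict(:u => [1.0, 2.0], :w => [0.0]), Dict(:u => [1.5, 2.5], :w => [0.1])]
run_b = [Dict(:u => [1.0, 2.0], :w => [0.0]), Dict(:u => [1.5, nextfloat(2.5)], :w => [0.1])]

result = first_divergence(run_a, run_b)
println(result)
```

Reporting the earliest divergent field narrows down which part of the time-step to inspect next, rather than waiting for the NaN to appear many iterations later.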