Benchmarks are needed #213

navidcy · 2020-09-20T00:20:07Z

Benchmarks can sometimes catch a bug that does not make any test fail but does cause a considerable slowdown.

For example, while playing around I found that the ETDRK4 time stepper is often faster than RK4.

julia> using FourierFlows, BenchmarkTools
[ Info: FourierFlows will use 12 threads

julia> prob_ForwardEuler = FourierFlows.Diffusion.Problem(stepper="ForwardEuler");

julia> prob_AB3 = FourierFlows.Diffusion.Problem(stepper="AB3");

julia> prob_RK4 = FourierFlows.Diffusion.Problem(stepper="RK4");

julia> prob_ETDRK4 = FourierFlows.Diffusion.Problem(stepper="ETDRK4");

julia> @btime stepforward!(prob_ForwardEuler, 1)
  243.669 ns (5 allocations: 304 bytes)

julia> @btime stepforward!(prob_AB3, 1)
  460.903 ns (5 allocations: 304 bytes)

julia> @btime stepforward!(prob_RK4, 1)
  1.362 μs (17 allocations: 1.05 KiB)

julia> @btime stepforward!(prob_ETDRK4, 1)
  1.306 μs (17 allocations: 1.05 KiB)

julia> using GeophysicalFlows

julia> prob_ForwardEuler = GeophysicalFlows.TwoDNavierStokes.Problem(stepper="ForwardEuler");

julia> prob_AB3 = GeophysicalFlows.TwoDNavierStokes.Problem(stepper="AB3");

julia> prob_RK4 = GeophysicalFlows.TwoDNavierStokes.Problem(stepper="RK4");

julia> prob_ETDRK4 = GeophysicalFlows.TwoDNavierStokes.Problem(stepper="ETDRK4");

julia> @btime stepforward!(prob_ForwardEuler, 1)
  1.455 ms (989 allocations: 95.92 KiB)

julia> @btime stepforward!(prob_AB3, 1)
  1.420 ms (990 allocations: 95.95 KiB)

julia> @btime stepforward!(prob_RK4, 1)
  6.539 ms (3957 allocations: 383.67 KiB)

julia> @btime stepforward!(prob_ETDRK4, 1)
  4.606 ms (3957 allocations: 383.67 KiB)

I'm not sure if this is a bug or if this is indeed how it's supposed to be. But if it's the latter, then this would argue that you should always prefer ETDRK4 over RK4 when your timestep is fixed.
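As a quick check on the numbers above, here is a minimal sketch in plain Julia that turns the reported `@btime` timings into relative differences. The `timings` table and the `percent_faster` helper are illustrative names introduced here, not part of FourierFlows or BenchmarkTools.

```julia
# Timings copied from the @btime output above, converted to seconds.
timings = Dict(
    "Diffusion"        => (rk4 = 1.362e-6, etdrk4 = 1.306e-6),
    "TwoDNavierStokes" => (rk4 = 6.539e-3, etdrk4 = 4.606e-3),
)

# Fraction of the RK4 time saved by ETDRK4, in percent (hypothetical helper).
percent_faster(t) = 100 * (t.rk4 - t.etdrk4) / t.rk4

for (name, t) in timings
    println(name, ": ETDRK4 is ", round(percent_faster(t), digits=1), "% faster than RK4")
end
```

With these inputs the gap is roughly 4% for Diffusion and roughly 30% for TwoDNavierStokes, so the size of the effect depends strongly on the problem.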

navidcy · 2020-09-20T00:37:03Z

I'm pretty sure that RK4 should be faster. Both RK4 and ETDRK4 involve 4 calls of calcN!...

glwagner · 2020-09-21T11:59:49Z

Benchmarks are definitely a good idea, and this script is probably good enough. Might it be better to use the native Diffusion model to benchmark the time-stepping methods?
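A version of the script restricted to the Diffusion model could look like the sketch below. It assumes the `FourierFlows.Diffusion.Problem(stepper=...)` call and the exported `stepforward!` shown earlier in this thread are available in the installed version; the `steppers` list and the `timings` dictionary are illustrative names. The problem is interpolated with `$prob`, which is what BenchmarkTools recommends so that access to a non-constant variable is not included in the measurement.

```julia
using FourierFlows, BenchmarkTools

# Steppers taken from the REPL session earlier in this thread.
steppers = ["ForwardEuler", "AB3", "RK4", "ETDRK4"]

# Minimum time per single step, in seconds, keyed by stepper name.
timings = Dict{String, Float64}()

for stepper in steppers
    # Assumption: same constructor call as used earlier in this thread.
    prob = FourierFlows.Diffusion.Problem(stepper=stepper)

    # Take one step first so that compilation is not part of the measurement.
    stepforward!(prob, 1)

    # `$prob` interpolates the value into the benchmark expression.
    timings[stepper] = @belapsed stepforward!($prob, 1)
end

for stepper in steppers
    println(rpad(stepper, 14), round(timings[stepper] * 1e9, digits=1), " ns")
end
```

`@belapsed` returns the minimum observed time, which tends to be less sensitive to noise than a single run, but differences of a few percent may still fall within run-to-run variation.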

The difference between RK4 and ETDRK4 is that linear terms are explicitly calculated in RK4. This may involve a few extra arithmetic operations that account for the 5% difference in timing? By the way, 5% may be close to the accuracy of the benchmark, so it's hard to tell if this is a real difference. I don't think many users would notice this difference. I'm happy to see that they are within 5% and that memory consumption is low. This is a good result in my opinion.
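To make the distinction concrete, here is the generic textbook form of the two schemes, written as a sketch and not as a description of the exact FourierFlows implementation. The symbols $u$, $L$, $N$, $h$, and $k_i$ are notation introduced here. For an equation split into a linear part $L u$ and a nonlinear part $N(u)$:

```latex
% Split evolution equation: linear operator L, nonlinear term N.
\partial_t u = L u + N(u)

% RK4: every one of the four stages evaluates the full right-hand side,
% so the linear term is recomputed at each stage value u_i.
k_i = L u_i + N(u_i), \qquad i = 1, \dots, 4

% Exponential time differencing: the linear part is integrated exactly.
u(t + h) = e^{L h}\, u(t) + \int_0^h e^{L (h - \tau)}\, N\bigl(u(t + \tau)\bigr)\, d\tau
```

ETDRK4 approximates the integral using four evaluations of $N$, with coefficients that depend only on $L$ and $h$. For a fixed timestep those coefficients can be computed once up front, which is consistent with both schemes needing the same number of nonlinear-term evaluations per step.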

A trickier question is whether multithreading / manually written kernels might speed up these time-stepping routines, and whether we should implement it via KernelAbstractions. This benchmark is a good start.

navidcy added the enhancement and question labels on Sep 20, 2020
