Stop using PtrArrays #454

Draft: wants to merge 2 commits into base: main

efaulhaber (Member) commented Mar 10, 2024

The main reason for this PR is the strange heisenbug in #453.
Threaded loops yield different results than non-threaded loops, even though each loop iteration works on a different entry of dv. The problem disappears when dv is a plain Julia Array instead of a PtrArray.
I haven't managed to reduce #453 to an MWE yet, so I can't report it to StrideArrays.jl. I therefore suggest we use plain Julia arrays instead (for now at least).

Also, with plain Julia arrays, we don't run into problems like JuliaSIMD/StrideArrays.jl#88.
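A serial-versus-threaded comparison of the kind described above could be set up roughly as follows. This is only a sketch with plain Julia arrays and `Threads.@threads`. The names `update_entry!`, `fill_serial!`, and `fill_threaded!` are hypothetical, the update formula is made up, and the snippet is not a reproduction of #453.

```julia
using Base.Threads: @threads

# Hypothetical per-entry update: each call touches only column `i` of `dv`.
function update_entry!(dv, v, i)
    for d in axes(dv, 1)
        dv[d, i] = 2 * v[d, i] + d
    end
    return dv
end

# Reference result from an ordinary serial loop.
function fill_serial!(dv, v)
    for i in axes(dv, 2)
        update_entry!(dv, v, i)
    end
    return dv
end

# Same loop body, distributed over the available threads.
function fill_threaded!(dv, v)
    @threads for i in axes(dv, 2)
        update_entry!(dv, v, i)
    end
    return dv
end

v = rand(2, 1000)
dv_serial = fill_serial!(zeros(2, 1000), v)
dv_threaded = fill_threaded!(zeros(2, 1000), v)

# Every iteration writes to a different column, so there is no data race
# and both results should be identical.
@assert dv_serial == dv_threaded
```

Start Julia with several threads (for example `julia -t 6`) so that the threaded loop actually runs in parallel.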

Since we used PtrArrays in the first place to avoid allocations, this PR is highly relevant to performance.
As expected, we get a few more allocations:
main:

julia> @btime TrixiParticles.wrap_u($u_ode, $fluid_system, $semi);
  7.458 ns (0 allocations: 0 bytes)

This PR:

julia> @btime TrixiParticles.wrap_u($u_ode, $fluid_system, $semi);
  28.602 ns (2 allocations: 80 bytes)
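To illustrate where such allocations can come from: one way to view a slice of a flat ODE vector as a plain Julia matrix is `unsafe_wrap` from Base, which creates a new `Array` object that shares the memory. The sketch below rests on assumptions. `wrap_plain` and `demo_wrap` are hypothetical helpers, and this is not necessarily how `TrixiParticles.wrap_u` is implemented in this PR.

```julia
# Hypothetical helper, NOT the TrixiParticles.jl implementation.
# Views `ndims_ * nparticles` entries of `u_ode`, starting after `offset`,
# as an `ndims_ x nparticles` matrix that shares memory with `u_ode`.
function wrap_plain(u_ode::Vector{T}, offset::Int, ndims_::Int, nparticles::Int) where {T}
    len = ndims_ * nparticles
    checkbounds(u_ode, (offset + 1):(offset + len))
    ptr = pointer(u_ode, offset + 1)
    # `own = false`: Julia must not free this memory, `u_ode` still owns it.
    return unsafe_wrap(Array, ptr, (ndims_, nparticles); own = false)
end

function demo_wrap()
    u_ode = collect(1.0:12.0)
    # Keep `u_ode` alive while the wrapped array is in use.
    first_entry = GC.@preserve u_ode begin
        u = wrap_plain(u_ode, 4, 2, 3)
        u[2, 3] = -1.0   # writes through to u_ode[10]
        u[1, 1]          # reads u_ode[5]
    end
    return first_entry, u_ode
end

first_entry, u_ode = demo_wrap()
```

The wrapped matrix is a regular `Array`, so it works with any code that expects plain arrays, at the cost of creating a small array object on every call.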

Here are some benchmarks with the dam break 2D example:
Apple M2 Pro
1 thread main:

julia> @btime TrixiParticles.interact!($dv_ode, $v_ode, $u_ode, $fluid_system, $fluid_system, $semi);
  6.422 ms (0 allocations: 0 bytes)

1 thread this PR:

julia> @btime TrixiParticles.interact!($dv_ode, $v_ode, $u_ode, $fluid_system, $fluid_system, $semi);
  6.566 ms (10 allocations: 400 bytes)

6 threads main:

julia> @btime TrixiParticles.kick!($dv_ode, $v_ode, $u_ode, $semi, $0.0);
  1.867 ms (38 allocations: 23.66 KiB)

julia> @btime TrixiParticles.interact!($dv_ode, $v_ode, $u_ode, $fluid_system, $fluid_system, $semi);
  1.374 ms (1 allocation: 496 bytes)

6 threads this PR:

julia> @btime TrixiParticles.kick!($dv_ode, $v_ode, $u_ode, $semi, $0.0);
  1.827 ms (113 allocations: 26.79 KiB)

julia> @btime TrixiParticles.interact!($dv_ode, $v_ode, $u_ode, $fluid_system, $fluid_system, $semi);
  1.354 ms (11 allocations: 896 bytes)

So we do have a few more allocations, but the performance difference in full simulations is negligible.
I'd also like to hear @sloede's or @ranocha's feedback here, since PtrArrays are also used in Trixi.

Here are some full timer outputs:
Ryzen Threadripper 3990X
1 thread main:

────────────────────────────────────────────────────────────────────────────────────────────────────
Trixi simulation finished.  Final time: 1.8198699419201876  Time steps: 1729 (accepted), 1729 (total)
────────────────────────────────────────────────────────────────────────────────────────────────────

 ──────────────────────────────────────────────────────────────────────────────────────────
            TrixiParticles.jl                     Time                    Allocations      
                                         ───────────────────────   ────────────────────────
            Tot / % measured:                 66.5s /  99.5%            630MiB /  99.3%    

 Section                         ncalls     time    %tot     avg     alloc    %tot      avg
 ──────────────────────────────────────────────────────────────────────────────────────────
 kick!                            8.65k    62.6s   94.6%  7.24ms    173MiB   27.7%  20.5KiB
   system interaction             8.65k    57.5s   86.9%  6.65ms   11.3MiB    1.8%  1.34KiB
     fluid1-fluid1                8.65k    51.4s   77.7%  5.94ms     0.00B    0.0%    0.00B
     fluid1-boundary2             8.65k    6.08s    9.2%   703μs     0.00B    0.0%    0.00B
     ~system interaction~         8.65k   43.1ms    0.1%  4.99μs   11.3MiB    1.8%  1.34KiB
     boundary2-boundary2          8.65k    211μs    0.0%  24.4ns     0.00B    0.0%    0.00B
     boundary2-fluid1             8.65k    207μs    0.0%  23.9ns     0.00B    0.0%    0.00B
   update systems and nhs         8.65k    5.03s    7.6%   581μs    162MiB   25.9%  19.2KiB
     compute boundary pressure    8.65k    3.28s    5.0%   380μs     0.00B    0.0%    0.00B
     update nhs                   8.65k    1.07s    1.6%   124μs    162MiB   25.9%  19.2KiB
     ~update systems and nhs~     8.65k    361ms    0.5%  41.7μs   2.94KiB    0.0%    0.35B
     inverse state equation       8.65k    309ms    0.5%  35.8μs     0.00B    0.0%    0.00B
     update density diffusion     8.65k    196μs    0.0%  22.7ns     0.00B    0.0%    0.00B
   source terms                   8.65k   18.6ms    0.0%  2.15μs     0.00B    0.0%    0.00B
   reset ∂v/∂t                    8.65k   10.5ms    0.0%  1.21μs     0.00B    0.0%    0.00B
   ~kick!~                        8.65k   3.15ms    0.0%   364ns   2.94KiB    0.0%    0.35B
 save solution                       91    3.56s    5.4%  39.1ms    452MiB   72.3%  4.97MiB
   ~save solution~                   91    3.51s    5.3%  38.6ms    450MiB   72.0%  4.95MiB
   compute boundary pressure         91   34.1ms    0.1%   375μs     0.00B    0.0%    0.00B
   update nhs                        91   9.57ms    0.0%   105μs   1.70MiB    0.3%  19.2KiB
   inverse state equation            91   3.21ms    0.0%  35.3μs     0.00B    0.0%    0.00B
   update density diffusion          91   3.37μs    0.0%  37.0ns     0.00B    0.0%    0.00B
 drift!                           8.65k   34.1ms    0.1%  3.95μs   1.47KiB    0.0%    0.17B
   velocity                       8.65k   24.2ms    0.0%  2.79μs     0.00B    0.0%    0.00B
   reset ∂u/∂t                    8.65k   8.38ms    0.0%   969ns     0.00B    0.0%    0.00B
   ~drift!~                       8.65k   1.59ms    0.0%   184ns   1.47KiB    0.0%    0.17B
 compute boundary pressure            1    271μs    0.0%   271μs     0.00B    0.0%    0.00B
 update nhs                           1    106μs    0.0%   106μs   18.4KiB    0.0%  18.4KiB
 inverse state equation               1   22.4μs    0.0%  22.4μs     0.00B    0.0%    0.00B
 calculate dt                         1    210ns    0.0%   210ns     0.00B    0.0%    0.00B
 update density diffusion             1   20.0ns    0.0%  20.0ns     0.00B    0.0%    0.00B
 ──────────────────────────────────────────────────────────────────────────────────────────

1 thread this PR:

────────────────────────────────────────────────────────────────────────────────────────────────────
Trixi simulation finished.  Final time: 1.8198699419201876  Time steps: 1729 (accepted), 1729 (total)
────────────────────────────────────────────────────────────────────────────────────────────────────

 ──────────────────────────────────────────────────────────────────────────────────────────
            TrixiParticles.jl                     Time                    Allocations      
                                         ───────────────────────   ────────────────────────
            Tot / % measured:                 64.4s /  99.7%            710MiB / 100.0%    

 Section                         ncalls     time    %tot     avg     alloc    %tot      avg
 ──────────────────────────────────────────────────────────────────────────────────────────
 kick!                            8.65k    60.6s   94.3%  7.00ms    200MiB   28.2%  23.7KiB
   system interaction             8.65k    55.4s   86.2%  6.40ms   21.8MiB    3.1%  2.59KiB
     fluid1-fluid1                8.65k    49.1s   76.5%  5.68ms     0.00B    0.0%    0.00B
     fluid1-boundary2             8.65k    6.18s    9.6%   714μs     0.00B    0.0%    0.00B
     ~system interaction~         8.65k   36.4ms    0.1%  4.21μs   21.8MiB    3.1%  2.59KiB
     boundary2-boundary2          8.65k    226μs    0.0%  26.1ns     0.00B    0.0%    0.00B
     boundary2-fluid1             8.65k    203μs    0.0%  23.4ns     0.00B    0.0%    0.00B
   update systems and nhs         8.65k    5.16s    8.0%   596μs    175MiB   24.6%  20.7KiB
     compute boundary pressure    8.65k    3.34s    5.2%   386μs   2.11MiB    0.3%     256B
     update nhs                   8.65k    1.10s    1.7%   127μs    164MiB   23.1%  19.5KiB
     ~update systems and nhs~     8.65k    414ms    0.6%  47.9μs   8.45MiB    1.2%  1.00KiB
     inverse state equation       8.65k    310ms    0.5%  35.8μs     0.00B    0.0%    0.00B
     update density diffusion     8.65k    204μs    0.0%  23.6ns     0.00B    0.0%    0.00B
   source terms                   8.65k   36.6ms    0.1%  4.23μs   3.17MiB    0.4%     384B
   reset ∂v/∂t                    8.65k   11.2ms    0.0%  1.30μs     0.00B    0.0%    0.00B
   ~kick!~                        8.65k   3.20ms    0.0%   370ns   2.94KiB    0.0%    0.35B
 save solution                       91    3.56s    5.5%  39.1ms    508MiB   71.5%  5.58MiB
   ~save solution~                   91    3.51s    5.5%  38.5ms    506MiB   71.3%  5.56MiB
   compute boundary pressure         91   41.1ms    0.1%   451μs   22.8KiB    0.0%     256B
   update nhs                        91   9.70ms    0.0%   107μs   1.72MiB    0.2%  19.4KiB
   inverse state equation            91   3.21ms    0.0%  35.3μs     0.00B    0.0%    0.00B
   update density diffusion          91   6.72μs    0.0%  73.8ns     0.00B    0.0%    0.00B
 drift!                           8.65k    106ms    0.2%  12.2μs   2.11MiB    0.3%     256B
   velocity                       8.65k   95.7ms    0.1%  11.1μs   2.11MiB    0.3%     256B
   reset ∂u/∂t                    8.65k   8.34ms    0.0%   965ns     0.00B    0.0%    0.00B
   ~drift!~                       8.65k   1.58ms    0.0%   183ns   1.47KiB    0.0%    0.17B
 compute boundary pressure            1    274μs    0.0%   274μs      256B    0.0%     256B
 update nhs                           1    113μs    0.0%   113μs   18.6KiB    0.0%  18.6KiB
 inverse state equation               1   22.5μs    0.0%  22.5μs     0.00B    0.0%    0.00B
 calculate dt                         1    490ns    0.0%   490ns     0.00B    0.0%    0.00B
 update density diffusion             1   30.0ns    0.0%  30.0ns     0.00B    0.0%    0.00B
 ──────────────────────────────────────────────────────────────────────────────────────────

24 threads main:

────────────────────────────────────────────────────────────────────────────────────────────────────
Trixi simulation finished.  Final time: 1.8198699419201876  Time steps: 1729 (accepted), 1729 (total)
────────────────────────────────────────────────────────────────────────────────────────────────────

 ──────────────────────────────────────────────────────────────────────────────────────────
            TrixiParticles.jl                     Time                    Allocations      
                                         ───────────────────────   ────────────────────────
            Tot / % measured:                 10.6s /  92.4%            647MiB / 100.0%    

 Section                         ncalls     time    %tot     avg     alloc    %tot      avg
 ──────────────────────────────────────────────────────────────────────────────────────────
 kick!                            8.65k    5.93s   60.3%   686μs    193MiB   29.8%  22.9KiB
   system interaction             8.65k    4.37s   44.4%   506μs   19.5MiB    3.0%  2.31KiB
     fluid1-fluid1                8.65k    3.67s   37.3%   425μs   4.09MiB    0.6%     496B
     fluid1-boundary2             8.65k    659ms    6.7%  76.2μs   4.09MiB    0.6%     496B
     ~system interaction~         8.65k   42.2ms    0.4%  4.88μs   11.3MiB    1.7%  1.34KiB
     boundary2-fluid1             8.65k    249μs    0.0%  28.8ns     0.00B    0.0%    0.00B
     boundary2-boundary2          8.65k    238μs    0.0%  27.5ns     0.00B    0.0%    0.00B
   update systems and nhs         8.65k    1.47s   15.0%   170μs    172MiB   26.6%  20.4KiB
     update nhs                   8.65k    765ms    7.8%  88.5μs    164MiB   25.3%  19.4KiB
     compute boundary pressure    8.65k    406ms    4.1%  46.9μs   4.49MiB    0.7%     544B
     ~update systems and nhs~     8.65k    237ms    2.4%  27.4μs   1.45MiB    0.2%     176B
     inverse state equation       8.65k   65.9ms    0.7%  7.63μs   1.98MiB    0.3%     240B
     update density diffusion     8.65k    237μs    0.0%  27.4ns     0.00B    0.0%    0.00B
   reset ∂v/∂t                    8.65k   38.4ms    0.4%  4.44μs     0.00B    0.0%    0.00B
   source terms                   8.65k   32.2ms    0.3%  3.73μs   1.72MiB    0.3%     208B
   ~kick!~                        8.65k   13.8ms    0.1%  1.59μs   2.94KiB    0.0%    0.35B
 save solution                       91    3.81s   38.7%  41.9ms    452MiB   69.9%  4.97MiB
   ~save solution~                   91    3.80s   38.6%  41.8ms    450MiB   69.6%  4.95MiB
   update nhs                        91   4.54ms    0.0%  49.9μs   1.72MiB    0.3%  19.4KiB
   compute boundary pressure         91   4.16ms    0.0%  45.8μs   48.3KiB    0.0%     544B
   inverse state equation            91    668μs    0.0%  7.34μs   21.3KiB    0.0%     240B
   update density diffusion          91   3.90μs    0.0%  42.9ns     0.00B    0.0%    0.00B
 drift!                           8.65k   96.3ms    1.0%  11.1μs   1.58MiB    0.2%     192B
   reset ∂u/∂t                    8.65k   48.6ms    0.5%  5.62μs     0.00B    0.0%    0.00B
   velocity                       8.65k   40.5ms    0.4%  4.69μs   1.58MiB    0.2%     192B
   ~drift!~                       8.65k   7.13ms    0.1%   825ns   1.47KiB    0.0%    0.17B
 update nhs                           1    358μs    0.0%   358μs   18.6KiB    0.0%  18.6KiB
 compute boundary pressure            1   53.4μs    0.0%  53.4μs      544B    0.0%     544B
 inverse state equation               1   11.0μs    0.0%  11.0μs      240B    0.0%     240B
 calculate dt                         1    140ns    0.0%   140ns     0.00B    0.0%    0.00B
 update density diffusion             1   30.0ns    0.0%  30.0ns     0.00B    0.0%    0.00B
 ──────────────────────────────────────────────────────────────────────────────────────────

24 threads this PR:

────────────────────────────────────────────────────────────────────────────────────────────────────
Trixi simulation finished.  Final time: 1.8198699419201876  Time steps: 1729 (accepted), 1729 (total)
────────────────────────────────────────────────────────────────────────────────────────────────────

 ──────────────────────────────────────────────────────────────────────────────────────────
            TrixiParticles.jl                     Time                    Allocations      
                                         ───────────────────────   ────────────────────────
            Tot / % measured:                 10.6s /  92.3%            731MiB / 100.0%    

 Section                         ncalls     time    %tot     avg     alloc    %tot      avg
 ──────────────────────────────────────────────────────────────────────────────────────────
 kick!                            8.65k    5.90s   60.4%   683μs    219MiB   30.0%  26.0KiB
   system interaction             8.65k    4.33s   44.3%   501μs   29.6MiB    4.1%  3.51KiB
     fluid1-fluid1                8.65k    3.63s   37.2%   420μs   4.09MiB    0.6%     496B
     fluid1-boundary2             8.65k    632ms    6.5%  73.1μs   3.69MiB    0.5%     448B
     ~system interaction~         8.65k   63.9ms    0.7%  7.39μs   21.8MiB    3.0%  2.59KiB
     boundary2-boundary2          8.65k    255μs    0.0%  29.4ns     0.00B    0.0%    0.00B
     boundary2-fluid1             8.65k    243μs    0.0%  28.2ns     0.00B    0.0%    0.00B
   update systems and nhs         8.65k    1.48s   15.2%   172μs    185MiB   25.3%  21.9KiB
     update nhs                   8.65k    777ms    8.0%  89.9μs    166MiB   22.7%  19.7KiB
     compute boundary pressure    8.65k    428ms    4.4%  49.5μs   6.60MiB    0.9%     800B
     ~update systems and nhs~     8.65k    213ms    2.2%  24.7μs   10.0MiB    1.4%  1.19KiB
     inverse state equation       8.65k   65.3ms    0.7%  7.56μs   1.98MiB    0.3%     240B
     update density diffusion     8.65k    256μs    0.0%  29.7ns     0.00B    0.0%    0.00B
   reset ∂v/∂t                    8.65k   47.2ms    0.5%  5.46μs     0.00B    0.0%    0.00B
   source terms                   8.65k   31.7ms    0.3%  3.66μs   5.15MiB    0.7%     624B
   ~kick!~                        8.65k   14.2ms    0.1%  1.64μs   2.94KiB    0.0%    0.35B
 save solution                       91    3.76s   38.5%  41.4ms    508MiB   69.5%  5.58MiB
   ~save solution~                   91    3.73s   38.1%  41.0ms    506MiB   69.2%  5.56MiB
   update nhs                        91   32.8ms    0.3%   360μs   1.74MiB    0.2%  19.6KiB
   compute boundary pressure         91   4.33ms    0.0%  47.6μs   71.1KiB    0.0%     800B
   inverse state equation            91    673μs    0.0%  7.40μs   21.3KiB    0.0%     240B
   update density diffusion          91   7.47μs    0.0%  82.1ns     0.00B    0.0%    0.00B
 drift!                           8.65k    103ms    1.0%  11.9μs   3.83MiB    0.5%     464B
   reset ∂u/∂t                    8.65k   51.7ms    0.5%  5.97μs     0.00B    0.0%    0.00B
   velocity                       8.65k   43.3ms    0.4%  5.01μs   3.83MiB    0.5%     464B
   ~drift!~                       8.65k   7.60ms    0.1%   879ns   1.47KiB    0.0%    0.17B
 update nhs                           1    376μs    0.0%   376μs   18.8KiB    0.0%  18.8KiB
 compute boundary pressure            1   49.9μs    0.0%  49.9μs      800B    0.0%     800B
 inverse state equation               1   10.7μs    0.0%  10.7μs      240B    0.0%     240B
 calculate dt                         1    320ns    0.0%   320ns     0.00B    0.0%    0.00B
 update density diffusion             1    100ns    0.0%   100ns     0.00B    0.0%    0.00B
 ──────────────────────────────────────────────────────────────────────────────────────────

Apple M1 Pro
6 threads main:

────────────────────────────────────────────────────────────────────────────────────────────────────
Trixi simulation finished.  Final time: 1.8198699419201876  Time steps: 1729 (accepted), 1729 (total)
────────────────────────────────────────────────────────────────────────────────────────────────────

 ──────────────────────────────────────────────────────────────────────────────────────────
            TrixiParticles.jl                     Time                    Allocations      
                                         ───────────────────────   ────────────────────────
            Tot / % measured:                 26.2s /  98.3%            649MiB / 100.0%    

 Section                         ncalls     time    %tot     avg     alloc    %tot      avg
 ──────────────────────────────────────────────────────────────────────────────────────────
 kick!                            8.65k    17.9s   69.4%  2.07ms    195MiB   30.0%  23.1KiB
   system interaction             8.65k    15.5s   60.1%  1.79ms   19.5MiB    3.0%  2.31KiB
     fluid1-fluid1                8.65k    13.3s   51.5%  1.53ms   4.10MiB    0.6%     497B
     fluid1-boundary2             8.65k    2.15s    8.4%   249μs   4.09MiB    0.6%     496B
     ~system interaction~         8.65k   46.5ms    0.2%  5.38μs   11.3MiB    1.7%  1.34KiB
     boundary2-fluid1             8.65k    239μs    0.0%  27.6ns     0.00B    0.0%    0.00B
     boundary2-boundary2          8.65k    216μs    0.0%  25.0ns     0.00B    0.0%    0.00B
   update systems and nhs         8.65k    2.34s    9.1%   271μs    173MiB   26.7%  20.5KiB
     compute boundary pressure    8.65k    1.19s    4.6%   137μs   4.49MiB    0.7%     544B
     update nhs                   8.65k    786ms    3.1%  90.9μs    166MiB   25.5%  19.6KiB
     inverse state equation       8.65k    214ms    0.8%  24.8μs   1.98MiB    0.3%     240B
     ~update systems and nhs~     8.65k    152ms    0.6%  17.6μs   1.45MiB    0.2%     176B
     update density diffusion     8.65k    181μs    0.0%  20.9ns     0.00B    0.0%    0.00B
   source terms                   8.65k   43.7ms    0.2%  5.06μs   1.72MiB    0.3%     208B
   reset ∂v/∂t                    8.65k   12.2ms    0.0%  1.42μs     0.00B    0.0%    0.00B
   ~kick!~                        8.65k   5.24ms    0.0%   606ns   2.94KiB    0.0%    0.35B
 save solution                       91    7.83s   30.4%  86.1ms    453MiB   69.8%  4.97MiB
   ~save solution~                   91    7.81s   30.4%  85.8ms    451MiB   69.5%  4.96MiB
   compute boundary pressure         91   14.0ms    0.1%   154μs   48.3KiB    0.0%     544B
   update nhs                        91   4.81ms    0.0%  52.9μs   1.74MiB    0.3%  19.6KiB
   inverse state equation            91   2.77ms    0.0%  30.4μs   21.3KiB    0.0%     240B
   update density diffusion          91   1.29μs    0.0%  14.2ns     0.00B    0.0%    0.00B
 drift!                           8.65k   33.9ms    0.1%  3.92μs   1.58MiB    0.2%     192B
   velocity                       8.65k   25.6ms    0.1%  2.96μs   1.58MiB    0.2%     192B
   reset ∂u/∂t                    8.65k   5.86ms    0.0%   678ns     0.00B    0.0%    0.00B
   ~drift!~                       8.65k   2.46ms    0.0%   285ns   1.47KiB    0.0%    0.17B
 update nhs                           1   1.12ms    0.0%  1.12ms   18.7KiB    0.0%  18.7KiB
 compute boundary pressure            1   85.5μs    0.0%  85.5μs      544B    0.0%     544B
 inverse state equation               1   32.1μs    0.0%  32.1μs      240B    0.0%     240B
 calculate dt                         1   6.67μs    0.0%  6.67μs     0.00B    0.0%    0.00B
 update density diffusion             1   41.0ns    0.0%  41.0ns     0.00B    0.0%    0.00B
 ──────────────────────────────────────────────────────────────────────────────────────────

6 threads this PR:

────────────────────────────────────────────────────────────────────────────────────────────────────
Trixi simulation finished.  Final time: 1.8198699419201876  Time steps: 1729 (accepted), 1729 (total)
────────────────────────────────────────────────────────────────────────────────────────────────────

 ──────────────────────────────────────────────────────────────────────────────────────────
            TrixiParticles.jl                     Time                    Allocations      
                                         ───────────────────────   ────────────────────────
            Tot / % measured:                 26.9s /  98.3%            733MiB / 100.0%    

 Section                         ncalls     time    %tot     avg     alloc    %tot      avg
 ──────────────────────────────────────────────────────────────────────────────────────────
 kick!                            8.65k    18.4s   69.4%  2.13ms    221MiB   30.1%  26.2KiB
   system interaction             8.65k    15.7s   59.3%  1.82ms   29.6MiB    4.0%  3.51KiB
     fluid1-fluid1                8.65k    13.1s   49.4%  1.51ms   4.10MiB    0.6%     497B
     fluid1-boundary2             8.65k    2.21s    8.4%   256μs   3.69MiB    0.5%     448B
     ~system interaction~         8.65k    419ms    1.6%  48.5μs   21.8MiB    3.0%  2.59KiB
     boundary2-fluid1             8.65k    333μs    0.0%  38.5ns     0.00B    0.0%    0.00B
     boundary2-boundary2          8.65k    218μs    0.0%  25.2ns     0.00B    0.0%    0.00B
   update systems and nhs         8.65k    2.58s    9.7%   298μs    186MiB   25.4%  22.1KiB
     compute boundary pressure    8.65k    1.28s    4.8%   148μs   6.60MiB    0.9%     800B
     update nhs                   8.65k    878ms    3.3%   102μs    168MiB   22.9%  19.9KiB
     inverse state equation       8.65k    225ms    0.8%  26.0μs   1.98MiB    0.3%     240B
     ~update systems and nhs~     8.65k    197ms    0.7%  22.8μs   10.0MiB    1.4%  1.19KiB
     update density diffusion     8.65k    255μs    0.0%  29.5ns     0.00B    0.0%    0.00B
   source terms                   8.65k   81.2ms    0.3%  9.40μs   5.15MiB    0.7%     624B
   reset ∂v/∂t                    8.65k   12.4ms    0.0%  1.43μs     0.00B    0.0%    0.00B
   ~kick!~                        8.65k   5.77ms    0.0%   668ns   2.94KiB    0.0%    0.35B
 save solution                       91    8.04s   30.3%  88.3ms    508MiB   69.3%  5.59MiB
   ~save solution~                   91    8.01s   30.2%  88.0ms    507MiB   69.1%  5.57MiB
   compute boundary pressure         91   13.2ms    0.0%   145μs   71.1KiB    0.0%     800B
   update nhs                        91   9.53ms    0.0%   105μs   1.76MiB    0.2%  19.8KiB
   inverse state equation            91   2.37ms    0.0%  26.0μs   21.3KiB    0.0%     240B
   update density diffusion          91   4.88μs    0.0%  53.6ns     0.00B    0.0%    0.00B
 drift!                           8.65k   66.1ms    0.2%  7.65μs   3.83MiB    0.5%     464B
   velocity                       8.65k   57.5ms    0.2%  6.65μs   3.83MiB    0.5%     464B
   reset ∂u/∂t                    8.65k   6.03ms    0.0%   698ns     0.00B    0.0%    0.00B
   ~drift!~                       8.65k   2.56ms    0.0%   296ns   1.47KiB    0.0%    0.17B
 update nhs                           1   1.15ms    0.0%  1.15ms   19.0KiB    0.0%  19.0KiB
 compute boundary pressure            1    166μs    0.0%   166μs      800B    0.0%     800B
 inverse state equation               1   30.6μs    0.0%  30.6μs      240B    0.0%     240B
 calculate dt                         1   6.38μs    0.0%  6.38μs     0.00B    0.0%    0.00B
 update density diffusion             1   42.0ns    0.0%  42.0ns     0.00B    0.0%    0.00B
 ──────────────────────────────────────────────────────────────────────────────────────────
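As a rough cross-check of the totals, the "Tot / % measured" lines of the timer outputs above can be compared directly. The snippet below is only a sketch: the numbers are copied from those lines, `rel_change` is a hypothetical helper, and each configuration was run only once, so differences of a few percent in wall time may well be noise.

```julia
# Totals copied from the "Tot / % measured" lines above
# (wall time in seconds, allocations in MiB).
runs = [
    (label = "Threadripper 3990X, 1 thread",   t_main = 66.5, t_pr = 64.4, a_main = 630, a_pr = 710),
    (label = "Threadripper 3990X, 24 threads", t_main = 10.6, t_pr = 10.6, a_main = 647, a_pr = 731),
    (label = "Apple M1 Pro, 6 threads",        t_main = 26.2, t_pr = 26.9, a_main = 649, a_pr = 733),
]

# Relative change from `old` to `new` in percent.
rel_change(old, new) = 100 * (new - old) / old

for r in runs
    dt = round(rel_change(r.t_main, r.t_pr); digits = 1)
    da = round(rel_change(r.a_main, r.a_pr); digits = 1)
    println(r.label, ": time ", dt, " %, allocations ", da, " %")
end
```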


codecov bot commented Mar 10, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 70.84%. Comparing base (a3ffe9d) to head (c559d6b).
Report is 1 commit behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #454   +/-   ##
=======================================
  Coverage   70.84%   70.84%           
=======================================
  Files          66       66           
  Lines        3807     3807           
=======================================
  Hits         2697     2697           
  Misses       1110     1110           
Flag Coverage Δ
unit 70.84% <100.00%> (ø)


ranocha (Member) commented Mar 11, 2024

Note that @batch will also replace arrays by PtrArrays:

julia> using Polyester

julia> function foo!(x)
           @batch for i in eachindex(x)
               if i == 1
                   @show typeof(x)
               end
               x[i] = sin(i)
           end
           return sum(x)
       end
foo! (generic function with 1 method)

julia> foo!(rand(5))
typeof(x) = StrideArraysCore.PtrArray{Float64, 1, (1,), Tuple{Int64}, Tuple{Nothing}, Tuple{Static.StaticInt{1}}}

svchb (Collaborator) commented Mar 11, 2024

The current conclusion of #453 is that it's an ARM Julia problem.

efaulhaber (Member, Author) commented

So what do we do now? Ignore #453 and this PR because it's only relevant to macOS ARM? I'll try to find an MWE to report to StrideArrays.jl, but I don't have time for this now.

svchb (Collaborator) commented Mar 12, 2024

Yes, I would ignore this for now. We should probably look at this again when Julia 1.11 comes around in a month.
