More accurate benchmarks...#34

Merged
lemire merged 3 commits into master from more_accurate_bm on Apr 23, 2025

@lemire (Member) commented Apr 22, 2025

I am concerned that our current benchmarks are not as accurate as they could be, so I reworked them in this PR. For example, with the current benchmarks the 'just_string' routine can come out slower than dragonbox, which should not (I think) be possible.

Let me run through the results for the canada.txt file.

Results with GCC 12 on an Intel Ice Lake processor:

$ ./build/benchmarks/benchmark -f data/canada.txt
number type: binary64 (double)
# read 111126 lines
just_string                   :  1365.92 MB/s (+/- 1.4 %)     2.09 MB     13.76 ns/f    72.66 Mfloat/s
                                    9.95 i/B   187.15 i/f (+/- -0.0 %)      2.33 c/B    43.85 c/f (+/- 1.5 %)
                                    4.27 i/c      0.00 b/f                 0.00 bm/f      3.19 GHz
dragon4                       :    57.89 MB/s (+/- 0.3 %)     1.87 MB    290.21 ns/f     3.45 Mfloat/s
                                  254.84 i/B  4281.18 i/f (+/- -0.0 %)     55.09 c/B   925.53 c/f (+/- 0.2 %)
                                    4.63 i/c      0.00 b/f                 0.00 bm/f      3.19 GHz
# skipping errol3
std::to_string                :    40.95 MB/s (+/- 0.2 %)     1.07 MB    235.51 ns/f     4.25 Mfloat/s
                                  255.46 i/B  2463.58 i/f (+/- -0.0 %)     77.88 c/B   751.01 c/f (+/- 0.1 %)
                                    3.28 i/c      0.00 b/f                 0.00 bm/f      3.19 GHz
fmt::format                   :   171.07 MB/s (+/- 0.3 %)     1.87 MB     98.20 ns/f    10.18 Mfloat/s
                                   69.76 i/B  1172.00 i/f (+/- 0.0 %)     18.64 c/B   313.18 c/f (+/- 0.2 %)
                                    3.74 i/c      0.00 b/f                 0.00 bm/f      3.19 GHz
netlib                        :    11.66 MB/s (+/- 0.1 %)     1.87 MB   1440.88 ns/f     0.69 Mfloat/s
                                  675.88 i/B 11354.68 i/f (+/- 0.0 %)    273.41 c/B  4593.24 c/f (+/- 0.1 %)
                                    2.47 i/c      0.00 b/f                 0.00 bm/f      3.19 GHz
snprintf                      :    54.15 MB/s (+/- 0.3 %)     2.03 MB    336.98 ns/f     2.97 Mfloat/s
                                  191.80 i/B  3499.65 i/f (+/- -0.0 %)     58.89 c/B  1074.58 c/f (+/- 0.2 %)
                                    3.26 i/c      0.00 b/f                 0.00 bm/f      3.19 GHz
grisu2                        :   665.33 MB/s (+/- 0.7 %)     1.87 MB     25.25 ns/f    39.60 Mfloat/s
                                   16.85 i/B   283.09 i/f (+/- -0.0 %)      4.79 c/B    80.53 c/f (+/- 0.5 %)
                                    3.52 i/c      0.00 b/f                 0.00 bm/f      3.19 GHz
grisu_exact                   :   561.19 MB/s (+/- 2.0 %)     2.09 MB     33.50 ns/f    29.85 Mfloat/s
                                   18.95 i/B   356.26 i/f (+/- 0.0 %)      5.68 c/B   106.87 c/f (+/- 1.9 %)
                                    3.33 i/c      0.00 b/f                 0.00 bm/f      3.19 GHz
schubfach                     :   534.89 MB/s (+/- 0.8 %)     1.87 MB     31.41 ns/f    31.84 Mfloat/s
                                   17.30 i/B   290.70 i/f (+/- 0.0 %)      5.96 c/B   100.10 c/f (+/- 0.8 %)
                                    2.90 i/c      0.00 b/f                 0.00 bm/f      3.19 GHz
dragonbox                     :   922.67 MB/s (+/- 1.3 %)     2.09 MB     20.38 ns/f    49.08 Mfloat/s
                                   10.90 i/B   204.85 i/f (+/- -0.0 %)      3.46 c/B    64.96 c/f (+/- 1.2 %)
                                    3.15 i/c      0.00 b/f                 0.00 bm/f      3.19 GHz
ryu                           :   557.85 MB/s (+/- 1.6 %)     2.09 MB     33.70 ns/f    29.67 Mfloat/s
                                   21.69 i/B   407.70 i/f (+/- 0.0 %)      5.72 c/B   107.51 c/f (+/- 1.5 %)
                                    3.79 i/c      0.00 b/f                 0.00 bm/f      3.19 GHz
teju_jagua                    :   560.87 MB/s (+/- 1.2 %)     2.09 MB     33.52 ns/f    29.83 Mfloat/s
                                   16.32 i/B   306.82 i/f (+/- -0.0 %)      5.68 c/B   106.84 c/f (+/- 1.2 %)
                                    2.87 i/c      0.00 b/f                 0.00 bm/f      3.19 GHz
double_conversion             :   202.41 MB/s (+/- 0.2 %)     1.87 MB     83.00 ns/f    12.05 Mfloat/s
                                   49.48 i/B   831.21 i/f (+/- 0.0 %)     15.75 c/B   264.68 c/f (+/- 0.2 %)
                                    3.14 i/c      0.00 b/f                 0.00 bm/f      3.19 GHz
abseil                        :   215.51 MB/s (+/- 0.7 %)     2.03 MB     84.67 ns/f    11.81 Mfloat/s
                                   50.92 i/B   929.18 i/f (+/- -0.0 %)     14.80 c/B   270.00 c/f (+/- 0.7 %)
                                    3.44 i/c      0.00 b/f                 0.00 bm/f      3.19 GHz
std::to_chars                 :   335.60 MB/s (+/- 0.3 %)     1.87 MB     50.06 ns/f    19.98 Mfloat/s
                                   37.95 i/B   637.54 i/f (+/- -0.0 %)      9.50 c/B   159.64 c/f (+/- 0.3 %)
                                    3.99 i/c      0.00 b/f                 0.00 bm/f      3.19 GHz
grisu3                        :   363.16 MB/s (+/- 0.6 %)     1.87 MB     46.26 ns/f    21.62 Mfloat/s
                                   26.45 i/B   444.32 i/f (+/- -0.0 %)      8.78 c/B   147.51 c/f (+/- 0.5 %)
                                    3.01 i/c      0.00 b/f                 0.00 bm/f      3.19 GHz
SwiftDtoa                     :   426.81 MB/s (+/- 0.9 %)     1.87 MB     39.36 ns/f    25.40 Mfloat/s
                                   23.28 i/B   391.16 i/f (+/- 0.0 %)      7.47 c/B   125.55 c/f (+/- 0.8 %)
                                    3.12 i/c      0.00 b/f                 0.00 bm/f      3.19 GHz
yy_double                     :   732.45 MB/s (+/- 3.1 %)     1.87 MB     22.94 ns/f    43.60 Mfloat/s
                                   14.11 i/B   237.00 i/f (+/- 0.0 %)      4.36 c/B    73.17 c/f (+/- 3.0 %)
                                    3.24 i/c      0.00 b/f                 0.00 bm/f      3.19 GHz
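
As a rough cross-check of how the columns relate, the sketch below recomputes three derived figures from the just_string row above. It assumes, from the unit labels only, that ns/f is nanoseconds per float, i/f is instructions per float, c/f is cycles per float, i/c is instructions per cycle, and the GHz column is the effective clock rate. The function names are hypothetical and are not part of the benchmark tool.

```python
# Figures copied from the just_string row of the Ice Lake run above.
NS_PER_FLOAT = 13.76       # ns/f
INSTR_PER_FLOAT = 187.15   # i/f
CYCLES_PER_FLOAT = 43.85   # c/f


def mfloat_per_second(ns_per_float):
    """Millions of floats per second implied by a per-float latency in ns."""
    return 1e3 / ns_per_float


def instructions_per_cycle(instr_per_float, cycles_per_float):
    """Instructions retired per cycle (assumed meaning of the i/c column)."""
    return instr_per_float / cycles_per_float


def effective_ghz(cycles_per_float, ns_per_float):
    """Cycles per nanosecond, which is the same quantity as GHz."""
    return cycles_per_float / ns_per_float


if __name__ == "__main__":
    # The "reported" values are the ones printed in the benchmark output above.
    print(f"{mfloat_per_second(NS_PER_FLOAT):.2f} Mfloat/s (reported: 72.66)")
    print(f"{instructions_per_cycle(INSTR_PER_FLOAT, CYCLES_PER_FLOAT):.2f} i/c (reported: 4.27)")
    print(f"{effective_ghz(CYCLES_PER_FLOAT, NS_PER_FLOAT):.2f} GHz (reported: 3.19)")
```

Small differences from the reported figures are expected, since the inputs here are already rounded to two decimals.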

Results with LLVM 15 on an Apple M2 processor:

$ ./build/benchmarks/benchmark -f data/canada.txt
number type: binary64 (double)
# read 111126 lines
just_string                   :  2553.73 MB/s (+/- 11.2 %)     2.09 MB      7.36 ns/f   135.84 Mfloat/s
                                    8.25 i/B   155.11 i/f (+/- 0.1 %)      1.37 c/B    25.84 c/f (+/- 1.7 %)
                                    6.00 i/c     16.72 b/f                 0.23 bm/f      3.51 GHz 
dragon4                       :    89.61 MB/s (+/- 1.4 %)     1.87 MB    187.47 ns/f     5.33 Mfloat/s
                                  187.15 i/B  3144.12 i/f (+/- -0.0 %)     38.12 c/B   640.42 c/f (+/- 0.1 %)
                                    4.91 i/c    567.84 b/f                 3.10 bm/f      3.42 GHz 
# skipping errol3
std::to_string                :    72.97 MB/s (+/- 2.6 %)     1.07 MB    132.15 ns/f     7.57 Mfloat/s
                                  164.40 i/B  1585.40 i/f (+/- 0.0 %)     47.39 c/B   457.05 c/f (+/- 0.2 %)
                                    3.47 i/c    295.55 b/f                 1.01 bm/f      3.46 GHz 
fmt::format                   :   380.87 MB/s (+/- 6.4 %)     1.87 MB     44.11 ns/f    22.67 Mfloat/s
                                   43.34 i/B   728.17 i/f (+/- 0.1 %)      8.92 c/B   149.82 c/f (+/- 1.4 %)
                                    4.86 i/c    119.22 b/f                 0.33 bm/f      3.40 GHz 
netlib                        :    17.49 MB/s (+/- 0.7 %)     1.87 MB    960.59 ns/f     1.04 Mfloat/s
                                  755.91 i/B 12699.07 i/f (+/- 0.0 %)    193.15 c/B  3244.86 c/f (+/- 0.4 %)
                                    3.91 i/c   1863.77 b/f                 5.65 bm/f      3.38 GHz 
snprintf                      :    67.33 MB/s (+/- 1.0 %)     2.03 MB    271.01 ns/f     3.69 Mfloat/s
                                  213.67 i/B  3898.73 i/f (+/- -0.0 %)     50.48 c/B   921.02 c/f (+/- 0.2 %)
                                    4.23 i/c    698.16 b/f                 4.23 bm/f      3.40 GHz 
grisu2                        :   979.19 MB/s (+/- 4.2 %)     1.87 MB     17.16 ns/f    58.28 Mfloat/s
                                   17.55 i/B   294.81 i/f (+/- 0.0 %)      3.59 c/B    60.24 c/f (+/- 0.3 %)
                                    4.89 i/c     30.60 b/f                 0.23 bm/f      3.51 GHz 
grisu_exact                   :   919.14 MB/s (+/- 4.0 %)     2.09 MB     20.45 ns/f    48.89 Mfloat/s
                                   16.51 i/B   310.47 i/f (+/- -0.0 %)      3.81 c/B    71.71 c/f (+/- 0.1 %)
                                    4.33 i/c     27.23 b/f                 0.44 bm/f      3.51 GHz 
schubfach                     :   958.07 MB/s (+/- 3.6 %)     1.87 MB     17.53 ns/f    57.03 Mfloat/s
                                   16.57 i/B   278.32 i/f (+/- -0.0 %)      3.64 c/B    61.08 c/f (+/- 0.6 %)
                                    4.56 i/c     33.72 b/f                 0.27 bm/f      3.48 GHz 
dragonbox                     :  1451.56 MB/s (+/- 3.7 %)     2.09 MB     12.95 ns/f    77.21 Mfloat/s
                                   10.97 i/B   206.27 i/f (+/- 0.0 %)      2.39 c/B    45.02 c/f (+/- 0.8 %)
                                    4.58 i/c     26.49 b/f                 0.29 bm/f      3.48 GHz 
ryu                           :  1199.35 MB/s (+/- 4.1 %)     2.09 MB     15.68 ns/f    63.80 Mfloat/s
                                   15.51 i/B   291.56 i/f (+/- 0.0 %)      2.92 c/B    54.90 c/f (+/- 0.5 %)
                                    5.31 i/c     33.33 b/f                 0.27 bm/f      3.50 GHz 
teju_jagua                    :   910.23 MB/s (+/- 4.4 %)     2.09 MB     20.65 ns/f    48.42 Mfloat/s
                                   13.18 i/B   247.73 i/f (+/- -0.0 %)      3.85 c/B    72.30 c/f (+/- 0.4 %)
                                    3.43 i/c     28.29 b/f                 1.06 bm/f      3.50 GHz 
double_conversion             :   287.41 MB/s (+/- 1.8 %)     1.87 MB     58.45 ns/f    17.11 Mfloat/s
                                   52.27 i/B   878.08 i/f (+/- 0.0 %)     12.01 c/B   201.72 c/f (+/- -0.3 %)
                                    4.35 i/c    116.90 b/f                 1.02 bm/f      3.45 GHz 
abseil                        :   286.61 MB/s (+/- 1.0 %)     2.03 MB     63.66 ns/f    15.71 Mfloat/s
                                   50.37 i/B   919.17 i/f (+/- 0.0 %)     11.87 c/B   216.51 c/f (+/- 0.5 %)
                                    4.25 i/c    169.39 b/f                 0.66 bm/f      3.40 GHz 
std::to_chars                 :   814.24 MB/s (+/- 3.5 %)     1.87 MB     20.63 ns/f    48.47 Mfloat/s
                                   24.22 i/B   406.90 i/f (+/- -0.0 %)      4.28 c/B    71.94 c/f (+/- 0.2 %)
                                    5.66 i/c     64.73 b/f                 0.27 bm/f      3.49 GHz 
grisu3                        :   481.93 MB/s (+/- 3.7 %)     1.87 MB     34.86 ns/f    28.69 Mfloat/s
                                   25.12 i/B   422.01 i/f (+/- -0.0 %)      7.24 c/B   121.64 c/f (+/- 0.0 %)
                                    3.47 i/c     45.66 b/f                 1.02 bm/f      3.49 GHz 
SwiftDtoa                     :   770.17 MB/s (+/- 4.1 %)     1.87 MB     21.81 ns/f    45.84 Mfloat/s
                                   20.02 i/B   336.39 i/f (+/- -0.0 %)      4.53 c/B    76.14 c/f (+/- 0.7 %)
                                    4.42 i/c     41.43 b/f                 0.52 bm/f      3.49 GHz 
yy_double                     :  1550.05 MB/s (+/- 4.3 %)     1.87 MB     10.84 ns/f    92.26 Mfloat/s
                                   11.14 i/B   187.08 i/f (+/- 0.0 %)      2.26 c/B    38.01 c/f (+/- 0.6 %)
                                    4.92 i/c     12.03 b/f                 0.00 bm/f      3.51 GHz 
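
The sanity check mentioned at the top of this comment (just_string should not be slower than dragonbox) can be sketched as follows, using a subset of the ns/f figures from the two runs above. It assumes that just_string is meant as a lower-bound baseline for every routine, which goes slightly beyond what is stated here. The helper names are hypothetical.

```python
# ns/f values copied from the two canada.txt runs above (subset of rows).
NS_PER_FLOAT = {
    "Intel Ice Lake (GCC 12)": {
        "just_string": 13.76,
        "dragonbox": 20.38,
        "yy_double": 22.94,
        "grisu2": 25.25,
    },
    "Apple M2 (LLVM 15)": {
        "just_string": 7.36,
        "dragonbox": 12.95,
        "yy_double": 10.84,
        "grisu2": 17.16,
    },
}


def faster_than_baseline(results, baseline="just_string"):
    """Return the routines measured as faster than the baseline (ideally none)."""
    base = results[baseline]
    return [name for name, ns in results.items()
            if name != baseline and ns < base]


def slowdown_vs_baseline(results, name, baseline="just_string"):
    """Ratio of a routine's ns/f to the baseline's ns/f."""
    return results[name] / results[baseline]


if __name__ == "__main__":
    for machine, results in NS_PER_FLOAT.items():
        offenders = faster_than_baseline(results)
        ratio = slowdown_vs_baseline(results, "dragonbox")
        print(f"{machine}: faster than baseline = {offenders}, "
              f"dragonbox / just_string = {ratio:.2f}")
```

With the reworked benchmarks, no routine in this subset is measured as faster than just_string on either machine.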

@lemire requested a review from jaja360 on April 22, 2025 at 01:09

@jaja360 (Collaborator) left a comment

Looks good!

@lemire merged commit 59f508c into master on Apr 23, 2025
8 checks passed
@jaja360 deleted the more_accurate_bm branch on July 21, 2025 at 23:45