---
title: "Benchmarks"
toc: true
---
```{julia}
#| echo: false
#| output: false
using Pkg
Pkg.activate("env/benchmarks")
Pkg.instantiate()
```
## The Life Modeling Problem
Inspired by a discussion in the [ActuarialOpenSource](https://github.com/actuarialopensource) GitHub community, folks started submitting solutions to what someone referred to as the "Life Modeling Problem". One user [submitted a short snippet](https://github.com/orgs/actuarialopensource/teams/common-room/discussions/5) as a proposed representative problem.
### Benchmarks
After the original user submitted a proposal, others chimed in and submitted versions in their favorite languages. I have collected those versions, and run them on a consistent set of hardware.
Some submissions were excluded from the benchmarks because they involved an entirely different approach, such as [memoizing](https://en.wikipedia.org/wiki/Memoization) the function calls[^1].
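To illustrate why memoization counts as a different approach, here is a minimal sketch in Julia. The functions `fib` and `fib_memo` and the cache `FIB_CACHE` are hypothetical stand-ins, not taken from any submission: once a result is cached, a repeated call costs roughly one dictionary lookup rather than a recomputation.

```julia
# Hypothetical illustration only; not part of the benchmarked submissions.

# Naive recursion: recomputes the same subproblems many times.
fib(n) = n < 2 ? n : fib(n - 1) + fib(n - 2)

# Memoized variant: results are stored in a Dict keyed by the argument,
# so each distinct `n` is computed once and later calls are hash lookups.
const FIB_CACHE = Dict{Int,Int}()

function fib_memo(n::Int)
    haskey(FIB_CACHE, n) && return FIB_CACHE[n]
    result = n < 2 ? n : fib_memo(n - 1) + fib_memo(n - 2)
    FIB_CACHE[n] = result
    return result
end

fib_memo(20) == fib(20)  # same answer, very different amount of work
```

Timing `fib_memo` after the cache is warm mostly measures the `Dict` lookup, which is why such submissions are not comparable with the others.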
```{julia}
#| echo: false
#| label: tbl-life-modeling-benchmark
#| tbl-cap: Benchmarks for the Life Modeling Problem in nanoseconds (lower times are better).
#|
using CSV, DataFrames
using PrettyTables
file = download("https://raw.githubusercontent.com/JuliaActuary/Learn/master/Benchmarks/LifeModelingProblem/benchmarks.csv")
benchmarks = CSV.read(file, DataFrame)
# skipmissing guards against rows where a language reports no mean
benchmarks.relative_mean = benchmarks.mean ./ minimum(skipmissing(benchmarks.mean))
sort(benchmarks, :relative_mean)
```
To aid in visualizing results that span vastly different orders of magnitude, this graph includes a physical length comparison as a reference. Each computation time is represented by the distance that light travels while the computation runs (comparing a nanosecond to one foot of length [goes back at least to Admiral Grace Hopper](https://www.youtube.com/watch?v=9eyFDBPk4Yw)).
```{julia}
#| echo: false
#| label: fig-life-modeling-benchmark
using Plots
using DataFrames
p = plot(palette=:seaborn_colorblind, rotation=25, yaxis=:log)
# label equivalents to distance to make log scale more relatable
scatter!(
    fill("\n equivalents (ns → ft)", 7),
    [1, 1e1, 1e2, 1e3, 0.8e4, 0.72e5, 3.3e6],
    series_annotations=Plots.text.(["1 foot", "basketball hoop", "blue whale", "Eiffel Tower", "avg ocean depth", "marathon distance", "Space Station altitude"], :left, 8, :grey),
    marker=0,
    label="",
    left_margin=20Plots.mm,
    bottom_margin=20Plots.mm
)
# plot mean, or median if not available
for g in groupby(benchmarks, :algorithm)
    scatter!(p, g.lang,
        ifelse.(ismissing.(g.mean), g.median, g.mean),
        label="$(g.algorithm[1])",
        ylabel="Nanoseconds (log scale)",
        marker=(:circle, 5, 0.7, stroke(0)))
end
p
```
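The nanosecond-to-foot comparison used in the figure can be checked with a few lines of arithmetic. This is a small sketch; the names `C_M_PER_S`, `M_PER_FT`, and `light_feet` are introduced here for illustration only.

```julia
# Speed of light in vacuum, in metres per second (exact by definition).
const C_M_PER_S = 299_792_458

# One international foot in metres (exact by definition).
const M_PER_FT = 0.3048

# Distance, in feet, that light travels in `ns` nanoseconds.
light_feet(ns) = ns * 1e-9 * C_M_PER_S / M_PER_FT

light_feet(1)     # ≈ 0.98, so one nanosecond is very nearly one foot
light_feet(1_000) # ≈ 984, roughly the scale of the Eiffel Tower label
```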
### Discussion
For a more in-depth discussion of these results, see [this post](/posts/life-modeling-problem/).
All of the benchmarked code can be found in the [JuliaActuary Learn repository](https://github.com/JuliaActuary/Learn/tree/master/Benchmarks/LifeModelingProblem). Please file an issue or submit a pull request there with any issues or suggestions.
## IRRs
**Task:** determine the IRR for a series of cashflows 701 elements long (e.g. monthly cashflows spanning nearly 60 years).
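To make the task concrete: the IRR is the rate at which the present value of the cashflows is zero. Below is a minimal Newton's-method sketch in Julia. It is not the ActuaryUtilities implementation, nor any of the benchmarked functions; the names `pv`, `dpv`, and `irr_newton_sketch`, the default guess, and the example cashflows are all assumptions made for illustration.

```julia
# Present value of cashflows `cfs` paid at times `ts`, at periodic rate `r`.
pv(r, cfs, ts) = sum(cf / (1 + r)^t for (cf, t) in zip(cfs, ts))

# Derivative of `pv` with respect to `r`.
dpv(r, cfs, ts) = sum(-t * cf / (1 + r)^(t + 1) for (cf, t) in zip(cfs, ts))

# Hypothetical helper: Newton iteration on pv(r) = 0.
# `ts` defaults to 0, 1, 2, ...; passing explicit times gives XIRR-like behaviour.
function irr_newton_sketch(cfs, ts=0:length(cfs)-1; guess=0.01, tol=1e-10, maxiter=100)
    r = guess
    for _ in 1:maxiter
        step = pv(r, cfs, ts) / dpv(r, cfs, ts)
        r -= step
        abs(step) < tol && return r
    end
    return nothing  # signal that the iteration did not converge
end

# Illustrative 701-element series: an outlay followed by 700 level payments.
cfs = [-100.0; fill(1.5, 700)]
rate = irr_newton_sketch(cfs)
```

A production routine would also need to handle series with no root or several roots, which this sketch does not attempt.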
### Benchmarks
:::{.column-body-outset}
```plaintext
Times are in nanoseconds:
┌──────────┬──────────────────┬───────────────────┬─────────┬─────────────┬───────────────┐
│ Language │ Package │ Function │ Median │ Mean │ Relative Mean │
├──────────┼──────────────────┼───────────────────┼─────────┼─────────────┼───────────────┤
│ Python │ numpy_financial │ irr │ missing │ 519306422 │ 123146x │
│ Python │ better │ irr_binary_search │ missing │ 3045229 │ 722x │
│ Python │ better │ irr_newton │ missing │ 382166 │ 91x │
│ Julia │ ActuaryUtilities │ irr │ 4185 │ 4217 │ 1x │
└──────────┴──────────────────┴───────────────────┴─────────┴─────────────┴───────────────┘
```
:::
### Discussion
The ActuaryUtilities implementation is over 100,000 times faster than `numpy_financial`, and 91 to 722 times faster than the `better` Python package. The [ActuaryUtilities.jl](/#actuaryutilitiesjl) implementation is also more flexible, as it can be given an argument with timepoints, similar to Excel's `XIRR`.
Excel was used to attempt a benchmark, but the `IRR` formula returned a `#DIV/0!` error.
All of the benchmarked code can be found in the [JuliaActuary Learn repository](https://github.com/JuliaActuary/Learn/tree/master/Benchmarks/irr). Please file an issue or submit a pull request there with any issues or suggestions.
## Black-Scholes-Merton European Option Pricing
**Task:** calculate the price of a vanilla European call option using the Black-Scholes-Merton formula.
\begin{align}
C(S_t, t) &= N(d_1)S_t - N(d_2)Ke^{-r(T - t)} \\
d_1 &= \frac{1}{\sigma\sqrt{T - t}}\left[\ln\left(\frac{S_t}{K}\right) + \left(r + \frac{\sigma^2}{2}\right)(T - t)\right] \\
d_2 &= d_1 - \sigma\sqrt{T - t}
\end{align}
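The formula above translates almost line for line into Julia. The following is a sketch rather than the benchmarked code: the function name `bsm_call` and its argument order are assumptions, `τ` stands for $T - t$, and the standard normal CDF $N$ is taken from the Distributions.jl package, which is assumed to be installed.

```julia
using Distributions  # assumed available; provides Normal() and cdf

# Price of a European call under Black-Scholes-Merton.
# S: spot price, K: strike, τ: time to expiry (T - t) in years,
# r: continuously compounded risk-free rate, σ: volatility.
function bsm_call(S, K, τ, r, σ)
    d1 = (log(S / K) + (r + σ^2 / 2) * τ) / (σ * sqrt(τ))
    d2 = d1 - σ * sqrt(τ)
    N(x) = cdf(Normal(), x)  # standard normal CDF
    return N(d1) * S - N(d2) * K * exp(-r * τ)
end

bsm_call(100.0, 100.0, 1.0, 0.05, 0.2)  # ≈ 10.45
```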
### Benchmarks
```plaintext
Times are in nanoseconds:
┌──────────┬─────────┬─────────────┬───────────────┐
│ Language │ Median │ Mean │ Relative Mean │
├──────────┼─────────┼─────────────┼───────────────┤
│ Python │ missing │ 817000.0 │ 19639.4 │
│ R │ 3649.0 │ 3855.2 │ 92.7 │
│ Julia │ 41.0 │ 41.6 │ 1.0 │
└──────────┴─────────┴─────────────┴───────────────┘
```
### Discussion
Julia is nearly 20,000 times faster than Python, and two orders of magnitude faster than R.
## Other benchmarks
These benchmarks have been performed by others, but provide relevant information for actuarial-related work:
- [H2Oai DataFrames/Database-like Operations](https://h2oai.github.io/db-benchmark/)
- [Reading CSVs](https://juliacomputing.com/blog/2020/06/fast-csv/)
## Colophon
### Code
All of the benchmarked code can be found in the [JuliaActuary Learn repository](https://github.com/JuliaActuary/Learn/tree/master/Benchmarks/). Please file an issue or submit a pull request there with any issues or suggestions.
## Footnotes
[^1]: If benchmarking memoization, it's essentially benchmarking how long it takes to perform hashing in a language. While interesting, especially in the context of [incremental computing](https://scattered-thoughts.net/writing/an-opinionated-map-of-incremental-and-streaming-systems), it's not the core issue at hand. Incremental computing libraries exist for all of the modern languages discussed here.

[^2]: Note that not all languages have both a mean and median result in their benchmarking libraries. Mean is a better representation for a garbage-collected modern language, because sometimes the computation just takes longer than the median result. Where the mean is not available in the graph below, median is substituted.