-
Notifications
You must be signed in to change notification settings - Fork 0
/
Quantative_trade_skill.Rmd
350 lines (339 loc) · 17.2 KB
/
Quantative_trade_skill.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
---
title: "How Can Machines Learn to Trade?"
author: "Jerzy Pawlowski, NYU Tandon School of Engineering"
date: "May 19, 2017"
output:
ioslides_presentation:
widescreen: yes
email: [email protected]
affiliation: NYU Tandon School of Engineering
abstract: How Can Machines Learn to Trade?
---
<SCRIPT SRC='https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML'></SCRIPT>
<SCRIPT>MathJax.Hub.Config({ tex2jax: {inlineMath: [['$','$'], ['\\(','\\)']]}})</SCRIPT>
## Backtesting a Machine Learning Model {.smaller}
- The model is trained over the lookback window, and tested out-of-sample on future data.
- The length of the lookback window determines how quickly the model adapts to new information.
- Backtesting allows determining the optimal length of the lookback window.
<img alt="backtesting" src="backtest.png" width="1000" height="300" align="top"/>
## Coin Flipping Model {.smaller}
<img alt="coinflipping" src="coinflipping.jpg" width="800" height="150" align="top"/>
- <a href="http://labs.elmfunds.com/pastreturns/" target="_blank"> Victor Haghani </a> suggested a coin flipping model to illustrate the challenge of properly selecting a manager with skill, based on past performance.
- We can select a manager from several managers, but only one of them has skill, and the remaining are without skill.
- The skilled manager has a slightly greater probability of positive returns than negative ones, while the unskilled managers have a slightly greater probability of negative returns, so that the average performance of all the managers is zero.
- If the probability of positive returns is equal to $p > 0.5$, then the annual Sharpe ratio is equal to $\sqrt{250}*(2p-1)$.
- If the excess annual Sharpe ratio is equal to $0.4$, then the probability of positive returns is equal to $(0.4/\sqrt{250}+1)/2 = 51.2\%$.
## Probability of Selecting a Biased Coin {.smaller}
- We have a set of unbiased coins, except for a single biased one, with a $60\%$ probability of heads.
- We flip the coins simultaneously $n$ times, and select the coin that produces the most heads.
- What is the probability of selecting the biased coin, after flipping the coins simultaneously $n$ times?
<br>
```{r echo=FALSE, fig.width=5.5, fig.height=3.5}
# Calculate the probability of selecting the biased coin, as a function of the number of coin flips, and the number of coins.
confi_dence <- function(num_flips, num_coins, p1, p2=0.5) {
# calculate binomial probabilities for biased coin, using normal approximation for the binomial coefficient.
if (p1^num_flips > 1e-10)
binom_1 <- choose(num_flips, 0:num_flips) * p1^(0:num_flips) * (1-p1)^(num_flips:0)
else
binom_1 <- dnorm(0:num_flips, mean=num_flips*p1, sd=sqrt(num_flips*p1*(1-p1)))
# calculate binomial probabilities for unbiased coins, using normal approximation for the binomial coefficient.
if (p2^num_flips > 1e-10)
binom_2 <- choose(num_flips, 0:num_flips) * p2^(0:num_flips) * (1-p2)^(num_flips:0)
else
binom_2 <- dnorm(0:num_flips, mean=num_flips*p2, sd=sqrt(num_flips*p2*(1-p2)))
# probability of unbiased coin producing less than a certain number of heads
cum_binom_2 <- cumsum(binom_2)
cum_binom_2 <- c(0, cum_binom_2[-NROW(cum_binom_2)])
# probability of selecting the biased coin, when there's a tie in number of heads
prob_tie <- sapply(binom_2, function(pro_b)
sum(choose(num_coins-1, 1:(num_coins-1)) * pro_b^(1:(num_coins-1)) * (1-pro_b)^((num_coins-2):0) / (2:num_coins)))
# total probability of selecting the biased coin, including ties in number of heads
sum(binom_1 * (cum_binom_2^(num_coins-1) + prob_tie))
} # end confi_dence
# Probability of selecting the biased coin out of 2 coins, after 132 coin flips
# confi_dence(132, num_coins=2, 0.6, 0.5)
# Probabilities of selecting the biased coin, as a function of the number of coin flips
num_flips <- 10:150
prob_s <- sapply(num_flips, confi_dence, p1=0.6, num_coins=2)
# Number of coin flips needed to select the biased coin, with 95% confidence
min_num_flips <- num_flips[findInterval(0.95, prob_s)]
# min_num_flips
# Plot probabilities as a function of the number of coin flips
# Create data frame
da_ta <- data.frame(num_flips=num_flips, probs=prob_s)
# Plot with plotly using pipes syntax
suppressMessages(suppressWarnings(library(plotly)))
da_ta %>%
plot_ly(x=~num_flips, y=~probs, type="scatter", mode="lines + markers", name="probability") %>%
add_trace(x=range(num_flips), y=0.95, mode="lines", line=list(color="red"), name="95% confidence") %>%
add_trace(x=min_num_flips, y=range(prob_s), mode="lines", line=list(color="green"), name=paste(min_num_flips, "flips")) %>%
layout(title="Probability of selecting biased coin from two coins",
xaxis=list(title="number of coin flips"),
yaxis=list(title="probability"),
legend=list(x=0.1, y=0.1))
```
## Probability of Selecting a Skilled Manager {.smaller}
- What is the probability of selecting the skilled manager (with an excess Sharpe ratio of $0.4$), from among two managers?
- $33$ years of data are needed to select the manager with skill, at $95\%$ confidence!
<br>
```{r echo=FALSE, fig.width=5.5, fig.height=3.5}
# Sharpe ratio as function of daily probability
# sharpe_ratio <- sqrt(250)*(2*pro_b-1)
# Daily probability as function of Sharpe ratio
sharpe_ratio <- 0.4
pro_b <- (sharpe_ratio/sqrt(250)+1)/2
# Adjust probability to account for two managers
# pro_b <- 0.5 + (pro_b-0.5)/2
# Probability of selecting skilled manager with 20 years of data
# confi_dence(20*250, 2, pro_b, 0.5)
# Annual probabilities of selecting skilled manager from two managers
year_s <- 1:50
prob_s <- sapply(250*year_s, confi_dence, num_coins=2, p1=pro_b, p2=0.5)
# Years of data needed to select the skilled manager, with 95% confidence
num_years <- findInterval(0.95, prob_s)
# Plot probabilities as a function of the number of years
da_ta <- data.frame(years=year_s, probs=prob_s)
# Plot with plotly using pipes syntax
da_ta %>%
plot_ly(x=~years, y=~probs, type="scatter", mode="lines + markers", name="probability") %>%
add_trace(x=range(year_s), y=0.95, mode="lines", line=list(color="red"), name="95% confidence") %>%
add_trace(x=num_years, y=range(prob_s), mode="lines", line=list(color="green"), name=paste(num_years, "years")) %>%
layout(title="Probability of selecting skilled manager",
xaxis=list(title="years"),
yaxis=list(title="probabilities"),
legend=list(x=0.1, y=0.1))
```
## Selecting From Among Multiple Managers {.smaller}
- In reality we must select from among multiple managers, any one of whom may out-perform purely by chance.
<br>
```{r echo=FALSE, fig.width=5.5, fig.height=3.5}
# Probabilities of selecting the skilled manager, as a function of the number of managers
num_managers <- 2:50
prob_s <- sapply(num_managers, confi_dence, p1=pro_b, p2=0.5, num_flips=250*33)
# prob_s <- cbind(num_managers, prob_s)
# plot(prob_s, t="l", main="Probabilities of selecting the skilled manager, as a function of the number of managers")
# Create data frame
da_ta <- data.frame(num_managers=num_managers, probs=prob_s)
# Plot with plotly using pipes syntax
da_ta %>%
plot_ly(x=~num_managers, y=~probs, type="scatter", mode="lines + markers", name="probability") %>%
layout(title="Probability of selecting skilled manager, from multiple managers",
xaxis=list(title="number of managers"),
yaxis=list(title="probability"),
legend=list(x=0.1, y=0.1))
```
## Dynamic Investing With Multiple Managers {.smaller}
- Dynamic strategy: at the end of each period, we switch to the best performing manager.
<br>
<img alt="backtesting" src="backtest.png" width="1000" height="300" align="top"/>
## Effect of Number of Managers {.smaller}
- A greater number of managers decreases the out-of-sample strategy performance.
<br>
```{r echo=FALSE, fig.width=5.5, fig.height=3.5}
# cum_pnl for multi-manager strategy (simplest version)
cum_pnl <- function(look_back, n_row, sharpe_ratio=NULL, re_turns=NULL, mean_s=NULL, num_managers=NULL, vol_at=0.01) {
# calculate drifts
if(is.null(mean_s)) {
pro_b <- (sharpe_ratio/sqrt(250)+1)/2
# Adjust probability to account for multiple managers
p1 <- (0.5*num_managers + (pro_b - 0.5)*(num_managers-1)) / num_managers
p2 <- (0.5*num_managers - (pro_b - 0.5)) / num_managers
mean_s <- vol_at*look_back*c(2*p1-1, rep(2*p2-1, num_managers-1))
} else {
num_managers <- NROW(mean_s)
} # end if
# calculate probability of selecting the best manager
pro_b <- integrate(function(x, ...)
dnorm(x, mean=mean_s[1], ...)*pnorm(x, mean=mean_s[2], ...)^(num_managers-1),
low=-3.0, up=3.0,
sd=sqrt(look_back)*vol_at)$value
# return total expected pnl
num_agg <- n_row %/% look_back
num_agg*(pro_b*mean_s[1] + (1-pro_b)*mean_s[2])
} # end cum_pnl
# Calculate total expected pnl
# cum_pnl(look_back=100, sharpe_ratio=0.4, num_managers=11, n_row=5000)
# Perform loop over number of managers
num_managers <- 2*(1:50)
pnl_s <- sapply(num_managers, cum_pnl,
re_turns=NULL, sharpe_ratio=0.4, look_back=100, n_row=50000, mean_s=NULL, vol_at=0.01)
# pnl_s <- cbind(num_managers, pnl_s)
# plot(pnl_s, t="l", main="Strategy pnl as a function of number of managers")
# Create data frame
da_ta <- data.frame(num_managers=num_managers, pnl_s=pnl_s)
# Plot with plotly using pipes syntax
da_ta %>%
plot_ly(x=~num_managers, y=~pnl_s, type="scatter", mode="lines + markers", name="probability") %>%
layout(title="Strategy pnl as function of number of managers",
xaxis=list(title="number of managers"),
yaxis=list(title="strategy pnl"),
legend=list(x=0.1, y=0.1))
```
## Effect of Lookback Window Length {.smaller}
- A longer lookback window increases the out-of-sample strategy performance.
<br>
```{r eval=TRUE, echo=FALSE, fig.width=5.5, fig.height=3.5}
# Perform loop over look-back windows
look_backs <- 100*(1:20)
pnl_s <- sapply(look_backs, cum_pnl,
sharpe_ratio=0.4, num_managers=11, n_row=50000)
# pnl_s <- cbind(look_backs, pnl_s)
# plot(pnl_s, t="l", main="Strategy pnl as a function of lookback window length")
# Create data frame
da_ta <- data.frame(look_backs=look_backs, pnl_s=pnl_s)
# Plot with plotly using pipes syntax
da_ta %>%
plot_ly(x=~look_backs, y=~pnl_s, type="scatter", mode="lines + markers", name="probability") %>%
layout(title="Strategy pnl as function of lookback window length",
xaxis=list(title="window length"),
yaxis=list(title="strategy pnl"),
legend=list(x=0.1, y=0.1))
```
## Simulating Managers with Time-dependent Skill {.smaller}
```{r eval=TRUE, echo=FALSE, fig.width=10, fig.height=6}
suppressMessages(suppressWarnings(library(HighFreq)))
# define daily volatility: daily prices change by vol_at units
vol_at <- 0.01
n_row <- 5000
num_managers <- 3
# rate of drift (skill) change
ra_te <- 4*pi
# Daily probability as function of Sharpe ratio
sharpe_ratio <- 0.4
pro_b <- (sharpe_ratio/sqrt(250)+1)/2
# Adjust probability to account for two managers
pro_b <- 0.5 + (pro_b-0.5)/2
# define growth rate
mea_n <- vol_at*(2*pro_b-1)
# time-dependent drift (skill)
dri_ft <- sapply(1:num_managers, function(x)
mea_n*sin(ra_te*(1:n_row)/n_row + 2*pi*x/num_managers))
# simulate multiple price paths
set.seed(1121) # reset random number generator
re_turns <- matrix(vol_at*rnorm(num_managers*n_row) - vol_at^2/2, nc=num_managers) + dri_ft
col_ors <- colorRampPalette(c("red", "blue"))(NCOL(re_turns))
par(mfrow=c(2, 2))
par(mar=c(3, 1, 1, 1), oma=c(1, 1, 1, 1))
plot.zoo(dri_ft, main="time-dependent growth rates", lwd=3, xlab=NA, ylab=NA, plot.type="single", col=col_ors)
plot.zoo(re_turns, main="simulated returns", xlab=NA, ylab=NA, plot.type="single", col=col_ors)
plot.zoo(apply(re_turns, 2, cumsum),
main="simulated prices", xlab=NA, ylab=NA, plot.type="single", col=col_ors)
```
## Trend-following: Select Best Manager From Previous Period {.smaller}
<img alt="trend_following" src="trend_following.png" width="750" height="500" align="top"/>
```{r echo=FALSE, fig.width=7, fig.height=5}
num_managers <- 5
dri_ft <- sapply(1:num_managers, function(x)
mea_n*sin(ra_te*(1:n_row)/n_row + 2*pi*x/num_managers))
set.seed(1121) # reset random number generator
re_turns <- matrix(vol_at*rnorm(num_managers*n_row) - vol_at^2/2, nc=num_managers) + dri_ft
# calculate cumulative returns
cum_rets <- apply(re_turns, 2, cumsum)
### pre-calculate row order indices for a vector of look_backs
look_backs <- 20*(1:50)
order_stats <- lapply(look_backs, function(look_back) {
# total re_turns aggregated over overlapping windows
agg_rets <- apply(cum_rets, 2, rutils::diff_it, lag=look_back)
or_der <- t(apply(agg_rets, 1, order))
or_der <- rutils::lag_it(or_der)
or_der[1, ] <- 1
or_der
}) # end lapply
names(order_stats) <- look_backs
### cum_pnl for long-short multi-manager strategy (without end_points)
cum_pnl <- function(select_best=NULL, select_worst=NULL, re_turns, or_der) {
n_row <- NROW(re_turns)
if(!is.null(select_best)) {
n_col <- NCOL(re_turns)
be_st <- or_der[, (n_col-select_best+1):n_col]
be_st <- cbind(1:n_row, be_st)
} else {
be_st <- NULL
} # end if
if(!is.null(select_worst)) {
wor_st <- or_der[, 1:select_worst]
wor_st <- cbind(1:n_row, wor_st)
} else {
wor_st <- NULL
} # end if
# return total expected pnl
# pnl_s <- re_turns[be_st]-re_turns[wor_st]
sum(re_turns[be_st])-sum(re_turns[wor_st])
} # end cum_pnl
# calculate pnl for long-short multi-manager strategy
# cum_pnl(select_best=1, select_worst=1, re_turns=re_turns, or_der=order_stats[[5]])
# perform loop over look-back windows
pnl_s <- sapply(order_stats, cum_pnl, select_best=1, select_worst=NULL, re_turns=re_turns)
pnl_s <- cbind(look_backs, pnl_s)
# par(mar=c(1, 1, 1, 1), oma=c(1, 1, 1, 1))
# plot(pnl_s, t="l", main="Trend-following PnL, as function of look-back window")
### double the dri_ft
set.seed(1121) # reset random number generator
re_turns <- matrix(vol_at*rnorm(num_managers*n_row) - vol_at^2/2, nc=num_managers) + 2*dri_ft
# calculate cumulative returns
cum_rets <- apply(re_turns, 2, cumsum)
### pre-calculate row order indices for a vector of look_backs
order_stats_2x <- lapply(look_backs, function(look_back) {
# total re_turns aggregated over overlapping windows
agg_rets <- apply(cum_rets, 2, rutils::diff_it, lag=look_back)
or_der <- t(apply(agg_rets, 1, order))
or_der <- rutils::lag_it(or_der)
or_der[1, ] <- 1
or_der
}) # end lapply
names(order_stats_2x) <- look_backs
# perform loop over look-back windows
pnls_2x <- sapply(order_stats_2x, cum_pnl, select_best=1, select_worst=NULL, re_turns=re_turns)
par(mar=c(1, 1, 1, 1), oma=c(1, 1, 1, 1))
plot.zoo(cbind(pnl_s[, 2], pnls_2x), main="Trend-following PnL, as function of look-back window",
lwd=2, xaxt="n", xlab="look-back windows", ylab="PnL", plot.type="single", col=c("black", "red"))
# add x-axis
axis(1, seq_along(look_backs), look_backs)
# add legend
legend(x="top", legend=paste0("SR=", c(0.4, 0.8)),
inset=0.0, cex=0.8, bg="white",
lwd=6, lty=c(1, 1), col=c("black", "red"))
```
## Ensemble: Select Top Two Managers From Previous Period {.smaller}
<img alt="ensemble" src="ensemble.png" width="750" height="500" align="top"/>
```{r echo=FALSE, fig.width=7, fig.height=5}
# perform loop over look-back windows
pnl_s <- sapply(order_stats, cum_pnl, select_best=2, select_worst=NULL, re_turns=re_turns)
pnl_s <- cbind(look_backs, pnl_s)
# par(mar=c(1, 1, 1, 1), oma=c(1, 1, 1, 1))
# plot(pnl_s, t="l", main="Trend-following PnL, as function of look-back window")
# perform loop over look-back windows
pnls_2x <- sapply(order_stats_2x, cum_pnl, select_best=2, select_worst=NULL, re_turns=re_turns)
par(mar=c(1, 1, 1, 1), oma=c(1, 1, 1, 1))
plot.zoo(cbind(pnl_s[, 2], pnls_2x), main="Trend-following PnL, as function of look-back window",
lwd=2, xaxt="n", xlab="look-back windows", ylab="PnL", plot.type="single", col=c("black", "red"))
# add x-axis
axis(1, seq_along(look_backs), look_backs)
# add legend
legend(x="top", legend=paste0("SR=", c(0.4, 0.8)),
inset=0.0, cex=0.8, bg="white",
lwd=6, lty=c(1, 1), col=c("black", "red"))
```
## Long-short Ensemble: Long Top Manager and Short Bottom Manager {.smaller}
<img alt="ensemble" src="long_short.png" width="750" height="500" align="top"/>
```{r echo=FALSE, fig.width=7, fig.height=5}
# perform loop over look-back windows
pnl_s <- sapply(order_stats, cum_pnl, select_best=1, select_worst=1, re_turns=re_turns)
pnl_s <- cbind(look_backs, pnl_s)
# par(mar=c(1, 1, 1, 1), oma=c(1, 1, 1, 1))
# plot(pnl_s, t="l", main="Trend-following PnL, as function of look-back window")
# perform loop over look-back windows
pnls_2x <- sapply(order_stats_2x, cum_pnl, select_best=1, select_worst=1, re_turns=re_turns)
par(mar=c(1, 1, 1, 1), oma=c(1, 1, 1, 1))
plot.zoo(cbind(pnl_s[, 2], pnls_2x), main="Trend-following PnL, as function of look-back window",
lwd=2, xaxt="n", xlab="look-back windows", ylab="PnL", plot.type="single", col=c("black", "red"))
# add x-axis
axis(1, seq_along(look_backs), look_backs)
# add legend
legend(x="top", legend=paste0("SR=", c(0.4, 0.8)),
inset=0.0, cex=0.8, bg="white",
lwd=6, lty=c(1, 1), col=c("black", "red"))
```
## Thank You {.smaller}
- Slide source is available here:
\href{https://github.com/algoquant/presentations/blob/master/RFinance_2017.Rmd}}