---
title: "Beta blocker cross-validation demo"
author: "[Aki Vehtari](https://users.aalto.fi/~ave/)"
date: "First version 2018-01-10. Last modified `r format(Sys.Date())`."
output:
  html_document:
    fig_caption: yes
    toc: TRUE
    toc_depth: 2
    number_sections: TRUE
    toc_float:
      smooth_scroll: FALSE
bibliography: modelsel.bib
csl: harvard-cite-them-right.csl
link-citations: yes
---
# Setup {.unnumbered}
```{r setup, include=FALSE}
knitr::opts_chunk$set(cache=FALSE, message=FALSE, error=FALSE, warning=FALSE, comment=NA, out.width='95%')
```
**Load packages**
```{r}
library(tidyr)
library(rstanarm)
library(loo)
library(ggplot2)
theme_set(bayesplot::theme_default())
library(ggridges)
library(bridgesampling)
```
# Introduction
This notebook demonstrates a simple model we trust (no model
misspecification). In this case, cross-validation is not needed, and we
can get better accuracy using the explicit model.
# Comparison of two groups with Binomial
An experiment was performed to estimate the effect of beta-blockers on mortality of cardiac patients [the example is from @BDA3, Ch 3]. Patients were randomly assigned to treatment and control groups:
- out of 674 patients receiving the control, 39 died
- out of 680 receiving the treatment, 22 died
The data are given below, where `grp2` is a dummy variable that captures the difference of the intercepts between the first and the second group.
```{r}
d_bin2 <- data.frame(N = c(674, 680), y = c(39,22), grp2 = c(0,1))
```
## Analysis of the observed data
To analyse whether the treatment is useful, we can use a Binomial model for both groups and compute the odds ratio.
```{r,warning=FALSE,error=FALSE}
fit_bin2 <- stan_glm(y/N ~ grp2, family = binomial(), data = d_bin2,
                     weights = N, refresh=0)
```
In general we recommend showing the full posterior of the quantity of
interest, which in this case is the odds ratio.
```{r}
samples_bin2 <- rstan::extract(fit_bin2$stanfit)
theta1 <- plogis(samples_bin2$alpha)
theta2 <- plogis(samples_bin2$alpha + samples_bin2$beta)
oddsratio <- (theta2/(1-theta2))/(theta1/(1-theta1))
ggplot() + geom_histogram(aes(oddsratio), bins = 50, fill = 'grey', color = 'darkgrey') +
labs(y = '') + scale_y_continuous(breaks = NULL)
```
The probability that odds-ratio is less than 1:
```{r}
print(mean(oddsratio < 1), digits = 2)
```
This posterior distribution of the odds ratio (or some transformation
of it) is the simplest and most accurate way to analyse the
effectiveness of the treatment. In this case, there is a high probability
that the treatment is effective and the effect is relatively large.
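As a quick numerical summary of the posterior (to complement the histogram above), we can report, for example, the posterior median and a 90% central interval of the odds ratio:
```{r}
# Posterior median and 90% central interval of the odds ratio
quantile(oddsratio, probs = c(0.05, 0.5, 0.95))
```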
## Simulation experiment
Although we recommend showing the full posterior, the probability that the odds ratio is less than 1 can be a useful summary.
The simulation experiment `binom_odds_comparison.R` runs 100 simulations with simulated data for each true odds ratio in (0.1, ..., 1.0) and computes for each run the probability that the odds ratio is less than 1; a hypothetical sketch of one replication is shown below. The following figures show the variation in the results.
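The simulation script itself is not included in this notebook; the following is only a hypothetical sketch of what one replication could look like (the control death probability `p1`, the sample sizes, and the function name `simulate_prob_or_lt_1` are illustrative assumptions, not the actual code in `binom_odds_comparison.R`).
```{r, eval=FALSE}
# Hypothetical sketch of one simulation replication (not the actual
# code in binom_odds_comparison.R): simulate two-group binomial data
# with a given true odds ratio, refit the model, and return P(OR < 1).
simulate_prob_or_lt_1 <- function(true_or, p1 = 0.1, N = c(674, 680)) {
  # convert the true odds ratio into the treatment-group probability
  odds2 <- true_or * p1 / (1 - p1)
  p2 <- odds2 / (1 + odds2)
  y <- rbinom(2, size = N, prob = c(p1, p2))
  d_sim <- data.frame(N = N, y = y, grp2 = c(0, 1))
  fit <- stan_glm(y/N ~ grp2, family = binomial(), data = d_sim,
                  weights = N, refresh = 0)
  draws <- as.matrix(fit)
  theta1 <- plogis(draws[, "(Intercept)"])
  theta2 <- plogis(draws[, "(Intercept)"] + draws[, "grp2"])
  or <- (theta2 / (1 - theta2)) / (theta1 / (1 - theta1))
  mean(or < 1)
}
```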
Variation in the probability that the odds ratio is less than 1 as the true odds ratio is varied.
```{r}
load(file="binom_test_densities.RData")
ggplot(betaprobs_densities, aes(x = values, y = ind, height = scaled)) +
geom_density_ridges(stat = "identity", scale=0.6)
```
# Cross-validation
Sometimes it is better to focus on the observable space (we can't observe
$\theta$ or the odds ratio directly, but we can observe $y$). In
leave-one-out cross-validation, the model is fitted $n$ times, with each
observation left out in turn and then used to evaluate the
predictive performance. This corresponds to using the already seen
observations as pseudo Monte Carlo samples from the future data
distribution, with the leave-one-out trick used to avoid double use of data.
With the often used log score we get
$$\mathrm{LOO} = \frac{1}{n} \sum_{i=1}^n \log {p(y_i|x_i,D_{-i},M_k)}.$$
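To make the formula concrete, the sum above could in principle be computed by brute force, refitting the model once per left-out observation. The sketch below is for exposition only (the helper name `exact_loo` is an assumption, and refitting is far too slow to use here); in practice we use the fast PSIS-LOO approximation via `loo()` later in this notebook.
```{r, eval=FALSE}
# Brute-force LOO log score by exact refitting (illustration only;
# the notebook uses PSIS-LOO via loo() instead).
exact_loo <- function(fit, data) {
  n <- nrow(data)
  lpd <- numeric(n)
  for (i in seq_len(n)) {
    fit_i <- update(fit, data = data[-i, ], refresh = 0)
    # log posterior predictive density of the held-out observation
    lpd[i] <- log(mean(exp(log_lik(fit_i, newdata = data[i, , drop = FALSE]))))
  }
  mean(lpd)
}
```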
Cross-validation is useful when we don't trust any model (the
models might include good enough models, but we just don't know if that is the case).
Next we demonstrate one of the weaknesses of cross-validation (the same
holds for WAIC etc.).
## Analysis of the observed data
To use leave-one-out where "one" refers to an individual patient, we
need to change the model formulation a bit. In the above model
formulation, the individual observations have been aggregated into group
observations, and running `loo(fit_bin2)` would try to leave out one group
completely. With more groups this could be what we want, but
with just two groups it is unlikely. Thus, in the following we
switch to a Bernoulli model with each individual as its own observation.
```{r, warning=FALSE, error=FALSE}
d_bin2b <- data.frame(y = c(rep(1,39), rep(0,674-39), rep(1,22), rep(0,680-22)), grp2 = c(rep(0, 674), rep(1, 680)))
fit_bin2b <- stan_glm(y ~ grp2, family = binomial(), data = d_bin2b, seed=180202538, refresh=0)
```
We also fit a "null" model which doesn't use the group variable and thus has a common parameter for both groups.
```{r, warning=FALSE, error=FALSE}
fit_bin2bnull <- stan_glm(y ~ 1, family = binomial(), data = d_bin2b, seed=180202538, refresh=0)
```
We can then use cross-validation to compare whether adding the
treatment variable improves predictive performance. We use fast Pareto smoothed importance sampling leave-one-out cross-validation [PSIS-LOO; @Vehtari+etal:PSIS-LOO:2017].
```{r}
(loo_bin2 <- loo(fit_bin2b))
(loo_bin2null <- loo(fit_bin2bnull))
```
All Pareto $k<0.5$, so we can trust the PSIS-LOO result [@Vehtari+etal:PSIS-LOO:2017; @Vehtari+etal:PSIS:2019].
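If desired, the estimated Pareto $k$ values can also be inspected directly with the `loo` package accessor (a small optional check, not part of the original analysis):
```{r}
# Largest estimated Pareto k diagnostic for each model
max(pareto_k_values(loo_bin2))
max(pareto_k_values(loo_bin2null))
```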
Let's make a pairwise comparison.
```{r}
loo_compare(loo_bin2null, loo_bin2)
```
The `elpd_diff` is small compared to its standard error, so cross-validation cannot tell whether estimating the treatment effect improves the predictive performance. To put this in perspective, we have $N_1=674$ and $N_2=680$, with 5.8% and 3.2% deaths respectively, and this information is too weak for cross-validation.
## Simulation experiment
The simulation experiment `binom_odds_comparison.R` runs 100 simulations with simulated data for each true odds ratio in (0.1, ..., 1.0) and computes the LOO comparison for each run.
Variation in the LOO comparison as the true odds ratio is varied.
```{r}
ggplot(looprobs_densities, aes(x = values, y = ind, height = scaled)) +
geom_density_ridges(stat = "identity", scale=0.6)
```
We see that using the posterior distribution from the model is more
efficient at detecting the effect, but cross-validation will eventually
detect it too. The difference arises because cross-validation doesn't
trust the model: it compares the model predictions to the "future
data" using only a very weak assumption about the future. That weak assumption
about the future is also the strength of cross-validation, as we'll see
in another notebook.
# Reference predictive approach
We can also compute predictive performance estimates using a stronger assumption about the future. A reference predictive estimate with the log score can be computed as
$$\mathrm{elpd}_{\mathrm{ref}} = \int p(\tilde{y}|D,M_*) \log p(\tilde{y}|D,M_k) d\tilde{y}, $$
where $M_*$ is a reference model we trust. Using a reference model to assess the other models corresponds to the $M$-completed case [@Vehtari+Ojanen:2012], where the true model is replaced with a model we trust to be close enough to the true model.
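As a minimal sketch of the idea (an illustrative assumption, not the code used in `binom_odds_comparison.R`), the integral can be approximated by Monte Carlo: simulate future data from the reference model and average the candidate model's log posterior predictive density over those simulations. Here `fit_bin2` plays the role of $M_*$ and the intercept-only model `fit_bin2bnull` the role of $M_k$.
```{r, eval=FALSE}
# Hypothetical Monte Carlo sketch of elpd_ref for the null model,
# with fit_bin2 as the reference M* and fit_bin2bnull as the candidate M_k
# (not the exact code used in binom_odds_comparison.R).
draws_ref <- as.matrix(fit_bin2)
theta_ref <- plogis(cbind(draws_ref[, "(Intercept)"],
                          draws_ref[, "(Intercept)"] + draws_ref[, "grp2"]))
theta_null <- plogis(as.matrix(fit_bin2bnull)[, "(Intercept)"])
N <- d_bin2$N
lppd_null <- replicate(1000, {
  s <- sample(nrow(theta_ref), 1)
  # simulate future group-level death counts from the reference model
  ytilde <- rbinom(2, size = N, prob = theta_ref[s, ])
  # candidate model's log posterior predictive density at the simulated data
  sum(log(c(mean(dbinom(ytilde[1], N[1], theta_null)),
            mean(dbinom(ytilde[2], N[2], theta_null)))))
})
(elpd_ref_null <- mean(lppd_null))
```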
## Simulation experiment
The next figure shows the results from the same simulation study using a reference predictive approach with the `fit_bin2` model used as the reference.
```{r}
ggplot(refprobs_densities, aes(x = values, y = ind, height = scaled)) +
geom_density_ridges(stat = "identity", scale=0.6)
```
We can see better accuracy than for cross-validation. A similar improvement in model selection performance is observed in projection predictive variable selection [@Piironen+Vehtari:2017a; @Piironen+etal:projpred:2020], implemented in the [`projpred` package](https://cran.r-project.org/package=projpred).
# Marginal likelihood
As a comparison, we include a marginal likelihood based approach to compute the posterior probabilities for the null model (treatment effect is zero) and the model with an unknown treatment effect. As the data and models are very simple, we may assume that the models are well specified. Marginal likelihoods and the resulting posterior probabilities can be sensitive to the prior chosen for the more complex model. Here we simply use the same priors as in the above examples. Marginal likelihoods are computed using the default bridge sampling approach implemented in the `bridgesampling` package.
## Analysis of the observed data
```{r}
# rerun models with diagnostic file required by bridge_sampler
fit_bin2 <- stan_glm(y/N ~ grp2, family = binomial(), data = d_bin2,
                     weights = N, refresh=0,
                     diagnostic_file = file.path(tempdir(), "df.csv"))
(ml_bin2 <- bridge_sampler(fit_bin2, silent=TRUE))
fit_bin2null <- stan_glm(y/N ~ 1, family = binomial(), data = d_bin2,
                         weights = N, refresh=0,
                         diagnostic_file = file.path(tempdir(), "df.csv"))
(ml_bin2null <- bridge_sampler(fit_bin2null, silent=TRUE))
print(post_prob(ml_bin2, ml_bin2null), digits=2)
```
The posterior probability computed from the marginal likelihoods is indecisive.
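The `bridgesampling` package can also report the corresponding Bayes factor directly (a small optional check):
```{r}
# Bayes factor of the treatment-effect model against the null model
bf(ml_bin2, ml_bin2null)
```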
## Simulation experiment
We repeat the simulation with the marginal likelihood approach.
```{r}
ggplot(bfprobs_densities, aes(x = values, y = ind, height = scaled)) +
geom_density_ridges(stat = "identity", scale=0.6)
```
We can see that the marginal likelihood based approach favors the null model more strongly for small treatment effects and requires a bigger effect than the other approaches before it stops favoring the null model, but given a big enough effect it is more decisive in favor of the non-null model than cross-validation.
<br />
# References {.unnumbered}
<div id="refs"></div>
# Licenses {.unnumbered}
* Code © 2018-2019, Aki Vehtari, licensed under BSD-3.
* Text © 2018-2019, Aki Vehtari, licensed under CC-BY-NC 4.0.
* Part of the code copied from [rstanarm_demo.Rmd](https://github.com/avehtari/BDA_R_demos/blob/master/demos_rstan/rstanarm_demo.Rmd) written by Aki Vehtari and Markus Paasiniemi
# Original Computing Environment {.unnumbered}
```{r}
sessionInfo()
```
<br />