@@ -15,82 +15,110 @@ params:
15
15
16
16
``` {r, child="children/SETTINGS-knitr.txt"}
17
17
```
18
+ ``` {r, pkgs, include=FALSE}
19
+ library("ggplot2")
20
+ library("rstanarm")
21
+ ```
18
22
19
23
This vignette focuses on graphical posterior predictive checks (PPC). Plots of parameter estimates
20
24
from MCMC draws are covered in the separate vignette
21
25
[ Plotting MCMC draws using the bayesplot package] ( MCMC.html ) ,
22
26
and MCMC diagnostics are covered in
23
27
[ Visual MCMC diagnostics using the bayesplot package] ( MCMC-diagnostics.html ) .
24
28
29
+ In addition to ** bayesplot** we'll load the following packages:
30
+
31
+ * __ ggplot2__ for customizing the ggplot objects created by ** bayesplot**
32
+ * __ rstanarm__ for fitting the example models used throughout the vignette
33
+
34
+ ``` {r, eval=FALSE}
35
+ library("bayesplot")
36
+ library("ggplot2")
37
+ library("rstanarm")
38
+ ```
39
+
25
40
## Overview
26
41
27
- The __ bayesplot __ package provides various plotting functions for
42
+ The ** bayesplot ** package provides various plotting functions for
28
43
_ graphical posterior predictive checking_ , that is, creating graphical displays
29
44
comparing observed data to simulated data from the posterior predictive
30
45
distribution.
31
46
32
- The idea behind posterior predictive checking is simple: if a model is a
33
- good fit then we should be able to use it to generate data that looks a lot like
34
- the data we observed.
47
+ The idea behind posterior predictive checking is simple: if a model is a good
48
+ fit then we should be able to use it to generate data that looks a lot like the
49
+ data we observed.
35
50
36
51
#### Posterior predictive distribution
37
52
To generate the data used for posterior predictive checks (PPCs) we simulate
38
53
from the _ posterior predictive distribution_ The posterior predictive
39
54
distribution is the distribution of the outcome variable implied by a model
40
55
after using the observed data $y$ (a vector of $N$ outcome values) to update our
41
- beliefs about unknown model parameters $\theta$. The posterior predictive
56
+ beliefs about unknown model parameters $\theta$. The posterior predictive
42
57
distribution for observation $\widetilde{y}$ can be written as
43
58
$$ p(\widetilde{y} \, |\, y) = \int
44
59
p(\widetilde{y} \,|\, \theta) \, p(\theta \,|\, y) \, d\theta. $$
45
60
Typically we will also condition on $X$ (a matrix of predictor variables).
46
61
47
62
For each draw (simulation) $s = 1, \ldots, S$ of the parameters from the
48
63
posterior distribution, $\theta^{(s)} \sim p(\theta \, |\, y)$, we draw an entire
49
- vector of $N$ outcomes $\widetilde{y}^{(s)}$ from the posterior predictive distribution
50
- by simulating from the data model conditional on parameters $\theta^{(s)}$.
51
- The result is an $S \times N$ matrix of draws $\widetilde{y}$.
64
+ vector of $N$ outcomes $\widetilde{y}^{(s)}$ from the posterior predictive
65
+ distribution by simulating from the data model conditional on parameters
66
+ $\theta^{(s)}$. The result is an $S \times N$ matrix of draws $\widetilde{y}$.
52
67
53
68
When simulating from the posterior predictive distribution we can use either the
54
69
same values of the predictors $X$ that we used when fitting the model or new
55
70
observations of those predictors. When we use the same values of $X$ we denote
56
71
the resulting simulations by $y^{rep}$, as they can be thought of as
57
72
replications of the outcome $y$ rather than predictions for future observations
58
- ($\widetilde{y}$ using predictors $\widetilde{X}$). This corresponds to the notation
59
- from Gelman et. al. (2013) and is the notation used throughout the package
60
- documentation.
73
+ ($\widetilde{y}$ using predictors $\widetilde{X}$). This corresponds to the
74
+ notation from Gelman et. al. (2013) and is the notation used throughout the
75
+ package documentation.
61
76
62
77
63
78
## Graphical posterior predictive checks
64
79
65
80
Using the replicated datasets drawn from the posterior predictive
66
- distribution, the functions in the __ bayesplot __ package create various
81
+ distribution, the functions in the ** bayesplot ** package create various
67
82
graphical displays comparing the observed data $y$ to the replications.
68
- The names of the __ bayesplot __ plotting functions for posterior predictive
83
+ The names of the ** bayesplot ** plotting functions for posterior predictive
69
84
checking all have the prefix ` ppc_ ` .
70
85
71
- To demonstrate some of the various PPCs that can be created with the __ bayesplot__
72
- package we'll use an example of comparing Poisson and Negative binomial
73
- regression models from the
74
- [ ** rstanarm** ] ( https://CRAN.R-project.org/package=rstanarm ) package
75
- vignette [ _ stan_glm: GLMs for Count
76
- Data_ ] ( https://CRAN.R-project.org/package=rstanarm/vignettes/count.html ) (Gabry and Goodrich, 2017).
86
+ To demonstrate some of the various PPCs that can be created with the
87
+ ** bayesplot** package we'll use an example of comparing Poisson and Negative
88
+ binomial regression models from the
89
+ [ ** rstanarm** ] ( https://CRAN.R-project.org/package=rstanarm )
90
+ package vignette
91
+ [ _ stan_glm: GLMs for Count Data_ ] ( https://CRAN.R-project.org/package=rstanarm/vignettes/count.html )
92
+ (Gabry and Goodrich, 2017).
77
93
78
- > We want to make inferences about the efficacy of a certain pest management system at reducing the number of roaches in urban apartments. [ ...]
79
- The regression predictors for the model are the pre-treatment number of roaches ` roach1 ` , the treatment indicator ` treatment ` , and a variable ` senior ` indicating whether the apartment is in a building restricted to elderly residents. Because the number of days for which the roach traps were used is not the same for all apartments in the sample, we include it as an exposure [ ...] .
94
+ > We want to make inferences about the efficacy of a certain pest management system at reducing the number of roaches in urban apartments. [ ...] The regression predictors for the model are the pre-treatment number of roaches ` roach1 ` , the treatment indicator ` treatment ` , and a variable ` senior ` indicating whether the apartment is in a building restricted to elderly residents. Because the number of days for which the roach traps were used is not the same for all apartments in the sample, we include it as an exposure [ ...] .
80
95
81
96
First we fit a Poisson regression model with outcome variable ` y ` representing
82
97
the roach count in each apartment at the end of the experiment.
83
98
84
- ``` {r, roaches-model, results="hide", message=FALSE,warning=FALSE}
85
- library("rstanarm")
99
+ ``` {r, roaches-data}
86
100
head(roaches) # see help("rstanarm-datasets")
87
-
88
101
roaches$roach1 <- roaches$roach1 / 100 # pre-treatment number of roaches (in 100s)
89
- fit_poisson <- stan_glm(y ~ roach1 + treatment + senior,
90
- offset = log(exposure2),
91
- family = poisson(link = "log"),
92
- data = roaches,
93
- seed = 1111)
102
+ ```
103
+
104
+ ``` {r, eval=FALSE}
105
+ fit_poisson <- stan_glm(
106
+ y ~ roach1 + treatment + senior,
107
+ offset = log(exposure2),
108
+ family = poisson(link = "log"),
109
+ data = roaches,
110
+ seed = 1111
111
+ )
112
+ ```
113
+
114
+ ``` {r, roaches-model, include=FALSE}
115
+ fit_poisson <- stan_glm(
116
+ y ~ roach1 + treatment + senior,
117
+ offset = log(exposure2),
118
+ family = poisson(link = "log"),
119
+ data = roaches,
120
+ seed = 1111
121
+ )
94
122
```
95
123
96
124
``` {r, print}
@@ -99,15 +127,18 @@ print(fit_poisson)
99
127
100
128
We'll also fit the negative binomial model that we'll compare to the poisson:
101
129
102
- ``` {r, roaches-model-2, results="hide", message=FALSE,warning=FALSE}
130
+ ``` {r, eval=FALSE}
131
+ fit_nb <- update(fit_poisson, family = "neg_binomial_2")
132
+ ```
133
+ ``` {r, roaches-model-2, include=FALSE}
103
134
fit_nb <- update(fit_poisson, family = "neg_binomial_2")
104
135
```
105
136
106
137
``` {r, print-2}
107
138
print(fit_nb)
108
139
```
109
140
110
- In order to use the PPC functions from the __ bayesplot __ package we need
141
+ In order to use the PPC functions from the ** bayesplot ** package we need
111
142
a matrix of draws from the posterior predictive distribution. Since we fit
112
143
the models using __ rstanarm__ we can use its ` posterior_predict ` function:
113
144
@@ -128,9 +159,6 @@ The first PPC we'll look at is a comparison of the distribution of `y` and the
128
159
distributions of some of the simulated datasets (rows) in the ` yrep ` matrix.
129
160
130
161
``` {r ppc_dens_overlay}
131
- library("ggplot2")
132
- library("bayesplot")
133
-
134
162
color_scheme_set("brightblue") # see help("bayesplot-colors")
135
163
136
164
y <- roaches$y
@@ -245,38 +273,41 @@ available_ppc(pattern = "_grouped")
245
273
246
274
## Providing an interface to bayesplot PPCs from another package
247
275
248
- The __ bayesplot __ package provides the S3 generic function ` pp_check ` . Authors of
276
+ The ** bayesplot ** package provides the S3 generic function ` pp_check ` . Authors of
249
277
R packages for Bayesian inference are encouraged to define methods for the
250
278
fitted model objects created by their packages. This will hopefully be
251
279
convenient for both users and developers and contribute to the use of the same
252
280
naming conventions across many of the R packages for Bayesian data analysis.
253
281
254
- To provide an interface to __ bayesplot __ from your package, you can very
282
+ To provide an interface to ** bayesplot ** from your package, you can very
255
283
easily define a ` pp_check ` method (or multiple ` pp_check ` methods) for the
256
284
fitted model objects created by your package. All a ` pp_check ` method needs to
257
285
do is provide the ` y ` vector and ` yrep ` matrix arguments to the various plotting
258
- functions included in __ bayesplot __ .
286
+ functions included in ** bayesplot ** .
259
287
260
288
### Defining a ` pp_check ` method
261
289
262
290
Here is an example for how to define a simple ` pp_check ` method in a package
263
291
that creates fitted model objects of class ` "foo" ` . We will define a method
264
292
` pp_check.foo ` that extracts the data ` y ` and the draws from the posterior
265
293
predictive distribution ` yrep ` from an object of class ` "foo" ` and then calls
266
- one of the plotting functions from __ bayesplot __ .
294
+ one of the plotting functions from ** bayesplot ** .
267
295
268
296
Suppose that objects of class ` "foo" ` are lists with named components, two of
269
297
which are ` y ` and ` yrep ` . Here's a simple method ` pp_check.foo ` that offers the
270
298
user the option of two different plots:
271
299
272
300
``` {r, pp_check.foo}
273
- pp_check.foo <- function(object, ..., type = c("multiple", "overlaid")) {
301
+ # @param object An object of class "foo".
302
+ # @param type The type of plot.
303
+ # @param ... Optional arguments passed to the bayesplot plotting function.
304
+ pp_check.foo <- function(object, type = c("multiple", "overlaid"), ...) {
274
305
y <- object[["y"]]
275
306
yrep <- object[["yrep"]]
276
307
switch(
277
308
match.arg(type),
278
- multiple = ppc_hist(y, yrep[1:min(8 , nrow(yrep)),, drop = FALSE]),
279
- overlaid = ppc_dens_overlay(y, yrep)
309
+ multiple = ppc_hist(y, yrep[1:min(5 , nrow(yrep)),, drop = FALSE], ... ),
310
+ overlaid = ppc_dens_overlay(y, yrep, ... )
280
311
)
281
312
}
282
313
```
@@ -289,23 +320,25 @@ x <- list(y = rnorm(50), yrep = matrix(rnorm(5000), nrow = 100, ncol = 50))
289
320
class(x) <- "foo"
290
321
```
291
322
``` {r, pp_check-1, eval=FALSE}
292
- pp_check(x)
323
+ color_scheme_set("purple")
324
+ pp_check(x, type = "multiple", binwidth = 0.25)
293
325
```
294
326
``` {r, print-1, echo=FALSE}
295
- gg <- pp_check(x)
327
+ color_scheme_set("purple")
328
+ gg <- pp_check(x, type = "multiple", binwidth = 0.25)
296
329
suppressMessages(print(gg))
297
330
```
298
331
``` {r, pp_check-2}
332
+ color_scheme_set("darkgray")
299
333
pp_check(x, type = "overlaid")
300
334
```
301
335
302
336
### Examples of ` pp_check ` methods in other packages
303
337
304
- Several packages currently (or will soon) use this approach to provide an
305
- interface to ** bayesplot** 's graphical posterior predictive checks. See, for
306
- example, the ` pp_check ` methods in the
307
- [ ** rstanarm** ] ( https://github.com/stan-dev/rstanarm )
308
- and [ ** brms** ] ( https://github.com/paul-buerkner/brms ) packages.
338
+ Several packages currently use this approach to provide an interface to
339
+ ** bayesplot** 's graphical posterior predictive checks. See, for example, the
340
+ ` pp_check ` methods in the [ ** rstanarm** ] ( https://CRAN.R-project.org/package=rstanarm )
341
+ and [ ** brms** ] ( https://CRAN.R-project.org/package=brms ) packages.
309
342
310
343
## References
311
344
0 commit comments