Commit d1cd48c

Load packages the same way in each vignette
[ci skip]
1 parent 76cd7ba commit d1cd48c

4 files changed: +132 -93 lines changed

R/mcmc-intervals.R

Lines changed: 1 addition & 1 deletion
@@ -311,7 +311,7 @@ mcmc_areas <- function(x,
 yend = ~ maxy,
 color = if (!color_by_rhat) NULL else ~ rhat
 ),
-size = 1.25
+size = 1
 )
 if (!color_by_rhat)
 segment_args$color <- get_color("m")

vignettes/graphical-ppcs.Rmd

Lines changed: 81 additions & 48 deletions
@@ -15,82 +15,110 @@ params:
 
 ```{r, child="children/SETTINGS-knitr.txt"}
 ```
+```{r, pkgs, include=FALSE}
+library("ggplot2")
+library("rstanarm")
+```
 
 This vignette focuses on graphical posterior predictive checks (PPC). Plots of parameter estimates
 from MCMC draws are covered in the separate vignette
 [Plotting MCMC draws using the bayesplot package](MCMC.html),
 and MCMC diagnostics are covered in
 [Visual MCMC diagnostics using the bayesplot package](MCMC-diagnostics.html).
 
+In addition to **bayesplot** we'll load the following packages:
+
+* __ggplot2__ for customizing the ggplot objects created by **bayesplot**
+* __rstanarm__ for fitting the example models used throughout the vignette
+
+```{r, eval=FALSE}
+library("bayesplot")
+library("ggplot2")
+library("rstanarm")
+```
+
 ## Overview
 
-The __bayesplot__ package provides various plotting functions for
+The **bayesplot** package provides various plotting functions for
 _graphical posterior predictive checking_, that is, creating graphical displays
 comparing observed data to simulated data from the posterior predictive
 distribution.
 
-The idea behind posterior predictive checking is simple: if a model is a
-good fit then we should be able to use it to generate data that looks a lot like
-the data we observed.
+The idea behind posterior predictive checking is simple: if a model is a good
+fit then we should be able to use it to generate data that looks a lot like the
+data we observed.
 
 #### Posterior predictive distribution
 To generate the data used for posterior predictive checks (PPCs) we simulate
 from the _posterior predictive distribution_ The posterior predictive
 distribution is the distribution of the outcome variable implied by a model
 after using the observed data $y$ (a vector of $N$ outcome values) to update our
-beliefs about unknown model parameters $\theta$. The posterior predictive
+beliefs about unknown model parameters $\theta$. The posterior predictive
 distribution for observation $\widetilde{y}$ can be written as
 $$p(\widetilde{y} \,|\, y) = \int
 p(\widetilde{y} \,|\, \theta) \, p(\theta \,|\, y) \, d\theta.$$
 Typically we will also condition on $X$ (a matrix of predictor variables).
 
 For each draw (simulation) $s = 1, \ldots, S$ of the parameters from the
 posterior distribution, $\theta^{(s)} \sim p(\theta \,|\, y)$, we draw an entire
-vector of $N$ outcomes $\widetilde{y}^{(s)}$ from the posterior predictive distribution
-by simulating from the data model conditional on parameters $\theta^{(s)}$.
-The result is an $S \times N$ matrix of draws $\widetilde{y}$.
+vector of $N$ outcomes $\widetilde{y}^{(s)}$ from the posterior predictive
+distribution by simulating from the data model conditional on parameters
+$\theta^{(s)}$. The result is an $S \times N$ matrix of draws $\widetilde{y}$.
 
 When simulating from the posterior predictive distribution we can use either the
 same values of the predictors $X$ that we used when fitting the model or new
 observations of those predictors. When we use the same values of $X$ we denote
 the resulting simulations by $y^{rep}$, as they can be thought of as
 replications of the outcome $y$ rather than predictions for future observations
-($\widetilde{y}$ using predictors $\widetilde{X}$). This corresponds to the notation
-from Gelman et. al. (2013) and is the notation used throughout the package
-documentation.
+($\widetilde{y}$ using predictors $\widetilde{X}$). This corresponds to the
+notation from Gelman et. al. (2013) and is the notation used throughout the
+package documentation.
 
 
 ## Graphical posterior predictive checks
 
 Using the replicated datasets drawn from the posterior predictive
-distribution, the functions in the __bayesplot__ package create various
+distribution, the functions in the **bayesplot** package create various
 graphical displays comparing the observed data $y$ to the replications.
-The names of the __bayesplot__ plotting functions for posterior predictive
+The names of the **bayesplot** plotting functions for posterior predictive
 checking all have the prefix `ppc_`.
 
-To demonstrate some of the various PPCs that can be created with the __bayesplot__
-package we'll use an example of comparing Poisson and Negative binomial
-regression models from the
-[**rstanarm**](https://CRAN.R-project.org/package=rstanarm) package
-vignette [_stan_glm: GLMs for Count
-Data_](https://CRAN.R-project.org/package=rstanarm/vignettes/count.html) (Gabry and Goodrich, 2017).
+To demonstrate some of the various PPCs that can be created with the
+**bayesplot** package we'll use an example of comparing Poisson and Negative
+binomial regression models from the
+[**rstanarm**](https://CRAN.R-project.org/package=rstanarm)
+package vignette
+[_stan_glm: GLMs for Count Data_](https://CRAN.R-project.org/package=rstanarm/vignettes/count.html)
+(Gabry and Goodrich, 2017).
 
-> We want to make inferences about the efficacy of a certain pest management system at reducing the number of roaches in urban apartments. [...]
-The regression predictors for the model are the pre-treatment number of roaches `roach1`, the treatment indicator `treatment`, and a variable `senior` indicating whether the apartment is in a building restricted to elderly residents. Because the number of days for which the roach traps were used is not the same for all apartments in the sample, we include it as an exposure [...].
+> We want to make inferences about the efficacy of a certain pest management system at reducing the number of roaches in urban apartments. [...] The regression predictors for the model are the pre-treatment number of roaches `roach1`, the treatment indicator `treatment`, and a variable `senior` indicating whether the apartment is in a building restricted to elderly residents. Because the number of days for which the roach traps were used is not the same for all apartments in the sample, we include it as an exposure [...].
 
 First we fit a Poisson regression model with outcome variable `y` representing
 the roach count in each apartment at the end of the experiment.
 
-```{r, roaches-model, results="hide", message=FALSE,warning=FALSE}
-library("rstanarm")
+```{r, roaches-data}
 head(roaches) # see help("rstanarm-datasets")
-
 roaches$roach1 <- roaches$roach1 / 100 # pre-treatment number of roaches (in 100s)
-fit_poisson <- stan_glm(y ~ roach1 + treatment + senior,
-offset = log(exposure2),
-family = poisson(link = "log"),
-data = roaches,
-seed = 1111)
+```
+
+```{r, eval=FALSE}
+fit_poisson <- stan_glm(
+y ~ roach1 + treatment + senior,
+offset = log(exposure2),
+family = poisson(link = "log"),
+data = roaches,
+seed = 1111
+)
+```
+
+```{r, roaches-model, include=FALSE}
+fit_poisson <- stan_glm(
+y ~ roach1 + treatment + senior,
+offset = log(exposure2),
+family = poisson(link = "log"),
+data = roaches,
+seed = 1111
+)
 ```
 
 ```{r, print}
@@ -99,15 +127,18 @@ print(fit_poisson)
 
 We'll also fit the negative binomial model that we'll compare to the poisson:
 
-```{r, roaches-model-2, results="hide", message=FALSE,warning=FALSE}
+```{r, eval=FALSE}
+fit_nb <- update(fit_poisson, family = "neg_binomial_2")
+```
+```{r, roaches-model-2, include=FALSE}
 fit_nb <- update(fit_poisson, family = "neg_binomial_2")
 ```
 
 ```{r, print-2}
 print(fit_nb)
 ```
 
-In order to use the PPC functions from the __bayesplot__ package we need
+In order to use the PPC functions from the **bayesplot** package we need
 a matrix of draws from the posterior predictive distribution. Since we fit
 the models using __rstanarm__ we can use its `posterior_predict` function:
 
@@ -128,9 +159,6 @@ The first PPC we'll look at is a comparison of the distribution of `y` and the
 distributions of some of the simulated datasets (rows) in the `yrep` matrix.
 
 ```{r ppc_dens_overlay}
-library("ggplot2")
-library("bayesplot")
-
 color_scheme_set("brightblue") # see help("bayesplot-colors")
 
 y <- roaches$y
@@ -245,38 +273,41 @@ available_ppc(pattern = "_grouped")
 
 ## Providing an interface to bayesplot PPCs from another package
 
-The __bayesplot__ package provides the S3 generic function `pp_check`. Authors of
+The **bayesplot** package provides the S3 generic function `pp_check`. Authors of
 R packages for Bayesian inference are encouraged to define methods for the
 fitted model objects created by their packages. This will hopefully be
 convenient for both users and developers and contribute to the use of the same
 naming conventions across many of the R packages for Bayesian data analysis.
 
-To provide an interface to __bayesplot__ from your package, you can very
+To provide an interface to **bayesplot** from your package, you can very
 easily define a `pp_check` method (or multiple `pp_check` methods) for the
 fitted model objects created by your package. All a `pp_check` method needs to
 do is provide the `y` vector and `yrep` matrix arguments to the various plotting
-functions included in __bayesplot__.
+functions included in **bayesplot**.
 
 ### Defining a `pp_check` method
 
 Here is an example for how to define a simple `pp_check` method in a package
 that creates fitted model objects of class `"foo"`. We will define a method
 `pp_check.foo` that extracts the data `y` and the draws from the posterior
 predictive distribution `yrep` from an object of class `"foo"` and then calls
-one of the plotting functions from __bayesplot__.
+one of the plotting functions from **bayesplot**.
 
 Suppose that objects of class `"foo"` are lists with named components, two of
 which are `y` and `yrep`. Here's a simple method `pp_check.foo` that offers the
 user the option of two different plots:
 
 ```{r, pp_check.foo}
-pp_check.foo <- function(object, ..., type = c("multiple", "overlaid")) {
+# @param object An object of class "foo".
+# @param type The type of plot.
+# @param ... Optional arguments passed to the bayesplot plotting function.
+pp_check.foo <- function(object, type = c("multiple", "overlaid"), ...) {
 y <- object[["y"]]
 yrep <- object[["yrep"]]
 switch(
 match.arg(type),
-multiple = ppc_hist(y, yrep[1:min(8, nrow(yrep)),, drop = FALSE]),
-overlaid = ppc_dens_overlay(y, yrep)
+multiple = ppc_hist(y, yrep[1:min(5, nrow(yrep)),, drop = FALSE], ...),
+overlaid = ppc_dens_overlay(y, yrep, ...)
 )
 }
 ```
@@ -289,23 +320,25 @@ x <- list(y = rnorm(50), yrep = matrix(rnorm(5000), nrow = 100, ncol = 50))
 class(x) <- "foo"
 ```
 ```{r, pp_check-1, eval=FALSE}
-pp_check(x)
+color_scheme_set("purple")
+pp_check(x, type = "multiple", binwidth = 0.25)
 ```
 ```{r, print-1, echo=FALSE}
-gg <- pp_check(x)
+color_scheme_set("purple")
+gg <- pp_check(x, type = "multiple", binwidth = 0.25)
 suppressMessages(print(gg))
 ```
 ```{r, pp_check-2}
+color_scheme_set("darkgray")
 pp_check(x, type = "overlaid")
 ```
 
 ### Examples of `pp_check` methods in other packages
 
-Several packages currently (or will soon) use this approach to provide an
-interface to **bayesplot**'s graphical posterior predictive checks. See, for
-example, the `pp_check` methods in the
-[**rstanarm**](https://github.com/stan-dev/rstanarm)
-and [**brms**](https://github.com/paul-buerkner/brms) packages.
+Several packages currently use this approach to provide an interface to
+**bayesplot**'s graphical posterior predictive checks. See, for example, the
+`pp_check` methods in the [**rstanarm**](https://CRAN.R-project.org/package=rstanarm)
+and [**brms**](https://CRAN.R-project.org/package=brms) packages.
 
 ## References
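
Note on the vignette text in this file: the $S \times N$ matrix of posterior predictive draws it describes can be sketched without any modelling package. The following is only an illustration under stated assumptions, not the vignette's own model: it swaps in a conjugate Gamma-Poisson model (so the posterior can be sampled directly in base R), and the data, the prior values `a` and `b`, and the sizes `N` and `S` are all made up for the example.

```r
# Illustrative sketch only: a conjugate Gamma-Poisson model stands in for the
# regression models fitted in the vignette; all values here are hypothetical.
set.seed(1111)

N <- 50                      # number of observations
y <- rpois(N, lambda = 3)    # simulated "observed" outcome vector y

S <- 200                     # number of posterior draws
a <- 1                       # assumed Gamma prior shape
b <- 1                       # assumed Gamma prior rate

# Posterior for the Poisson rate is Gamma(a + sum(y), b + N)
theta <- rgamma(S, shape = a + sum(y), rate = b + N)

# For each draw theta[s], simulate a full vector of N outcomes from the data
# model; sapply() returns an N x S matrix, so transpose to get S x N
yrep <- t(sapply(theta, function(th) rpois(N, lambda = th)))

dim(yrep)  # S rows (draws) by N columns (observations)
```

A `y` vector and `yrep` matrix of this shape are what the `ppc_` functions shown in the diff take as their first two arguments.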

vignettes/plotting-mcmc-draws.Rmd

Lines changed: 37 additions & 30 deletions
@@ -15,13 +15,28 @@ params:
 
 ```{r, child="children/SETTINGS-knitr.txt"}
 ```
+```{r, pkgs, include=FALSE}
+library("ggplot2")
+library("rstanarm")
+```
 
 This vignette focuses on plotting parameter estimates from MCMC draws. MCMC
 diagnostic plots are covered in the separate vignette
 [Visual MCMC diagnostics using the bayesplot package](MCMC-diagnostics.html),
 and graphical posterior predictive checks are covered in
 [Graphical posterior predictive checks using the bayesplot package](PPC.html).
 
+In addition to __bayesplot__ we'll load the following packages:
+
+* __ggplot2__ for customizing the ggplot objects created by __bayesplot__
+* __rstanarm__ for fitting the example models used throughout the vignette
+
+```{r, eval=FALSE}
+library("bayesplot")
+library("ggplot2")
+library("rstanarm")
+```
+
 ## Plots for MCMC draws
 
 The **bayesplot** package provides various plotting functions for visualizing
@@ -31,30 +46,22 @@ parameters of a Bayesian model.
 In this vignette we'll use draws obtained using the `stan_glm` function in the
 **rstanarm** package (Gabry and Goodrich, 2017), but MCMC draws from using
 any package can be used with the functions in the **bayesplot** package. See,
-for example, **brms** (which, like **rstanarm**, calls the **rstan** package
-internally to use [Stan](http://mc-stan.org/)'s MCMC sampler).
+for example, **brms**, which, like **rstanarm**, calls the **rstan** package
+internally to use [Stan](http://mc-stan.org/)'s MCMC sampler.
 
-```{r, eval=FALSE, results='hide'}
-library("rstanarm")
-fit <- stan_glm(
-mpg ~ ., # ~ . includes all other variables in dataset
-data = mtcars,
-chains = 4,
-iter = 2000,
-seed = 1111
-)
+```{r, mtcars}
+head(mtcars) # see help("mtcars")
+```
+
+```{r, eval=FALSE}
+fit <- stan_glm(mpg ~ ., # '.' means includes all variables
+data = mtcars,
+seed = 1111)
 print(fit)
 ```
 
-```{r stan_glm, echo=FALSE, results='hide'}
-suppressPackageStartupMessages(library("rstanarm"))
-fit <- stan_glm(
-mpg ~ .,
-data = mtcars,
-chains = 4,
-iter = 2000,
-seed = 1111
-)
+```{r stan_glm, include=FALSE}
+fit <- stan_glm(mpg ~ ., data = mtcars, seed = 1111)
 ```
 
 ```{r, print-fit, echo=FALSE}
@@ -76,7 +83,6 @@ Posterior intervals for the parameters can be plotted using the `mcmc_intervals`
 function.
 
 ```{r, mcmc_intervals}
-library("bayesplot")
 color_scheme_set("red")
 mcmc_intervals(posterior, pars = c("cyl", "drat", "am", "sigma"))
 ```
@@ -113,23 +119,23 @@ The `mcmc_hist` and `mcmc_dens` functions plot posterior distributions (combinin
 
 ```{r, mcmc_hist, message=FALSE}
 color_scheme_set("green")
-mcmc_hist(posterior, pars = c("wt", "am"))
-mcmc_dens(posterior, pars = c("wt", "am"))
+mcmc_hist(posterior, pars = c("wt", "sigma"))
+mcmc_dens(posterior, pars = c("wt", "sigma"))
 ```
 
-To view the four Markov chain separately we can use `mcmc_hist_by_chain`, `mcmc_dens_overlay`:
+To view the four Markov chain separately we can use `mcmc_hist_by_chain`, `mcmc_dens_overlay`, and `mcmc_violin`:
 
 ```{r, mcmc_hist_by_chain, message=FALSE}
 color_scheme_set("brightblue")
-mcmc_hist_by_chain(posterior, pars = c("wt", "am"))
-mcmc_dens_overlay(posterior, pars = c("wt", "am"))
+mcmc_hist_by_chain(posterior, pars = c("wt", "sigma"))
+mcmc_dens_overlay(posterior, pars = c("wt", "sigma"))
 ```
 
-The `mcmc_violin` function also plots the density estimates of
-each chain as violins with horizontal lines at user-specified quantiles:
+The `mcmc_violin` function plots the density estimates of each chain as violins
+with horizontal lines at user-specified quantiles:
 
 ```{r, mcmc_violin}
-mcmc_violin(posterior, pars = c("wt", "am"), probs = c(0.1, 0.5, 0.9))
+mcmc_violin(posterior, pars = c("wt", "sigma"), probs = c(0.1, 0.5, 0.9))
 ```
 
 ### Scatterplots
@@ -138,7 +144,8 @@ The `mcmc_scatter` function creates a scatterplot with two parameters:
 
 ```{r, mcmc_scatter}
 color_scheme_set("gray")
-mcmc_scatter(posterior, pars = c("(Intercept)", "wt"), size = 1.5, alpha = 0.5)
+mcmc_scatter(posterior, pars = c("(Intercept)", "wt"),
+size = 1.5, alpha = 0.5)
 ```
 
 The `mcmc_hex` function creates a similar plot but using hexagonal binning, which can be useful to avoid overplotting:
