09.1a_A_single_coin_from_a_single_mint.Rmd

---
title: "9.1(a): A single coin from a single mint"
output: html_notebook
---


## Introduction

This is the first of two models in Section 9.1 of Kruschke. It has a lot of uncertainly about $\omega$, but a high degree of dependence of $\theta$ on $\omega$.


## Preliminaries

Load necessary packages:

```{r}
library(tidyverse)
library(rstan)
library(tidybayes)
library(bayesplot)
```

Set Stan to save compiled code.

```{r}
rstan_options(auto_write = TRUE)
```

Set Stan to use parallel processing where possible.

```{r}
options(mc.cores = parallel::detectCores())
```


## Data

```{r}
N <- 12
y <- c(rep(1, 9), rep(0, 3)) # 9 heads, 3 tails
y
```

```{r}
stan_data <- list(N = N, y = y)
```


## Prior

### Simulation code

We use the `transformed data` block to define constants in our model: $A_{\omega}$ and $B_{\omega}$, the parameters for our beta hyperprior, and $K$, the *concentration* that reflects how tightly $\theta$ should stay to the mode $\omega$. We also define the parameters $a$ and $b$ of the beta prior for $\theta$. This is not strictly necessary, but it makes the code much easier to read.

The `for` loop at the end uses a function that generates a Bernoulli distribution (either 1 or 0 for success or failure) given a probability of success $\theta$. In Stan, this block is run once every time the MCMC process accepts a target; for the value of $\theta$ that happens to be sampled in that round, the `for` loop generates $N$ "coin flips" given that value of $\theta$. The idea is that this becomes "fake data" generated from the prior distribution.

```{stan, output.var = "scsma_prior", cache = TRUE}
data {
    int<lower = 0> N;
    array[N] int<lower = 0, upper = 1> y;
}
transformed data {
    real<lower = 0> A_omega;
    real<lower = 0> B_omega;
    real<lower = 0> K;

    A_omega = 2;   // hyperprior parameters
    B_omega = 2;
    K = 100;       // concentration
}
generated quantities {
    real<lower = 0, upper = 1> omega;
    real<lower = 0, upper = 1> theta;
    real<lower = 0> a;
    real<lower = 0> b;
    array[N] int<lower = 0, upper = 1> y_sim;
    
    omega = beta_rng(A_omega, B_omega); // mode
    a = omega * (K - 2) + 1;
    b = (1 - omega) * (K - 2) + 1;
    theta = beta_rng(a, b);  // probability of success
    
    for (n in 1:N) {
        y_sim[n] = bernoulli_rng(theta);
    }
}
```

```{r, cache = TRUE}
fit_scsma_prior <- sampling(scsma_prior,
                            data = stan_data,
                            chains = 1,
                            algorithm = "Fixed_param",
                            seed = 11111,
                            refresh = 0)
```

```{r}
samples_scsma_prior <- tidy_draws(fit_scsma_prior)
samples_scsma_prior
```

### Examine prior

```{r}
mcmc_hist(samples_scsma_prior,
          pars = c("omega", "theta"))
```

No surprise that `omega` and `theta` are highly correlated:

```{r}
mcmc_pairs(fit_scsma_prior,
           pars = c("omega", "theta"))
```

### Prior predictive distribution

```{r}
y_sim_scsma_prior <- samples_scsma_prior %>%
    dplyr::select(starts_with("y_sim")) %>%
    as.matrix()
```

```{r}
ppd_hist(ypred = y_sim_scsma_prior[1:20,])
```

There is a wide variety of outcomes, as expected from a relatively weak prior.


## Model

In the model code below, we list `omega` and `theta` as the parameters of interest. The `a` and `b` parameters are defined in the `transformed parameters` block. They make the model easier to read, but they are not the main objects of our inference.

```{stan, output.var = "scsma", cache = TRUE}
data {
    int<lower = 0> N;
    array[N] int<lower = 0, upper = 1> y;
}
transformed data {
    real<lower = 0> A_omega;  
    real<lower = 0> B_omega;
    real<lower = 0> K;        

    A_omega = 2;   // hyperprior parameters
    B_omega = 2;
    K = 100;       // concentration
}
parameters {
    real<lower = 0, upper = 1> omega;
    real<lower = 0, upper = 1> theta;
}
transformed parameters {
    real<lower = 0> a;
    real<lower = 0> b;
    
    a = omega * (K - 2) + 1;
    b = (1 - omega) * (K - 2) + 1;
}
model {
    omega ~ beta(A_omega, B_omega);  // hyperprior for mode
    theta ~ beta(a, b);              // prior prob of success
    y ~ bernoulli(theta);            // likelihood
}
generated quantities {
    array[N] int<lower = 0, upper = 1> y_sim;
    
    for (n in 1:N) {
        y_sim[n] = bernoulli_rng(theta);
    }
}
```

```{r, cache = TRUE}
fit_scsma <- sampling(scsma,
                      data = stan_data,
                      seed = 11111,
                      refresh = 0)
```

```{r}
samples_scsma <- tidy_draws(fit_scsma)
samples_scsma
```

## Model diagnostics

```{r}
stan_trace(fit_scsma,
           pars = c("omega", "theta"))
```

```{r}
mcmc_acf(fit_scsma,
         pars = c("omega", "theta"))
```

```{r}
mcmc_rhat(rhat(fit_scsma))
```

```{r}
mcmc_neff(neff_ratio(fit_scsma))
```


## Model summary

```{r}
print(fit_scsma,
      pars = c("omega", "theta"))
```


## Model visualization

```{r}
mcmc_areas(fit_scsma, pars = c("omega", "theta"))
```

```{r}
pairs(fit_scsma,
      pars = c("omega", "theta"))
```


## Posterior predictive check

```{r}
y_sim_scsma <- samples_scsma %>%
    dplyr::select(starts_with("y_sim")) %>%
    as.matrix()
```

```{r}
ppc_hist(y = y,
         yrep = y_sim_scsma[1:19, ])
```

```{r}
ppc_bars(y = y,
         yrep = y_sim_scsma)
```