day_1.Rmd

---
title: |
 | Introduction: Random allocation and Fisher's exact test
bibliography:
 - 'BIB/MasterBibliography.bib'
---

# Randomized experiments

### Randomized experiments

Randomized experiments and causal inference

-   Experiments are conceptually and practically central to causal
    inference

    -   Specifically experiments featuring control groups and random
        assignment

-   In many applications, these experiments are a "methodological ideal"

-   Stats 101 principles: Bias, variance, $p$-values, confidence levels,
    etc.

    -   But without ideas that data sampled from superpopulation or
        probabilistic outcome generating mechanism

    -   At least, those ideas are optional

# Random Assignment

### Example: Fisher's "Lady Tasting Tea"

@fisher1935a [p. 11] describes
"lady tasting tea" experiment as follows:

![image](images/fisher_lady_tasting_tea_text.png){width="50%"}

### Random assignment
\fontsize{11pt}{11pt}\selectfont

-   \mh{Treatment Assignment}

-   Denote whether $i$th unit (cup) is assigned to treatment
    (milk-first) or control (tea-first) by $z_i = 1$ or $z_i = 0$

-   Denote collection of all $N$ treatment indicator variables by\
    $\bm{z} = \begin{bmatrix} z_1 & z_2 & \ldots & z_N \end{bmatrix}^{\top}$

-   There are $2^N$ ways one could assign $N$ units to treatment or
    control

\begin{equation*}
\left\{0, 1\right\}^N = \left\{
\begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix},
\begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix},
\begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix},
\cdots ,
\begin{bmatrix} 1 \\ 0 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \end{bmatrix},
\begin{bmatrix} 0 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \end{bmatrix},
\begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \end{bmatrix}
\right\}
\end{equation*}

### Random assignment

-   For our purposes, a randomized design is a procedure for selecting
    any assignment from $\left\{0, 1\right\}^N$ with probability
    $p(\bm{z})$

-   Therefore, $\bm{Z}$ is a random vector with support
    $\Omega \coloneqq \left\{\bm{z}: p(\bm{z}) > 0\right\}$ and
    $\Pr\left(\bm{Z} = \bm{z}\right) = p(\bm{z})$

-   \mh{Bernoulli (simple random) assignment}

    -   $N$ independent flips of (usually fair) coin

-   \mh{Complete random assignment}

    -   $N$ draws from an urn in which some proportion are red (treated)
        balls and remaining proportion are blue (control) balls

### Random assignment

-   @fisher1935a [p. 11]:

    > "Our experiment consists in mixing eight cups of tea, four in one
    > way and four in the other, and presenting them to the subject for
    > judgment in a random order."

    -   Support of $\bm{Z}$ is
        $$\left\{0, 1\right\}^N \supset \Omega = \left\{
        \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix},
        \begin{bmatrix} 1 \\ 1 \\ 1 \\ 0 \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix},
        \begin{bmatrix} 1 \\ 1 \\ 1 \\ 0 \\ 0 \\ 1 \\ 0 \\ 0 \end{bmatrix},
        \cdots ,
        \begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \\ 1 \\ 0 \\ 1 \\ 1 \end{bmatrix},
        \begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \\ 0 \\ 1 \\ 1 \\ 1 \end{bmatrix},
        \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 1 \\ 1 \\ 1 \\ 1 \end{bmatrix}
        \right\}$$

    -   Probability of each assignment in $\Omega$ is
        $\Pr\left(\mathbf{Z} = \mathbf{z} \right) = \left\lvert \Omega \right\rvert^{-1}$
        for all $\bm{z} \in \Omega$

    -   ($\left\lvert \Omega \right\rvert$ is the cardinality of, i.e.,
        number of elements in, the set $\Omega$)

### Randomization-based distributions
\fontsize{10pt}{10pt}\selectfont

-   Suppose first of $70$ assignments happened to be randomly selected

-   The "lady" correctly identifies all $4$ milk-first and all $4$
    tea-first cups

\vspace{1em}
\begin{table}[H]
\centering
\begin{tabular}{l|cc}
\toprule
Unit & $\bm{z}$ & $\bm{y}$ \\ 
  \midrule
$1$ & $1$ & $1$  \\ 
$2$ & $1$ & $1$  \\ 
$3$ & $1$ & $1$  \\ 
$4$ & $1$ & $1$  \\ 
$5$ & $0$ & $0$  \\ 
$6$ & $0$ & $0$  \\ 
$7$ & $0$ & $0$ \\ 
$8$ & $0$ & $0$ 
\end{tabular}
\caption{Results of R. A. Fisher's ``Lady Tasting Tea'' experiment}
\label{tab: lady tasting tea obs data}
\end{table}


-   We summarize data by number of focal-group (milk-first) cups
    correctly identified, $\mathbf{z}^{\top}\mathbf{y}$, which in this
    case is $\mathbf{z}^{\top}\mathbf{y} = 4$

### Randomization-based distributions

-   @fisher1935a entertained \mh{counter-to-fact assignments of
    treatment}, holding responses fixed at their
    observed values

-   Responses fixed at observed values corresponds to scenario in which
    "lady" cannot discriminate between milk-first and tea-first cups

\vfill
\begin{table}[H]
\scriptsize
    \begin{tabular}{l|l}
    \toprule
    $\mathbf{z}_1$ & $\mathbf{y}$ \\ \midrule
    1 & 1  \\
    1 & 1   \\
    1 & 1   \\
    1 & 1  \\
    0 & 0  \\
    0 & 0  \\
    0 & 0  \\
    0 & 0  
    \end{tabular}
    \hfill
      \begin{tabular}{l|l}
      \toprule
    $\mathbf{z}_2$ & $\mathbf{y}$ \\ \midrule
    1 &  1  \\
    1 &  1  \\
    1 &  1  \\
    0 &  1   \\
    1 &  0  \\
    0 &  0  \\
    0 &  0  \\
    0 &  0  
    \end{tabular}
     \hfill
      \begin{tabular}{l|l}
      \toprule
    $\mathbf{z}_3$ & $\mathbf{y}$ \\ \midrule
    1 & 1  \\
    1 & 1  \\
    1 & 1  \\
    0 & 1   \\
    0 & 0  \\
    1 & 0  \\
    0 & 0  \\
    0 & 0  
    \end{tabular}
     \hfill
     $\cdots $
     \hfill
      \begin{tabular}{l|l}
      \toprule
    $\mathbf{z}_{68}$ & $\mathbf{y}$ \\ \midrule
    0 & 1  \\
    0 & 1  \\
    0 & 1  \\
    1 & 1   \\
    1 & 0  \\
    0 & 0  \\
    1 & 0  \\
    1 & 0  
    \end{tabular}
     \hfill
      \begin{tabular}{l|l}
      \toprule
    $\mathbf{z}_{69}$ & $\mathbf{y}$ \\ \midrule
    0 & 1  \\
    0 & 1  \\
    0 & 1  \\
    1 & 1  \\
    0 & 0 \\
    1 & 0  \\
    1 & 0  \\
    1 & 0  
    \end{tabular}
     \hfill
      \begin{tabular}{l|l}
      \toprule
    $\mathbf{z}_{70}$ & $\mathbf{y}$ \\ \midrule
    0 & 1  \\
    0 & 1  \\
    0 & 1  \\
    0 & 1  \\
    1 & 0   \\
    1 & 0  \\
    1 & 0  \\
    1 & 0  
    \end{tabular}
\label{tab: fisher's null pot outs schedule}
\end{table}


### Randomization-based distributions

-   @fisher1935a: distribution of summary measure,
    $\bm{Z}^{\top} \bm{y}$, if "lady" could not discriminate between
    milk-first and tea-first cups

- ![distribution of summary measure if no discrimination](images/null_dist_no_discrim_plot.pdf){width="90%"}


## Potential Outcomes

### Potential outcomes

-   Thus far, we have entertained counter-to-fact assignments of
    treatment

    -   Responses have been fixed at their observed values

-   @neyman1923 and @rubin1974 also posited counter-to-fact outcomes

    -   I.e., \mh{potential outcomes}. More next time. 

-   E.g., perfect discrimination in "Lady Tasting Tea" example

\vfill
\begin{table}[H]
\scriptsize
    \begin{tabular}{l|l}
    \toprule
    $\mathbf{z}_1$ & $\mathbf{y}$ \\ \midrule
    1 & 1  \\
    1 & 1   \\
    1 & 1   \\
    1 & 1  \\
    0 & 0  \\
    0 & 0  \\
    0 & 0  \\
    0 & 0  
    \end{tabular}
    \hfill
      \begin{tabular}{l|l}
      \toprule
    $\mathbf{z}_2$ & $\mathbf{y}$ \\ \midrule
    1 &  1  \\
    1 &  1  \\
    1 &  1  \\
    0 &  0   \\
    1 &  1  \\
    0 &  0  \\
    0 &  0  \\
    0 &  0  
    \end{tabular}
     \hfill
      \begin{tabular}{l|l}
      \toprule
    $\mathbf{z}_3$ & $\mathbf{y}$ \\ \midrule
    1 & 1  \\
    1 & 1  \\
    1 & 1  \\
    0 & 0   \\
    0 & 0  \\
    1 & 1  \\
    0 & 0  \\
    0 & 0  
    \end{tabular}
     \hfill
     $\cdots $
     \hfill
      \begin{tabular}{l|l}
      \toprule
    $\mathbf{z}_{68}$ & $\mathbf{y}$ \\ \midrule
    0 & 0  \\
    0 & 0  \\
    0 & 0  \\
    1 & 1   \\
    1 & 1  \\
    0 & 0  \\
    1 & 1  \\
    1 & 1  
    \end{tabular}
     \hfill
      \begin{tabular}{l|l}
      \toprule
    $\mathbf{z}_{69}$ & $\mathbf{y}$ \\ \midrule
    0 & 0  \\
    0 & 0  \\
    0 & 0  \\
    1 & 1  \\
    0 & 0 \\
    1 & 1  \\
    1 & 1  \\
    1 & 1  
    \end{tabular}
     \hfill
      \begin{tabular}{l|l}
      \toprule
    $\mathbf{z}_{70}$ & $\mathbf{y}$ \\ \midrule
    0 & 0  \\
    0 & 0  \\
    0 & 0  \\
    0 & 0  \\
    1 & 1   \\
    1 & 1  \\
    1 & 1  \\
    1 & 1  
    \end{tabular}
\end{table} \vfill

### Bringing small, simple data into R

To enter the coffee data directly into R, do:

```{r eval=FALSE}
z = scan(nlines=1)
1:  1 1 1 1 0 0 0 0
```
(I.e., enter "`z = scan(nlines=1)`" at the "`>`"; at the "`1:`" that comes up, enter `1 1 1 1 0 0 0 0`.)  Use the same technique to assign `y`, and bind them together as follows:
```{r eval=FALSE}
coffee = data.frame(z, y) ; rm(z,y)
```

```{r echo=FALSE}
coffee = data.frame(z=rep(1:0, each=4), y=rep(1:0, each=4)) 
```

You can now access your two variables as `coffee$z` and `coffee$y`, or using "`with()`" as shown below.
```{r eval=FALSE}
with(coffee, sum(z*y)) #figures t(z,y) = z^Ty
with(coffee, fisher.test(z, y, alternative="g"))
```
We'll return to our explanation of `fisher.test()` after starting some R exercises.

\note{Break to begin exercise sheet}

# Hypothesis testing

## Statistical hypothesis testing in general

### Hypothesis testing

-   Use known assignment mechanism to \mh{reliably}
    gather evidence

    -   against null hypothesis about one potential outcome schedule\
        (e.g., no discrimination)

    -   and in favor of alternative hypothesis about another\
        (e.g., perfect discrimination)


### Hypothesis testing: Introduction

-   Steps of hypothesis testing:

    1.  From randomized experiment, we observe data, $(\bm{z}, \bm{y})$,
        and summarize data by test-stat, $t(\bm{z}, \bm{y})$

    2.  For purposes of argumentation, we postulate a \mh{sharp null hypothesis}

    3.  A sharp null hypothesis implies complete specification of
        unit-level responses to experiment under every possible
        assignment

    4.  Under sharp null hypothesis, calculate test-stat over all
        assignments

    5.  Compare observed test-stat in (1) to distribution of test-stats
        under null in (3)

    6.  If observed test-stat inconsistent with distribution of
        test-stat implied by sharp null, then reject sharp null;
        otherwise, don't


### Error and goals of hypothesis tests

-   \mh{Type I Error}: Rejecting null when null is true

-   \mh{Type II Error}: Failing to reject null when null is false

-   Hence, two goals of our tests:

    1.  Control the Type I Error:
        $\Pr\left(\text{Type I Error}\right) \leq \alpha$, where
        $\alpha$ is size of test

    2.  Control Type II Error: Make as large as possible, where power is
        $1 - \Pr\left(\text{Type II Error}\right)$

-   \mh{Definitions}:

-   Unbiased test:
    $\text{Power} \geq \Pr\left(\text{Type I Error}\right)$

-   Consistent test: $\text{Power} \to 1$ as size of experiment
    $\to \infty$


## Fisherian tests of the hypothesis of no effect

### "Lady Tasting Tea" Example

-   ![Distribution of test-stat over assignments under sharp null of no
    effects](images/null_dist_no_discrim_plot.pdf){width="90%"}

-   Upper $p$-value is $(1/70) \approx 0.0143$


### "Lady Tasting Tea" Example
\fontsize{11pt}{11pt}\selectfont


-   Upper, one-sided $p$-value $$\begin{aligned}
    \Pr\left(t(\bm{Z}, \bm{y}) \geq t^{\text{obs}}\right) & = \sum \limits_{\bm{z} \in \Omega} \mathbbm{1}\left\{t(\bm{z}, \bm{y}) \geq t^{\text{obs}}\right\} \Pr\left(\bm{Z} = \bm{z}\right)
    \end{aligned}$$

\pause
   $$\begin{aligned}
    \Pr\left(t(\bm{Z}, \bm{y}) \geq t^{\text{obs}}\right) & = \overbrace{\sum \limits_{\bm{z} \in \Omega}}^{\substack{\textcolor{magenta}{\text{Sum over all}} \\ \textcolor{magenta}{\text{assignments}}}} \underbrace{\mathbbm{1}\left\{t(\bm{z}, \bm{y}) \geq t^{\text{obs}}\right\}}_{\substack{\textcolor{magenta}{\text{Indicator of whether null test-stat}} \\ \textcolor{magenta}{\text{under assmt } \bm{z} \text{ is } \geq \text{ obs test-stat}}}} \underbrace{\Pr\left(\bm{Z} = \bm{z}\right)}_{\textcolor{magenta}{\text{Prob of assmt } \bm{z}}}
    \end{aligned}$$


### "Lady Tasting Tea" Example

-   ![Distribution of test-stat over assignments under two sharp causal
    effects](images/perfect_discrim_test_stat_plot.pdf){width="90%"}

-   Imagine that we test sharp null of no effects when (unbeknownst to
    us) true effect is that of perfect discrimination. What is power of
    test?

## Approximate p-values via Normal theory

### A slightly larger experiment

@antonioli2005rct randomized 15 particpants to dolphin therapy, 15 to tropical vacation without dolphins. After, depression was improved ($y=1$) for 13/30 particpants, not improved ($y=0$ for $17/30$. Again, we can consider the distribution of $\bm{Z}^{\top}\bm{y}$ under the sharp null of no effect.

\note{Show https://giphy.com/explore/dolphin}

### Tea and dolphins
\fontsize{11pt}{11pt}\selectfont

```{r fig.height=5, echo=FALSE}
allones <- c(teatasting=4, dolphins=13)
allzeroes <- c(teatasting=4, dolphins=17)
n_t <- c(teatasting=4, dolphins=15)

plot_dens <- function(expt){
plot(0:n_t[expt],
     dhyper(0:n_t[expt], m=allones[expt],
            n=allzeroes[expt], k=n_t[expt]),
      type="h", xlab="Test statistic", ylab="Probability",
      main=expt)
	    }

par(mfrow=c(2,1))
plot_dens("teatasting")
plot_dens("dolphins")
par(mfrow=c(1,1))
```
As $n$ grows -- and $\bar{Z}$, $\bar{y}$ stay away from 0 or 1 -- the histogram tends to Normality.

###  Expected value of random variables {.build}

For Normal-theory approximate *p*-values, the data and the strict null are combined to figure $\EE[\bm{Z}^{\top}\bm{y}]$ and $\var \bm{Z}^{\top}\bm{y}$.  Here we consider only the first.

- In general, $\EE(aX + bY) = a\EE(X) + b\EE(Y)$.  (*Linearity* of expected value.  $X$, $Y$ don't have to be independent.)
- By def., $\bm{Z}^{\top}\bm{y} = \sum_{i=1}^n Z_i y_i$. So $\EE[\bm{Z}^{\top}\bm{y}] = \sum_{i=1}^n y_i \PP(Z_i=1)$. ($y$ isn't random, $Z$ is.)
- Under complete randomization, $\PP(Z_i=1) = n_1/n$.
- In consequence, $\EE[\bm{Z}^{\top}\bm{y}] = \sum_{i=1}^n y_i \cdot (n_1/n) = (n_1/n)\sum_{i=1}^n y_i = n_1\bar{y}$.
- Similar algebra gives $\EE[t(\bm{Z}, \bm{y})]$, e.g. $t(\bm{Z}, \bm{y}) = \bm{Z}^{\top}\bm{y}$. (But not all $t()$.  If $t(\bm{Z}, \bm{y}) = f(\bm{Z}, \bm{y})g(\bm{Z}, \bm{y})$, linearity does **not** give $\EE[t(\bm{Z}, \bm{y})] = \EE[f(\bm{Z}, \bm{y})]\EE[g(\bm{Z}, \bm{y})]$ -- one factor has to be nonrandom.)

\note{Now continue normal theory part of exercises}

### References

\fontsize{8pt}{8pt}\selectfont
<div id="refs"></div>