Skip to content

Commit

Permalink
Adding week 11 material
Browse files Browse the repository at this point in the history
  • Loading branch information
eveskew committed Apr 6, 2020
1 parent 3928e84 commit 0f8535e
Show file tree
Hide file tree
Showing 4 changed files with 364 additions and 0 deletions.
54 changes: 54 additions & 0 deletions homework/hw_week_11.Rmd
Original file line number Diff line number Diff line change
@@ -0,0 +1,54 @@
---
title: "EEEB UN3005/GR5005 \nHomework - Week 11 - Due 16 Apr 2020"
author: "USE THE NUMERIC PORTION OF YOUR UNI HERE"
output: pdf_document
fontsize: 12pt
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(rethinking)
library(dplyr)
library(ggplot2)
```


**Homework Instructions:** Complete this assignment by writing code in the code chunks provided. If required, provide written explanations **below** the relevant code chunks. Replace "USE THE NUMERIC PORTION OF YOUR UNI HERE" in the document header with the numbers appearing in your UNI. When complete, knit this document within RStudio to generate a PDF file. Please review the resulting PDF to ensure that all content relevant for grading (i.e., code, code output, and written explanations) appears in the document. Rename your PDF document according to the following format: hw_week_11_UNInumbers.pdf. Upload this final homework document to CourseWorks by 5 pm on the due date.


All the following homework problems will draw on the country-level dataset (`rugged`) that was discussed in the *Statistical Rethinking* book and lecture. In particular, we'll be interested in an African island nation, Seychelles, and how inclusion of data from this one country might affect our statistical inference.


## Problem 1 (3 points)

Following the lecture code, import the `rugged` dataset, create a logged version of the year 2000 GDP variable (for use as our outcome variable), and subset the data down to only those countries that actually have GDP data.

Now for something new: because we are interested in Seychelles, we'd like to visualize where Seychelles stands in relation to other African (and non-African countries). Therefore, using a method of your choice, create a new variable in the `rugged` data frame called `geographic_affiliation`. `geographic_affiliation` should have the value of "non-African nation" anywhere `cont_africa == 0`. Similarly, `geographic_affiliation` should have the value of "African nation" anywhere `cont_africa == 1` EXCEPT when `country == "Seychelles"`. There, `geographic_affiliation` should have a value of "Seychelles".

Using the `ggplot()` function, visualize the relationship between `rugged` (x-axis) and log GDP (y-axis) using a scatterplot. Assign the color of the points to `geographic_affiliation`. You should end up with a scatterplot featuring points of three different colors, corresponding to "African nation", "non-African nation", and "Seychelles".

Using the plot to assist in your interpretation, where does the GDP of Seychelles lie relative to most other African countries? Where does the terrain ruggedness value of Seychelles lie relative to most other African countries?

```{r}
```


## Problem 2 (4 points)

Now replicate the interaction model as given in lecture (m7.5b) using a dataset that excludes Seychelles. In addition, re-fit model m7.5b as in lecture, using the full dataset (by "full dataset" I mean the `rugged` dataset with all countries that have GDP data). Compare these two models using `precis()` to show the 97% PIs of model parameters. Interpret the change you see in the bAR parameter (the interaction term) in your new fit model relative to the parameter estimate derived from m7.5b.

```{r}
```


## Problem 3 (3 points)

Using the lecture code as a guide, plot model-based predictions for both m7.5b and your new model that was fit excluding Seychelles. For a given model, you can choose to show predictions for the ruggedness effect inside and outside of Africa in two separate panels or together on one plot. Both methods were demonstrated in lecture.

```{r}
```
45 changes: 45 additions & 0 deletions lab/lab_week_11.Rmd
Original file line number Diff line number Diff line change
@@ -0,0 +1,45 @@
---
title: "EEEB UN3005/GR5005 \nLab - Week 11 - 06 and 08 April 2020"
author: "USE YOUR NAME HERE"
output: pdf_document
fontsize: 12pt
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(rethinking)
```


**Updated Lab Instructions Due to Remote Teaching:** Complete this assignment by writing code in the code chunks provided. If required, provide written explanations below the relevant code chunks. When complete, knit this document within RStudio to generate a PDF, and upload that document to CourseWorks by 5 pm on 09 April.


# Interactions


## Exercise 1: Multiple Regression with an Interaction Effect

Referencing the *Statistical Rethinking* text (see page 230 and relevant preceding code), reconstruct and fit the model `m7.9`. This model uses the `tulips` dataset. It models `blooms` as the outcome variable of interest, with centered versions of `water` and `shade`, as well as the interaction between them, as continuous predictor variables.

After fitting the model, use `precis()` to display the 97% PIs for all model parameter posteriors. Check to make sure your results generally align with what's shown in the book.

Finally, how do you interpret the intercept estimate in the context of this model? In other words, what portion of the data does the intercept estimate describe?

```{r}
```


## Exercise 2: Triptych Plots for Both Predictor Variables

In the *Statistical Rethinking* book, Figure 7.7 (page 234), you'll see a series of "triptych" plots. Pay particular attention to the bottom row of the panel, which shows model-based predictions (from model m7.9) for the effect of `shade.c` on `blooms`. Because the model includes multiple predictor variables, it makes sense to plot this relationship across multiple values of the other predictor (which is `water.c`, hence the series of three plots on the bottom row that have differing values of `water.c`). And because the model includes an interaction effect, the relationship between `shade.c` and `blooms` varies across those three plots.

To fully visualize the predictions from model m7.9, create two triptych plots. The first triptych plot should show the effect of `water.c` on `blooms`, plotted for three different values of `shade.c`. The second triptych plot should show the effect of `shade.c` on `blooms`, plotted for three different values of `water.c` (i.e., this second triptych plot should replicate exactly what you see in the bottom row of the book's Figure 7.7). Clearly, modifying the book's code will help you in generating this output. Feel free to visualize the 97% interval of the mean using lines (as in the book code) or using a shaded interval (as has been demonstrated in multiple places throughout the book and lecture code using the `shade()` function). Also note, you can plot these two triptych plots together neatly with some modification to your plotting window. See the book's R code 7.28 box for a hint as to how...

Using the plots to help with your interpretation, at which value(s) of shade is the effect of water most extreme? At which value(s) of water is the effect of shade most extreme?

```{r}
```
265 changes: 265 additions & 0 deletions lecture/lecture_week_11.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,265 @@


library(tidyverse)
library(rethinking)

#==========================================================


# Import and prep the country data

data(rugged)
d <- rugged

# Make log version of outcome (GDP in year 2000)
d$log_gdp <- log(d$rgdppc_2000)

# Subset down to only countries with GDP data
dd <- d[complete.cases(d$rgdppc_2000), ]

# Split into two data frames for demonstration purposes,
# one with only African countries and one with only
# non-African nations
d.A1 <- filter(dd, cont_africa == 1) # Africa
d.A0 <- filter(dd, cont_africa == 0) # not Africa

#==========================================================


# Fit models for country subsets separately
# (for later comparison with model fit on all data)

# African nations

m7.1 <- map(
data = d.A1,
alist(
log_gdp ~ dnorm(mu, sigma),
mu <- a + bR*rugged,
a ~ dnorm(8, 100),
bR ~ dnorm(0, 1),
sigma ~ dunif(0, 10)
)
)

# non-African nations

m7.2 <- map(
data = d.A0,
alist(
log_gdp ~ dnorm(mu, sigma),
mu <- a + bR*rugged,
a ~ dnorm(8, 100),
bR ~ dnorm(0, 1),
sigma ~ dunif(0, 10)
)
)

#==========================================================


# Fit a linear model with all data, using terrain
# ruggedness to predict GDP

m7.3 <- map(
data = dd,
alist(
log_gdp ~ dnorm(mu, sigma),
mu <- a + bR*rugged,
a ~ dnorm(8, 100),
bR ~ dnorm(0, 1),
sigma ~ dunif(0, 10)
)
)

precis(m7.3, prob = 0.97)


# Plot model predictions

# Define a sequence of ruggedness values to plot
# predictions over
summary(dd$rugged)
rugged.seq <- seq(from = -1, to = 8, by = 0.5)

# Package this sequence into a data frame for use with
# the "link()" function
counterfactual <- data.frame(rugged = rugged.seq)

# Generate predicted values of the mean trend and summaries
# of those values
mu <- link(m7.3, data = counterfactual)
mu.mean <- apply(mu, 2, mean)
mu.PI <- apply(mu, 2, PI, prob = 0.97)

plot(log(rgdppc_2000) ~ rugged, data = dd, pch = 19,
xlab = "Terrain Ruggedness Index",
ylab = "log(GDP year 200)",
col = alpha("red4", 0.2)
)
mtext("Linear regression", 3)
lines(rugged.seq, mu.mean, col = "red4")
shade(mu.PI, rugged.seq, col = col.alpha("red4", 0.3))


# Please note that "link()" is just a shortcut for
# manually re-constructing the linear model of the mean
# parameter and plugging in samples from the model
# posterior. So this will give the identical plot:

# Extract posterior samples
post.m7.3 <- extract.samples(m7.3, n = 1000)

# Generate (manually) predicted values of the mean using
# the linear model formula and posterior parameter
# estimates
mu.alt <- sapply(rugged.seq, function(x)
post.m7.3$a + post.m7.3$bR*x
)
mu.alt.mean <- apply(mu.alt, 2, mean)
mu.alt.PI <- apply(mu.alt, 2, PI, prob = 0.97)

plot(log(rgdppc_2000) ~ rugged, data = dd, pch = 19,
xlab = "Terrain Ruggedness Index",
ylab = "log(GDP year 200)",
col = alpha("red4", 0.2)
)
mtext("Linear regression", 3)
lines(rugged.seq, mu.alt.mean, col = "red4")
shade(mu.alt.PI, rugged.seq, col = col.alpha("red4", 0.3))

#==========================================================


# Fit a multiple regression model using terrain ruggedness
# and continent identity to predict GDP

m7.4 <- map(
data = dd,
alist(
log_gdp ~ dnorm(mu, sigma),
mu <- a + bR*rugged + bA*cont_africa,
a ~ dnorm(8, 100),
bR ~ dnorm(0, 1),
bA ~ dnorm(0, 1),
sigma ~ dunif(0, 10)
)
)

precis(m7.4, prob = 0.97)


# Plot model predictions

# Generate counterfactual data frames. One will represent
# data from Africa and have a range of ruggedness values.
# The other will represent data outside of Africa and have
# the same range of ruggedness values.
counterfactual.Africa <-
data.frame(cont_africa = 1, rugged = rugged.seq)
counterfactual.NotAfrica <-
data.frame(cont_africa = 0, rugged = rugged.seq)

# Use each of these counterfactual data frames in turn to
# generate predictions for the mean trend both inside
# and outside of Africa
mu.Africa <- link(m7.4, data = counterfactual.Africa)
mu.Africa.mean <- apply(mu.Africa, 2, mean)
mu.Africa.PI <- apply(mu.Africa, 2, PI, prob = 0.97)

mu.NotAfrica <- link(m7.4, data = counterfactual.NotAfrica)
mu.NotAfrica.mean <- apply(mu.NotAfrica, 2, mean)
mu.NotAfrica.PI <- apply(mu.NotAfrica, 2, PI, prob = 0.97)

plot(log(rgdppc_2000) ~ rugged, data = dd, pch = 19,
xlab = "Terrain Ruggedness Index",
ylab = "log(GDP year 200)",
col = ifelse(dd$cont_africa == 1, rangi2, "black")
)
mtext("Multiple regression, no interaction", 3)
lines(rugged.seq, mu.Africa.mean, col = rangi2)
shade(mu.Africa.PI, rugged.seq,
col = col.alpha(rangi2, 0.3))
lines(rugged.seq, mu.NotAfrica.mean)
shade(mu.NotAfrica.PI, rugged.seq)

#==========================================================


# Fit a multiple regression with an interaction term

m7.5b <- map(
data = dd,
alist(
log_gdp ~ dnorm(mu, sigma),
mu <- a + bR*rugged + bA*cont_africa + bAR*rugged*cont_africa,
a ~ dnorm(8, 100),
bA ~ dnorm(0, 1),
bR ~ dnorm(0, 1),
bAR ~ dnorm(0, 1),
sigma ~ dunif(0, 10)
)
)

precis(m7.5b, prob = 0.97)


# Generate model-based predictions

mu.Africa <- link(m7.5b, data = counterfactual.Africa)
mu.Africa.mean <- apply(mu.Africa, 2, mean)
mu.Africa.PI <- apply(mu.Africa, 2, PI, prob = 0.97)

mu.NotAfrica <- link(m7.5b, data = counterfactual.NotAfrica)
mu.NotAfrica.mean <- apply(mu.NotAfrica, 2, mean)
mu.NotAfrica.PI <- apply(mu.NotAfrica, 2, PI, prob = 0.97)


# Generate a prediction plot following the book

par(mfrow = c(1, 2))

# plot African nations with regression
plot(log(rgdppc_2000) ~ rugged, data = d.A1, pch = 19,
xlab = "Terrain Ruggedness Index",
ylab = "log(GDP year 200)",
col = rangi2
)
mtext("African nations", 3)
lines(rugged.seq, mu.Africa.mean, col = rangi2)
shade(mu.Africa.PI, rugged.seq,
col = col.alpha(rangi2, 0.3))

# plot non-African nations with regression
plot(log(rgdppc_2000) ~ rugged, data = d.A0, pch = 19,
xlab = "Terrain Ruggedness Index",
ylab = "log(GDP year 200)",
col = "black"
)
mtext("Non-African nations", 3)
lines(rugged.seq, mu.NotAfrica.mean)
shade(mu.NotAfrica.PI, rugged.seq)


# Generate an alternate prediction plot

par(mfrow = c(1, 1))

plot(log(rgdppc_2000) ~ rugged, data = dd, pch = 19,
xlab = "Terrain Ruggedness Index",
ylab = "log(GDP year 200)",
col = ifelse(dd$cont_africa == 1, rangi2, "black")
)
mtext("Multiple regression, with interaction", 3)
lines(rugged.seq, mu.Africa.mean, col = rangi2)
shade(mu.Africa.PI, rugged.seq,
col = col.alpha(rangi2, 0.3))
lines(rugged.seq, mu.NotAfrica.mean)
shade(mu.NotAfrica.PI, rugged.seq)


# Compare model output with models fit on data subsets

precis(m7.1)
precis(m7.2)
precis(m7.5b)
Binary file added lecture/lecture_week_11.pptx
Binary file not shown.

0 comments on commit 0f8535e

Please sign in to comment.