Adding week 13 material

eveskew · Apr 19, 2020 · 106e9e7 · 106e9e7
1 parent 095a5f2
commit 106e9e7
Show file tree

Hide file tree

Showing 5 changed files with 374 additions and 0 deletions.
diff --git a/data/eagles.csv b/data/eagles.csv
@@ -0,0 +1,9 @@
+"successes","total_attempts","pirate_size","pirate_age","victim_size"
+17,24,"large","adult","large"
+29,29,"large","adult","small"
+17,27,"large","immature","large"
+20,20,"large","immature","small"
+1,12,"small","adult","large"
+15,16,"small","adult","small"
+0,28,"small","immature","large"
+1,4,"small","immature","small"
diff --git a/homework/hw_week_13.Rmd b/homework/hw_week_13.Rmd
@@ -0,0 +1,54 @@
+---
+title: "EEEB UN3005/GR5005  \nHomework - Week 13 - Due 30 Apr 2020"
+author: "USE THE NUMERIC PORTION OF YOUR UNI HERE"
+output: pdf_document
+fontsize: 12pt
+---
+
+```{r setup, include = FALSE}
+
+knitr::opts_chunk$set(echo = TRUE)
+
+library(rethinking)
+```
+
+
+**Homework Instructions:** Complete this assignment by writing code in the code chunks provided. If required, provide written explanations **below** the relevant code chunks. Replace "USE THE NUMERIC PORTION OF YOUR UNI HERE" in the document header with the numbers appearing in your UNI. When complete, knit this document within RStudio to generate a PDF file. Please review the resulting PDF to ensure that all content relevant for grading (i.e., code, code output, and written explanations) appears in the document. Rename your PDF document according to the following format: hw_week_13_UNInumbers.pdf. Upload this final homework document to CourseWorks by 5 pm on the due date. 
+
+
+This week's homework problems will ask you to think all the way back to your week 04 homework assignment. To briefly summarize, you were asked to imagine studying a bacterial pathogen that infects small mammals. Hypothetical pilot research efforts found 9 infected individuals out of 20 animals sampled in total.
+
+All the problems in this homework assignment will ask you to work with this data scenario. The primary challenge should not be the specification and fitting of the models; they'll each contain only one parameter, the intercept value. Rather, I'm mainly asking you to think about how to represent the data in different formats that are all valid expressions of binomial data. 
+
+
+## Problem 1 (4 points)
+
+First, represent the data scenario (9 infected animals out of 20 animals sampled) using an *aggregated binomial* data format. For this problem, your data should be contained in a data frame with only one row and two columns. One column should contain the number of infected individuals. The other column should contain the total number of samples.
+
+Use `map()` to fit a binomial generalized linear model to this data (an intercept-only model with no predictor variables). Use a prior of `dnorm(0, 10)` for the intercept parameter, and explicitly specify a start value of -5 for the intercept parameter. After fitting, use `precis()` to report the 97% PI of the fit intercept parameter. Further, use posterior samples from the model and `dens()` to visualize the posterior distribution of the intercept parameter **and** its implied probability of success (i.e., probability of infection) values.
+
+```{r}
+
+```
+
+
+## Problem 2 (3 points)
+
+Now refit the same model, but this time construct your data such that each row represents a single binomial trial. In other words, this is disaggregated binomial data, leading to what was termed in lecture *logistic regression*. For this data format, you will need a data frame with only one column. Since each row will represent a single binomial trial, the `size` argument within your model's `dbinom()` call will simply be equal to 1. After fitting the model, report the 97% PI of the fit intercept parameter to confirm you get identical posterior inference to Problem 1.
+
+```{r}
+
+```
+
+
+## Problem 3 (3 points)
+
+Refit the same model again (using either data format), but this time use `map2stan()`, specifying four MCMC chains. In addition, use a prior of `dunif(-10, 0)` for your model's intercept parameter.
+
+After fitting the model, use `precis()` to report the 97% HPDI of the fit intercept parameter. Further, use posterior samples from the model and `dens()` to visualize the intercept parameter **and** its implied probability of success (i.e., probability of infection) values.
+
+Explain how this new choice of prior changed your posterior inference about the probability of infection. Can you explain why your density plot has the particular shape it has?
+
+```{r}
+
+```
diff --git a/lab/lab_week_13.Rmd b/lab/lab_week_13.Rmd
@@ -0,0 +1,50 @@
+---
+title: "EEEB UN3005/GR5005  \nLab - Week 13 - 20 and 22 April 2020"
+author: "USE YOUR NAME HERE"
+output: pdf_document
+fontsize: 12pt
+---
+
+```{r setup, include = FALSE}
+
+knitr::opts_chunk$set(echo = TRUE)
+
+library(rethinking)
+```
+
+
+**Updated Lab Instructions Due to Remote Teaching:** Complete this assignment by writing code in the code chunks provided. If required, provide written explanations below the relevant code chunks. When complete, knit this document within RStudio to generate a PDF, and upload that document to CourseWorks by 5 pm on 23 April.
+
+
+# Binomial Regression
+
+
+## Exercise 1: Fitting a Binomial Generalized Linear Model
+
+On the class CourseWorks page, you'll find a dataset called `eagles.csv`. This dataset summarizes observations of salmon foraging interactions by bald eagles in the western United States. More specifically, when one eagle (the victim) is trying to feed, another eagle (the pirate) may try to steal the catch. The `eagles.csv` data records the number of successful thieving events by the pirate eagle (`successes`) given a total number of thieving attempts (`total_attempts`). In addition, there is information on the specifics of the eagle interaction including size of the pirating eagle, age of the pirating eagle, and size of the victim eagle.
+
+Import the `eagles.csv` data, and fit a binomial generalized linear with `successes` as the outcome variable and `pirate_size` and `victim_size` as predictor variables. Note, you'll need to create dummy predictor variables for `pirate_size` and `victim_size` for use in your model. To do so, 1) make a new variable called `pirate_large` that is equal to 1 when the pirate eagle is large, and 2) make a new variable called `victim_large` that is equal to 1 when the victim eagle is large. Fit your model using `map()` and priors of `dnorm(0, 10)` for the intercept parameter and `dnorm(0, 5)` for the beta coefficients. After fitting your model, report the 97% PI for all model parameters.
+
+```{r}
+
+```
+
+
+## Exercise 2: Visualizing the Model Intercept and the Implied Probability of Success
+
+Extract 10,000 posterior samples from your fit model and visualize the intercept parameter samples using `dens()`. What type of eagle interaction (i.e., combination of large/small pirate and victim) does the intercept value correspond to?
+
+Note, however, that the intercept parameter samples are not on the probability scale. They do not lie between 0 and 1. Can you report the 97% HPDI for the probability of success (i.e., probability of thieving) values that these intercept parameter samples imply? As a hint, this will involve reversing the link function that is used in fitting the generalized linear model. Similarly, visualize these implied probability of success values using `dens()`.
+
+```{r}
+
+```
+
+
+## Exercise 3: Plotting Implied Probability of Success Values for All Pirate-Victim Combinations
+
+Now extend the ideas you just implemented in Exercise 2 to use `dens()` to plot implied probability of success values for all pirate-victim size combinations. Rather than generating these implied values yourself manually, you can define counterfactual datasets and let `link()` generate the implied values for you. After you've generated implied probability of success values for all pirate-victim combinations, the plotting should be straightforward.
+
+```{r}
+
+```
diff --git a/lecture/lecture_week_13.R b/lecture/lecture_week_13.R
@@ -0,0 +1,261 @@
+
+
+library(rethinking)
+library(dplyr)
+
+#==========================================================
+
+
+# Generate a vector of probability values
+probabilities <- seq(from = 0, to = 1, by = 0.01)
+probabilities
+
+# Compute the odds for each of these probabilities and 
+# plot the relationship
+odds <- probabilities / (1 - probabilities)
+odds
+plot(probabilities, odds, type = "n")
+lines(probabilities, odds)
+
+# Compute the log-odds for each of these probabilities 
+# and plot the relationship
+log_odds <- log(odds)
+log_odds
+plot(probabilities, log_odds, type = "n")
+lines(probabilities, log_odds)
+
+
+# Convert some log-odds values back to probabilities
+# using the logistic function
+log_odds_seq <- seq(from = -50, to = 50, by = 1)
+log_odds_seq
+probs_seq <- logistic(log_odds_seq)
+plot(log_odds_seq, probs_seq, type = "n")
+lines(log_odds_seq, probs_seq)
+
+#==========================================================
+
+
+# Chimpanzee prosociality binomial models
+
+# Import the chimpanzee data
+data(chimpanzees)
+d <- chimpanzees
+
+# Fit an intercept-only binomial GLM
+m10.1 <- map(
+  data = d,
+  alist(
+    pulled_left ~ dbinom(1, p),
+    logit(p) <- a,
+    a ~ dnorm(0, 10)
+  )
+)
+
+precis(m10.1, prob = 0.97)
+
+# Consider the fit model parameter
+exp(0.32) # on the odds scale
+logistic(0.32) # on the probability scale
+
+# Fit a more complex binomial GLM considering the effects
+# of prosocial on left and condition treatments
+m10.3 <- map(
+  data = d,
+  alist(
+    pulled_left ~ dbinom(1, p),
+    logit(p) <- a + bP*prosoc_left + bPC*prosoc_left*condition,
+    a ~ dnorm(0, 10) ,
+    bP ~ dnorm(0, 10) ,
+    bPC ~ dnorm(0, 10)
+  )
+)
+
+precis(m10.3, prob = 0.97)
+
+
+# Visualize predictions from model m10.3
+
+# Dummy data for predictions across treatments
+d.pred <- data.frame(
+  prosoc_left = c(0, 1, 0, 1), # right/left/right/left
+  condition = c(0, 0, 1, 1) # control/control/partner/partner
+)
+
+# Build predictions for probability of success using
+# "link()"
+preds.p <- link(m10.3, data = d.pred)
+
+# Summarize the probability prediction values
+preds.p.mean <- apply(preds.p, 2, mean)
+preds.p.PI <- apply(preds.p, 2, PI, prob = 0.9)
+
+# Generate an empty plot frame with good axes
+plot(0, 0, type = "n", xaxt = "n",
+     xlab = "prosoc_left/condition",
+     ylab = "proportion pulled left",
+     xlim = c(1, 4), ylim = c(0, 1))
+axis(1, at = 1:4, labels = c("0/0", "1/0", "0/1", "1/1"))
+
+# Plot raw data, one trend for each of 7 individual
+# chimpanzees using "by()"
+p <- by(d$pulled_left,
+        list(d$prosoc_left, d$condition, d$actor), mean)
+for (chimp in 1:7)
+  lines(1:4, as.vector(p[, , chimp]), 
+        col = rangi2, lwd = 1.5)
+
+# Superimpose posterior predictions
+lines(1:4, preds.p.mean)
+shade(preds.p.PI, 1:4)
+
+
+# Replicate model m10.3 using aggregated binomial data
+
+# Generate data in an aggregated form
+d.aggregated <- d %>%
+  group_by(prosoc_left, condition) %>%
+  summarize(
+    pulled_left_aggregated = sum(pulled_left),
+    n_trials = n()
+  ) %>%
+  data.frame()
+
+d.aggregated
+
+# Fit the model using aggregated data
+m10.5 <- map(
+  data = d.aggregated,
+  alist(
+    pulled_left_aggregated ~ dbinom(n_trials, p),
+    logit(p) <- a + bP*prosoc_left + bPC*prosoc_left*condition,
+    a ~ dnorm(0, 10),
+    bP ~ dnorm(0, 10),
+    bPC ~ dnorm(0, 10)
+  )
+)
+
+# Compare model output
+precis(m10.5, prob = 0.97)
+precis(m10.3, prob = 0.97)
+
+#==========================================================
+
+
+# Snow goose color binomial models
+
+# Load in hypothetical snow goose data in aggregated
+# binomial format
+geese <- data.frame(
+  blue_geese = c(215, 84, 7),
+  total_geese = c(500, 300, 25),
+  study_site = c("Site A", "Site B", "Site C")
+)
+
+# Add on a variable indicating the proportion of blue
+# morphs at each study site
+geese$prop_blue <- geese$blue_geese/geese$total_geese
+
+geese
+
+# Plot the raw proportions of blue morphs
+plot(prop_blue ~ study_site, data = geese, 
+     ylim = c(0, 0.5))
+
+# Generate dummy variables for site affiliation
+geese$site_B <- ifelse(geese$study_site == "Site B", 1, 0)
+geese$site_C <- ifelse(geese$study_site == "Site C", 1, 0)
+
+# Fit a binomial GLM using site to predict the 
+# probability of a goose being the blue morph
+goose.model <- map(
+  data = geese,
+  alist(
+    blue_geese ~ dbinom(size = total_geese, prob = p),
+    logit(p) ~ a + b_site_B*site_B + b_site_C*site_C,
+    a ~ dnorm(0, 10),
+    b_site_B ~ dnorm(0, 10),
+    b_site_C ~ dnorm(0, 10)
+  )
+)
+
+precis(goose.model, prob = 0.97)
+
+
+# Visualize model inference
+
+# Extract samples from the model posterior
+goose.post <- extract.samples(goose.model, n = 10000)
+
+# Show the posterior distribution of the intercept
+# parameter (on the log-odds scale), which corresponds
+# to study site A
+dens(
+  goose.post$a, 
+  xlab = "Intercept parameter (log-odds scale)"
+)
+# Show the posterior distribution of the implied 
+# probability of blue morphs at study site A
+dens(
+  logistic(goose.post$a),
+  xlab = "Implied probability of a blue goose"
+)
+
+
+# Plot the log-odds of a goose being blue by plotting
+# posterior parameter samples
+
+# site A (intercept or reference category)
+dens(goose.post$a, xlim = c(-3, 1),
+     xlab = "Log-odds of a blue goose")
+# site B
+dens(goose.post$a + goose.post$b_site_B, 
+     add = TRUE, col = "blue")
+# site C
+dens(goose.post$a + goose.post$b_site_C, 
+     add = TRUE, col = "green")
+
+# Plot the implied probability of a goose being blue by 
+# plotting posterior parameter samples transformed through
+# the logistic function
+
+# site A (intercept or reference category)
+dens(logistic(goose.post$a), xlim = c(0, 0.5),
+     xlab = "Implied probability of a blue goose")
+# site B
+dens(logistic(goose.post$a + goose.post$b_site_B), 
+     add = TRUE, col = "blue")
+# site C
+dens(logistic(goose.post$a + goose.post$b_site_C), 
+     add = TRUE, col = "green")
+
+
+# You can do this same type of prediction plot 
+# using "link()"
+
+counterfactual.siteA <- data.frame(site_B = 0, site_C = 0)
+counterfactual.siteB <- data.frame(site_B = 1, site_C = 0)
+counterfactual.siteC <- data.frame(site_B = 0, site_C = 1)
+
+probs.siteA <- link(
+  goose.model, 
+  data = counterfactual.siteA, 
+  n = 10000
+)
+probs.siteB <- link(
+  goose.model, 
+  data = counterfactual.siteB, 
+  n = 10000
+)
+probs.siteC <- link(
+  goose.model, 
+  data = counterfactual.siteC,
+  n = 10000
+)
+
+dens(probs.siteA, xlim = c(0, 0.5),
+     xlab = "Implied probability of a blue goose")
+dens(probs.siteB, 
+     add = TRUE, col = "blue")
+dens(probs.siteC, 
+     add = TRUE, col = "green")
diff --git a/lecture/lecture_week_13.pptx b/lecture/lecture_week_13.pptx