---
title: "EEEB UN3005/GR5005 \nHomework - Week 11 - Due 16 Apr 2020"
author: "USE THE NUMERIC PORTION OF YOUR UNI HERE"
output: pdf_document
fontsize: 12pt
---
```{r setup, include = FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(rethinking)
library(dplyr)
library(ggplot2)
```
**Homework Instructions:** Complete this assignment by writing code in the code chunks provided. If required, provide written explanations **below** the relevant code chunks. Replace "USE THE NUMERIC PORTION OF YOUR UNI HERE" in the document header with the numbers appearing in your UNI. When complete, knit this document within RStudio to generate a PDF file. Please review the resulting PDF to ensure that all content relevant for grading (i.e., code, code output, and written explanations) appears in the document. Rename your PDF document according to the following format: hw_week_11_UNInumbers.pdf. Upload this final homework document to CourseWorks by 5 pm on the due date.
All the following homework problems will draw on the country-level dataset (`rugged`) that was discussed in the *Statistical Rethinking* book and lecture. In particular, we'll be interested in an African island nation, Seychelles, and how inclusion of data from this one country might affect our statistical inference.
## Problem 1 (3 points)
Following the lecture code, import the `rugged` dataset, create a logged version of the year 2000 GDP variable (for use as our outcome variable), and subset the data down to only those countries that actually have GDP data.
Now for something new: because we are interested in Seychelles, we'd like to visualize where Seychelles stands in relation to other African (and non-African) countries. Therefore, using a method of your choice, create a new variable in the `rugged` data frame called `geographic_affiliation`. `geographic_affiliation` should have the value "non-African nation" wherever `cont_africa == 0`. Similarly, it should have the value "African nation" wherever `cont_africa == 1`, EXCEPT when `country == "Seychelles"`, where it should have the value "Seychelles".
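
As one possible approach (not the only acceptable one), the sketch below builds a three-level label on a small made-up data frame using `dplyr::case_when()`. The `toy` data frame and its three rows are hypothetical stand-ins, not the real `rugged` data. Conditions in `case_when()` are checked in order, so the Seychelles rule is listed first.

```r
library(dplyr)

# Hypothetical toy data standing in for the real rugged data frame
toy <- data.frame(
  country = c("Seychelles", "Kenya", "France"),
  cont_africa = c(1, 1, 0),
  stringsAsFactors = FALSE
)

# The first matching condition wins, so the special case goes first
toy <- toy %>%
  mutate(geographic_affiliation = case_when(
    country == "Seychelles" ~ "Seychelles",
    cont_africa == 1 ~ "African nation",
    cont_africa == 0 ~ "non-African nation"
  ))

print(toy)
```

Nested `ifelse()` calls or indexed assignment would work just as well; `case_when()` simply keeps the three rules readable side by side.
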
Using the `ggplot()` function, visualize the relationship between `rugged` (x-axis) and log GDP (y-axis) using a scatterplot. Assign the color of the points to `geographic_affiliation`. You should end up with a scatterplot featuring points of three different colors, corresponding to "African nation", "non-African nation", and "Seychelles".
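
For reference, here is a minimal sketch of a scatterplot with point color mapped to a grouping variable. The `toy_plot_data` data frame and its column names (`x_value`, `y_value`, `group`) are invented for illustration; substitute the ruggedness, log GDP, and `geographic_affiliation` variables from your own data frame.

```r
library(ggplot2)

# Hypothetical toy data; the values carry no real meaning
toy_plot_data <- data.frame(
  x_value = c(0.5, 1.2, 2.3, 3.1, 4.0),
  y_value = c(7.1, 8.4, 6.9, 9.0, 8.8),
  group = c("a", "b", "a", "c", "b")
)

# Mapping color inside aes() gives one color per level of group
p <- ggplot(toy_plot_data, aes(x = x_value, y = y_value, color = group)) +
  geom_point()

print(p)
```

Because `color` is mapped inside `aes()`, ggplot2 assigns the colors and draws the legend automatically.
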
Using the plot to assist in your interpretation, where does the GDP of Seychelles lie relative to most other African countries? Where does the terrain ruggedness value of Seychelles lie relative to most other African countries?
```{r}
```
## Problem 2 (4 points)
Now replicate the interaction model as given in lecture (m7.5b) using a dataset that excludes Seychelles. In addition, re-fit model m7.5b as in lecture, using the full dataset (by "full dataset" I mean the `rugged` dataset with all countries that have GDP data). Compare these two models using `precis()` to show the 97% PIs of the model parameters. Interpret the change you see in the `bAR` parameter (the interaction term) in your newly fit model relative to the estimate from m7.5b.
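
As a reminder of what a 97% interval covers, the base R sketch below computes the central 97% bounds of a vector of simulated draws. The draws are generated with `rnorm()` using arbitrary values, purely for illustration; they do not come from m7.5b or any fitted model. In your own work, the `prob` argument of `precis()` sets the interval width.

```r
set.seed(11)

# Hypothetical draws standing in for samples of a single parameter
draws <- rnorm(10000, mean = 0.35, sd = 0.13)

# A central 97% interval leaves 1.5% of the draws in each tail
interval_97 <- quantile(draws, probs = c(0.015, 0.985))

# Share of draws that fall inside the interval (close to 0.97)
coverage <- mean(draws >= interval_97[1] & draws <= interval_97[2])

print(interval_97)
print(coverage)
```

When comparing two fits, check whether the interval for the interaction term moves toward or away from zero.
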
```{r}
```
## Problem 3 (3 points)
Using the lecture code as a guide, plot model-based predictions for both m7.5b and your new model that was fit excluding Seychelles. For a given model, you can choose to show predictions for the ruggedness effect inside and outside of Africa in two separate panels or together on one plot. Both methods were demonstrated in lecture.
```{r}
```