---
title: "Introduction to R"
author: "Matthew Thomas"
output:
  beamer_presentation:
    pandoc_args: "--highlight=style/my.theme"
    includes:
      in_header: style/twocol.tex
    theme: metropolis
---
```{r include=FALSE}
library(tidyverse)
set.seed(100)
knitr::opts_chunk$set(comment = NA)
# Adjust summary function so that results fit on the page
summary2 <- summary
summary <- function(object, ...){
  return(summary2(object, ...)$coef)
}
# Change ggplot2 theme
theme_set(theme_bw())
```
## Why R?
* R is free, open source, and incredibly popular
* There is a large (and welcoming) community of R programmers online who can help troubleshoot code and answer questions
* The language is incredibly well (and consistently) documented
* There are thousands of packages which implement statistical estimators and other use cases.
# Defining variables/basic data types
## Vectors and Assignment
The function `c()` combines its arguments into a single, longer vector. The assignment operator `<-` binds a value to a name; `assign()` does the same thing with the name given as a string.
```{r}
x <- c(1,2,4,6,10:13)
assign("y",c(1,2,4,6,10:13))
```
```{r}
x
y
```
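As a small sketch of the two points above (the names `v1` and `v2` are illustrative and not part of the original slides), nested `c()` calls are flattened into one vector, and both assignment forms produce identical objects:

```{r}
v1 <- c(c(1, 2), 4, 6, 10:13)        # nested c() calls are flattened
assign("v2", c(1, 2, 4, 6, 10:13))   # same values, name given as a string
identical(v1, v2)                    # TRUE
length(v1)                           # 8
```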
## Operators
```{r}
x/y # Operators on vectors apply element-wise
(1:2) * (1:8) # Shorter vectors are recycled to match the longer one
(1:6) > (6:1) # Logical operators: <,<=,==,>=,>,!=
!((1:6) > (6:1)) # Negate a logical vector with !
```
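A hedged sketch of two details behind these operators, using the illustrative names `short` and `long`: repetition is equivalent to an explicit `rep()`, and `!` binds more loosely than comparison operators, so the placement of parentheses matters.

```{r}
short <- 1:2
long  <- 1:8
identical(short * long, rep(short, times = 4) * long)  # TRUE
!short > 1     # parsed as !(short > 1): TRUE FALSE
(!short) > 1   # a different expression: FALSE FALSE
```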
## Matrices
A matrix is a vector with a dimension attribute. Matrices are filled column by column unless you specify `byrow = TRUE`.
```{r}
(mat <- matrix(data = x, ncol = 2))
```
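To make the fill order concrete, here is a minimal sketch with illustrative names (`vals`, `by_col`, `by_row`) comparing the default with `byrow = TRUE`:

```{r}
vals <- 1:6
by_col <- matrix(vals, ncol = 2)                # default: fill down each column
by_row <- matrix(vals, ncol = 2, byrow = TRUE)  # fill across each row instead
by_col[1, ]   # 1 4
by_row[1, ]   # 1 2
```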
## Sub-setting Matrices
You can subset a matrix with `[row, col]` indexing. Leaving an index blank selects every row or every column.
```{r}
mat[1,] # First row of matrix
mat[,2] # Second column of matrix
mat[1,2] # Second element of first row
```
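One thing worth knowing, sketched here with an illustrative matrix `m`: selecting a single row or column simplifies the result to a plain vector unless you pass `drop = FALSE`.

```{r}
m <- matrix(1:8, ncol = 2)
dim(m[1, ])                 # NULL: the single row was simplified to a vector
dim(m[1, , drop = FALSE])   # 1 2: still a one-row matrix
```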
## Warning about One Dimensional Matrices
An $n \times 1$ matrix and a vector are not the same thing. For example, an $n \times 1$ matrix is not repeated to match a longer one, so the operation below fails.
```{r error=TRUE}
matrix(1:2) * matrix(1:8)
```
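If you do want vector behavior, one option (a sketch with the illustrative names `col2` and `col8`) is to drop the dimension attribute with `c()` first:

```{r}
col2 <- matrix(1:2)   # 2 x 1 matrix
col8 <- matrix(1:8)   # 8 x 1 matrix
c(col2) * c(col8)     # c() drops the dim attribute, so the shorter vector is repeated
```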
## Defining Functions
Functions are objects in R that can be applied to other objects. `c()`, `mean()`, and `sum()` are examples of built-in functions. You can also write your own functions.
```{r}
sumsq <- function(var){
  return(sum(var^2))
}
```
## Calling Functions
User-defined functions are called just like any built-in function.
```{r}
sumsq(c(1,2))
```
The pipe operator `%>%` (from the `magrittr` package, loaded with the `tidyverse`) passes the object on its left as the first argument of the function on its right.
```{r}
c(1,2) %>% sumsq()
```
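Additional arguments go inside the call as usual. The sketch below assumes a hypothetical helper `sum_pow()`, a generalization of `sumsq()` that is not part of the original slides:

```{r}
library(magrittr)   # provides %>%
sum_pow <- function(var, power = 2) {   # hypothetical helper with a default argument
  sum(var^power)
}
c(1, 2) %>% sum_pow()            # 5, same as sum_pow(c(1, 2))
c(1, 2) %>% sum_pow(power = 3)   # 9, same as sum_pow(c(1, 2), power = 3)
```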
## Lists
A list can hold objects of any type, including vectors of different types.
```{r}
z <- list( "y" = y,
           "istwo" = y^2 == y*2,
           "p" = runif(8)*(1:4)/y^2 )
```
You can reference items in a list using single brackets, which return a sub-list, or the dollar sign, which returns the element itself.
```{r}
z["y"] # Returns a single-element list
z$istwo # Returns a vector
```
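Double brackets behave like the dollar sign. A minimal sketch on an illustrative list `lst`:

```{r}
lst <- list(a = 1:3, b = "text")
class(lst["a"])                # "list": single brackets keep the list wrapper
class(lst[["a"]])              # "integer": double brackets extract the element
identical(lst[["a"]], lst$a)   # TRUE
```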
# Dealing with data frames
## Creating a data frame
You can make a data frame using vectors or a list. Data frames are special lists with elements of the same length.
```{r}
(df1 <- data.frame(z))
```
## Adding to data frames
You can reference and add to a data frame just as you can with any other list. However, to enforce the length requirement, a data frame repeats a shorter vector when its length divides the number of rows evenly.
```{r}
df1$prod <- LETTERS[1:4]
head(df1)
```
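The sketch below, on an illustrative data frame `d`, shows both outcomes: a length that divides the row count is repeated, while one that does not raises an error (caught here with `try()` so the chunk keeps running).

```{r}
d <- data.frame(id = 1:8)
d$grp <- c("a", "b")                          # length 2 divides 8: repeated four times
attempt <- try(d$bad <- 1:3, silent = TRUE)   # length 3 does not divide 8
inherits(attempt, "try-error")                # TRUE
```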
## Matrix-like properties of data frames
Because every column has the same length, data frames have some matrix-like properties. You can index a data frame just like a matrix.
```{r}
df1[1,] # First row of data frame
```
You can even apply most arithmetic operators to **numeric** data frames. Linear algebra operators such as `%*%` do not work on data frames.
```{r}
df1[1,1:3]+1 # Have to exclude prod
```
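When you do need linear algebra, convert explicitly. A minimal sketch on an illustrative all-numeric data frame `num_df`:

```{r}
num_df <- data.frame(a = 1:2, b = 3:4)
num_df * 2                                  # element-wise operators work directly
row_totals <- as.matrix(num_df) %*% c(1, 1) # matrix product after conversion
c(row_totals)                               # 4 6
```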
## Manipulating data frames
You can manipulate data using the traditional list interface.
```{r}
df1$ly <- log(df1$y)
```
The `dplyr` package, part of the `tidyverse`, offers another way to do this with the `mutate()` function.
```{r}
df1 <- df1 %>% mutate(ly2 = log(y))
head(df1,4)
```
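A single `mutate()` call can create several columns, and later ones may refer to earlier ones. This is a sketch on an illustrative data frame `toy_q`, not the data used in these slides:

```{r}
library(dplyr)
toy_q <- data.frame(q = c(1, 10, 100))
toy_q <- toy_q %>% mutate(lq = log10(q), lq_sq = lq^2)  # lq_sq uses the new lq column
toy_q
```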
# Regression
## Running a regression
If you just want to run a regression in R, you often do not need to manipulate the data first. Regression functions in R let you transform variables inside ``formulas''. Suppose we want to estimate the following model:
$$
\log(y) = \beta_0 + \beta_1 \log(p) + \beta_2 prodB + \beta_3 prodC + \beta_4 prodD
$$
\small
```{r}
lm(log(y) ~ log(p) + prod, data = df1) %>% summary()
```
\normalsize
## Interaction terms
You can add interaction terms by using a `:` between two variable names.
\small
```{r}
lm(log(y) ~ log(p) + log(p):prod, data = df1) %>% summary()
```
\normalsize
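The `*` operator is a related shorthand that includes both main effects and their interaction. You can inspect how R expands a formula without fitting anything; the variable names `a` and `b` below are placeholders:

```{r}
labels_star  <- attr(terms(y ~ a * b), "term.labels")
labels_colon <- attr(terms(y ~ a + a:b), "term.labels")
labels_star    # "a" "b" "a:b"
labels_colon   # "a" "a:b"
```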
## Removing the constant
You can suppress the constant by adding `-1` to the formula. Note that R then automatically includes the dummy for product A, which is otherwise the omitted baseline category.
\small
```{r}
lm(log(y) ~ log(p) + prod - 1, data = df1) %>% summary()
```
\normalsize
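The effect on the dummy variables can be seen directly in the design matrix. A sketch with an illustrative factor `g` in a data frame `grp_df`:

```{r}
grp_df <- data.frame(g = factor(c("A", "B", "C", "A")))
colnames(model.matrix(~ g, grp_df))       # "(Intercept)" "gB" "gC"
colnames(model.matrix(~ g - 1, grp_df))   # "gA" "gB" "gC"
```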
## Polynomials
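Inside a formula, operators such as `+`, `:`, and `^` describe which terms enter the model rather than arithmetic. As a result, `p^2` below is not read as the square of `p`, and the squared term silently drops out.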
\small
```{r}
lm(log(y) ~ p + p^2, data = df1) %>% summary()
```
\normalsize
To fit a polynomial, use the `poly()` function. By default it builds orthogonal polynomials; pass `raw = TRUE` to get plain powers of `p`.
\small
```{r}
lm(log(y) ~ poly(p,2), data = df1) %>% summary()
```
\normalsize
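You can check how a formula is interpreted by listing its terms. In this sketch the names `y` and `p` are only symbols, so no data is needed:

```{r}
labels_plain <- attr(terms(y ~ p + p^2), "term.labels")
labels_I     <- attr(terms(y ~ p + I(p^2)), "term.labels")
labels_plain   # "p": the ^2 is treated as a formula operator
labels_I       # "p" "I(p^2)": I() protects the arithmetic
```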
## Overriding
But what if you want only the squared term? For that, you need to override the formula interpretation using the inhibit function, `I()`.
\small
```{r}
lm(log(y) ~ p^2, data = df1) %>% summary()
lm(log(y) ~ I(p^2), data = df1) %>% summary()
```
\normalsize
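`I()` can also sit next to the linear term. As a sketch on made-up numbers (the data frame `toy_poly` is purely illustrative), a quadratic written with `I()` gives the same coefficients as `poly()` with `raw = TRUE`:

```{r}
toy_poly <- data.frame(y = c(1, 3, 4, 9, 15, 20), p = 1:6)
fit_raw <- lm(y ~ poly(p, 2, raw = TRUE), data = toy_poly)
fit_I   <- lm(y ~ p + I(p^2), data = toy_poly)
all.equal(unname(coef(fit_raw)), unname(coef(fit_I)))   # TRUE
```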
# Visualization
## Built-in graphics
R has several basic built-in plotting commands.
\begincols
\begincol{.48\textwidth}
```{r}
plot(df1$y ~ df1$p, cex=2)
```
\endcol
\begincol{.48\textwidth}
```{r}
hist(df1$p, breaks = 8)
```
\endcol
\endcols
They are not very pretty, but they are very easy to use.
## ggplot2 graphics
```{r fig.height = 4.5}
ggplot(data = df1, aes(x=p, y=y, col=prod)) +
  geom_point(size=2) +
  geom_smooth(method="lm", col="blue", size=1) +
  coord_cartesian(xlim=c(0,0.3), ylim=c(0,13)) +
  labs(title="Demand", y="Quantity", x="Price")
```
# Appendix
## Converting a data frame to a matrix
Because a data frame can contain categorical variables and strings, it is not always possible to convert it directly to a numeric matrix. An all-numeric data frame can be converted by simply using `as.matrix()`.
```{r}
dfa <- data.frame(a=1:5,b=77:81,c=log(22:18))
dfb <- data.frame(a=letters[1:5],b=77:81,c=log(22:18))
```
\begincols
\begincol{.48\textwidth}
```{r}
as.matrix(dfa)
```
\endcol
\begincol{.48\textwidth}
```{r}
as.matrix(dfb)
```
\endcol
\endcols
## Converting a data frame to a matrix
In order to properly convert a data frame with strings or factors into a numeric matrix, we need to use `model.matrix()`. This is what R uses when it runs regressions.
```{r}
model.matrix(~a+b+c-1,dfb)
```
## Linear Algebra and apply
The `apply()` function applies some function across rows `(MARGIN=1)` or columns `(MARGIN=2)` of a matrix.
```{r}
apply(X=mat, MARGIN=1, FUN=sumsq)
```
The operator `%*%` does matrix multiplication, and `%^%` (provided by the `expm` package, not base R) raises a square matrix to a power. The function `t()` transposes. If you can accomplish a task with linear algebra, it is generally faster than `apply()`.
```{r}
c(mat^2 %*% c(1,1))
```
This computes the same row-wise sums of squares as the `apply()` call above and is, for example, more than twice as fast for a large matrix.
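A hedged sketch for checking this on your own machine: the matrix `big` and its size are arbitrary choices, and timings will vary, so only the equality of the two results is asserted here.

```{r}
big <- matrix(seq_len(2e5), ncol = 2)                 # 100000 x 2 test matrix
via_apply  <- apply(big, 1, function(r) sum(r^2))     # row by row
via_matmul <- c(big^2 %*% c(1, 1))                    # one matrix product
all.equal(via_apply, via_matmul)                      # TRUE
```

Wrapping either line in `system.time()` shows the difference in run time.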