# Collaboration {#six}
In this final week of QuaRantine, we focus on communicating reproducible analyses with our peers. We will
- Write 'markdown' vignettes to describe and share our analyses
- Create and document functions that encapsulate common tasks or steps in a workflow
- Combine vignettes and documented functions into an _R_ package that can be easily shared with others
Instead of counting upward from the beginning of our quarantine, we count down to the end.
## 5 Days (Monday) Zoom check-in
### Weekend review (5 minutes)
### Vignettes (25 minutes; Shawn)
Vignette preparation
- Create a directory `Week-06` in the working directory
```{r}
workdir = "workdir/Week-06"
if (!dir.exists(workdir)) {
    dir.create(workdir, recursive = TRUE)
}
```
Create an _R_ markdown document
- `File` -> `New file` -> `R markdown...`
- Enter a `Title` and your name as `Author`
- Use `HTML` as default output
- _RStudio_ creates a template
- Save it (e.g., click the floppy disk icon) in the `Week-06` folder. Use a file name such as `mtcars_regression.Rmd`.
Preview (knit) the markdown template
- Click the `knit` button
- Install the `knitr` package if the button is missing
Customize the template
- Let's add a demonstration of `ggplot2` using the `mtcars` data set:
<pre>
---
title: "multiple regression plots"
author: "Shawn Matott"
date: "5/13/2020"
output: html_document
---
## Multiple Regression Plots
- Scatterplot by category with regression lines (using ggplot2)
```{r}`r ''`
library(ggplot2)
library(dplyr)   # provides `%>%` and `mutate()`
data("mtcars")
## convert cyl to a factor-level object
mtcars <-
    mtcars %>%
    mutate(
        cyl = factor(
            cyl,
            levels = c(4, 6, 8),
            labels = c("4 cyl", "6 cyl", "8 cyl")
        )
    )
ggplot(mtcars, aes(wt, mpg, color=cyl))+
    geom_point()+
    geom_smooth(method="lm")+
    labs(title="Regression of MPG on Weight by # Cylinders",
         x = "Weight",
         y = "Miles per Gallon",
         color = "Cylinders")
```
</pre>
Preview your edits
- Save your changes
- Click the `knit` button
After a bit of processing, the R markdown document is rendered as an `.html` page. Notice how the _R_ code block has been evaluated, and the resulting console output and plot are included in the `.html`. This is great because we can be sure that the code in the vignette actually works (e.g., no syntax errors or other coding problems).
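The `knit` button has a console equivalent, which can be handy in scripts. The sketch below assumes the `rmarkdown` package and pandoc are installed (RStudio bundles pandoc); `demo.Rmd` is a made-up stand-in for your own file, written to a temporary directory so that nothing in `Week-06` is touched.

```r
## tiny stand-in document written to a temporary directory
fence <- strrep("`", 3)          # three backticks, to open / close a chunk
rmd <- file.path(tempdir(), "demo.Rmd")
writeLines(c(
    "---",
    "title: \"Demo\"",
    "output: html_document",
    "---",
    "",
    paste0(fence, "{r}"),
    "summary(mtcars$mpg)",
    fence
), rmd)

## render only if rmarkdown and pandoc are available
can_render <-
    requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()
if (can_render) {
    ## render() evaluates the code chunks and returns the output path
    html <- rmarkdown::render(rmd, quiet = TRUE)
    file.exists(html)
}
```

For your own vignette, replace `rmd` with the path to `mtcars_regression.Rmd`.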
There's a related package called `bookdown` that allows you to assemble documents and publish them directly to the web for all to see!
### Writing and documenting R functions (25 minutes)
Writing our own functions can be useful in several situations
- Capturing a common operation, such as retrieving data from an internet resource.
- Representing a transformation that can be applied to different parts of the data, e.g., calculating the difference in new cases, applied to different counties or states
- Summarizing overall steps in a workflow, e.g., translating, aligning, and clustering a DNA sequence.
As an example, the following function takes as input a vector representing the cumulative number of observations (e.g., new cases, deaths) over successive days. It calculates the difference in cases, and then uses `stats::filter()` to return the trailing average of the difference.
```{r}
trailing_difference <- function(x, n_days = 7) {
    diff <- diff(c(0, x))
    average_weights <- rep(1 / n_days, n_days)
    lag <- stats::filter(diff, average_weights, sides = 1)
    as.numeric(lag)
}
```
Here's a vector and the result of applying the function
```{r}
obs <- c(1, 2, 4, 6, 7, 12, 14, 18, 19, 20, 20, 21, 22, 24)
trailing_difference(obs, n_days = 4)
```
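To see where the numbers come from, we can redo two of the windows by hand. This is only a check using base _R_, not part of the function. The two means should match the 4th and the last elements of `trailing_difference(obs, n_days = 4)`, and the first three elements of that result are `NA` because no complete 4-day window exists yet.

```r
obs <- c(1, 2, 4, 6, 7, 12, 14, 18, 19, 20, 20, 21, 22, 24)

## daily increase; prepending 0 treats the first observation as the
## first day's increase
daily <- diff(c(0, obs))
daily

## the first complete 4-day window covers days 1 to 4
mean(daily[1:4])

## the last window covers days 11 to 14
mean(tail(daily, 4))
```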
This function could be used, for instance, to calculate the seven-day average number of new cases in each county of the US.
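A minimal sketch of the per-county idea, assuming the `dplyr` package is installed. The counties `A` and `B` and their counts are made up for illustration, and a 2-day window keeps the toy example short.

```r
library(dplyr)

trailing_difference <- function(x, n_days = 7) {
    diff <- diff(c(0, x))
    average_weights <- rep(1 / n_days, n_days)
    lag <- stats::filter(diff, average_weights, sides = 1)
    as.numeric(lag)
}

## hypothetical cumulative case counts for two made-up counties
cases <- tibble(
    county = rep(c("A", "B"), each = 5),
    day = rep(1:5, times = 2),
    cumulative = c(1, 3, 6, 10, 15, 2, 2, 4, 8, 8)
)

## group_by() makes mutate() apply the function to each county separately
cases_avg <-
    cases %>%
    group_by(county) %>%
    mutate(new_cases_avg = trailing_difference(cumulative, n_days = 2)) %>%
    ungroup()
cases_avg
```

Because the data are grouped, the first value of `new_cases_avg` in each county is `NA`; the average never mixes observations from different counties.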
Let's formalize this function, including documentation, in a separate file.
- Use RStudio File -> New File -> R Script to create a file `workdir/Week-06/trailing_difference.R`
- Document the function using `roxygen` formatting. This involves placing special comment characters `#'` at the start of lines, and using 'tags' that describe different parts of the function.
- `@title`: a one-line description of the help page
- `@description`: a short (paragraph-length?) summary of what capabilities the help page documents
- `@param`: one for each argument, describing the value (e.g., '`numeric()`') and meaning (e.g., 'vector of observations over successive days')
- `@return`: the value returned by the function
- `@examples`: valid _R_ code illustrating how the function works.
We use the special tags `@importFrom` to tell _R_ that we want to use the `filter()` function from the stats package, and `@export` to indicate that the function is meant for the 'end user' (i.e., us!).
```
#' @title Trailing difference of a vector
#'
#' @description Calculate the difference of successive elements of a
#'     vector, and then the running average of the difference. The
#'     width of the running average can be specified as an argument.
#'
#' @param x numeric() vector of observations.
#'
#' @param n_days scalar (length 1) numeric() number of days used to
#' calculate the trailing average. The length of `x` should be
#' greater than `n_days`.
#'
#' @return numeric() vector with the same length as x, representing
#'     the n_days average difference in x. The first `n_days - 1`
#'     values are `NA`.
#'
#' @examples
#' obs <- c(1, 2, 4, 6, 7, 12, 14, 18, 19, 20, 20, 21, 22, 24)
#' trailing_difference(obs, n_days = 4)
#'
#' @importFrom stats filter
#'
#' @export
trailing_difference <- function(x, n_days = 7) {
    diff <- diff(c(0, x))
    average_weights <- rep(1 / n_days, n_days)
    lag <- stats::filter(diff, average_weights, sides = 1)
    as.vector(lag)
}
```
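The documentation asks that `x` be longer than `n_days`, but the function itself does not check this. One possible way to enforce the documented contract is sketched below. `trailing_difference_checked()` is a hypothetical name, and the particular checks are an assumption about what is worth validating, not part of the original function.

```r
trailing_difference_checked <- function(x, n_days = 7) {
    ## fail early, with an informative message, if the arguments do
    ## not match the documentation
    stopifnot(
        is.numeric(x),
        is.numeric(n_days), length(n_days) == 1L, n_days >= 1,
        length(x) > n_days
    )
    diff <- diff(c(0, x))
    average_weights <- rep(1 / n_days, n_days)
    lag <- stats::filter(diff, average_weights, sides = 1)
    as.vector(lag)
}

obs <- c(1, 2, 4, 6, 7, 12, 14, 18, 19, 20, 20, 21, 22, 24)
trailing_difference_checked(obs, n_days = 4)

## too few observations: the check stops with an error instead of
## returning a vector
result <- try(trailing_difference_checked(c(1, 2), n_days = 7), silent = TRUE)
inherits(result, "try-error")
```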
### Create an _R_ package!
The vignette and documented function are great for our own use, but we'd really like to share these with our colleagues so that they too can benefit from our work. This is very easy to do.
- In RStudio, choose File -> New project -> New directory -> R package
- Enter a package name, e.g., `MyQuarantine`, and use the `add` button to select the vignette `mtcars_regression.Rmd` and _R_ `trailing_difference.R` files.
- Choose a location for your package, select the 'Open in a new session' button, and click 'Create project'.
The end result is a directory structure that actually represents an _R_ package that you can build and share with your colleagues. The directory contains
- An R/ folder, containing your R source code
- A vignettes/ folder, containing the vignette you wrote.
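To make the layout concrete, here is a rough sketch that builds a similar skeleton by hand in a temporary directory, using only base _R_. The file contents are placeholders and the `DESCRIPTION` values are illustrative; what RStudio generates will differ in its details.

```r
## build a bare-bones package layout in a temporary directory
pkg <- file.path(tempdir(), "MyQuarantine")
dir.create(file.path(pkg, "R"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(pkg, "vignettes"), showWarnings = FALSE)

## placeholder files standing in for the function and vignette
writeLines(
    "## trailing_difference() goes here",
    file.path(pkg, "R", "trailing_difference.R")
)
writeLines(
    "<!-- vignette source goes here -->",
    file.path(pkg, "vignettes", "mtcars_regression.Rmd")
)

## minimal DESCRIPTION; the field values are illustrative
writeLines(c(
    "Package: MyQuarantine",
    "Title: What the Package Does",
    "Version: 0.1.0",
    "Description: A longer description of the package."
), file.path(pkg, "DESCRIPTION"))

## show the files that make up the package
list.files(pkg, recursive = TRUE)
```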
Later in the week we'll see how to
- Edit the `DESCRIPTION` file to describe your package
- Generate help pages in the `man/` folder from the roxygen comments in the `.R` files.
- Create the knitted vignette from the source .Rmd file.
- Build the package for distribution and sharing with others.
## 4 Days Write a vignette!
Choose one week from the quarantine, and write a vignette summarizing the material. Start with an outline using level 1 `#` and level 2 `##` headings as well as bulleted lists / short paragraphs. One could think of this as structured more-or-less along the lines of a classic scientific paper, with introduction, methods, results (use case), and discussion.
```
# Introduction to tidy data
# Tools for working with tidy data
## Key packages
- dplyr
- ggplot2
## 5 Essential functions
- `mutate()`
- `filter()`
- `select()`
- `group_by()` / `ungroup()`
# Use case
E.g., summarizing and visualizing cell subtypes in Week 5.
- narrative text describing what steps are being taken
- include code chunks for reproducible analysis
- include figures and / or summary tables to help communicate your results
# Discussion
A paragraph on strengths and limitations of the tidy approach
Narrative on use case / insights
Future directions
## Session information
- include a code chunk that has the command `sessionInfo()`; this
documents the specific versions of packages you used.
```
## 3 Days Create documented, reusable functions!
Create functions that represent key operations (e.g., data retrieval), data transformations (e.g., trailing difference in new cases), or that integrate several related steps in an analysis (e.g., `translate()`, `unique()`, align, and cluster DNA sequences).
Place the function(s) in separate files (one or several functions per file). Document them using the notation introduced on Monday.
Make sure the functions work by writing simple examples.
## 2 Days Share your work as a package!
Use the steps outlined on Monday to create an _R_ package from Tuesday's vignette and Wednesday's functions.
### DESCRIPTION
Edit the DESCRIPTION file to include a Title and Description. Update the Author information to include your name. Add as maintainer your name and email address, using the format `Ima Maintainer <ima.maintainer@example.org>`.
### Documentation
Make sure that `getwd()` returns the path to the package. Run the command below (it may be necessary to install additional packages).
```{r, eval = FALSE}
getwd() # in the directory of the project; use `setwd()` to change
devtools::document()
```
There may be problems with the roxygen that you wrote; investigate how to fix these.
The command creates file(s) in the `man/` directory from the 'roxygen' comments (lines starting with `#'`) in the files in the R/ directory. Open one of the man files and use the 'preview' button to see the help page.
### Vignettes
Add the following lines to the end of the DESCRIPTION file:
```
Suggests: knitr
VignetteBuilder: knitr
```
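A quick way to confirm that the new fields were added correctly is to read the file back with base _R_'s `read.dcf()`. The sketch below uses a stand-in DESCRIPTION written to a temporary file, with illustrative values; in a real package, point `read.dcf()` at the package's own `DESCRIPTION`.

```r
## stand-in DESCRIPTION written to a temporary file
description <- tempfile("DESCRIPTION")
writeLines(c(
    "Package: MyQuarantine",
    "Version: 0.1.0",
    "Suggests: knitr",
    "VignetteBuilder: knitr"
), description)

## returns a character matrix with one column per requested field;
## a field missing from the file would show up as NA
fields <- read.dcf(description, fields = c("Suggests", "VignetteBuilder"))
fields
```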
Update the 'yaml' at the top of the vignette
```
---
title: "Multiple regression plots"
author: "Shawn Matott"
date: "5/13/2020"
output: html_document
vignette: |
%\VignetteIndexEntry{ Multiple Regression Plots }
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```
Use the following command to build the vignette
```{r, eval = FALSE}
getwd() # in the directory of the project
devtools::build_vignettes()
```
### Build and share
With all the pieces now in place, choose Build -> Build Source Package. This creates a single file with a name like `MyQuarantine_0.1.0.tar.gz` that you can share with colleagues -- use
```
install.packages("path/to/MyQuarantine_0.1.0.tar.gz", repos = NULL)
```
to install your package!
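Once installed, a colleague might use the package along these lines. This is a sketch that assumes the package is called `MyQuarantine` and exports `trailing_difference()`, as in the example on Monday; the guard keeps the code from failing when the package is not installed.

```r
## `MyQuarantine` is the example package name; adjust to your own
installed <- requireNamespace("MyQuarantine", quietly = TRUE)
if (installed) {
    library(MyQuarantine)
    obs <- c(1, 2, 4, 6, 7, 12, 14, 18, 19, 20, 20, 21, 22, 24)
    print(trailing_difference(obs, n_days = 4))
    ## list the vignettes that ship with the package
    print(vignette(package = "MyQuarantine"))
} else {
    message("MyQuarantine is not installed yet")
}
```

The help page generated from the roxygen comments is then available with `?trailing_difference`.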
## Today! (Friday) Zoom check-in
### Review and troubleshoot
Vignettes
- Any vignettes to share?
- Shruti's QuaRantine Learnings
Documented functions
Packages
### Course review
Basic _R_
- Numeric, character, logical vectors
- subsetting, applying functions
- The data.frame
'Tidy' _R_ and visualization
- The tibble and pipe (`%>%`)
- readr: `read_csv()`
- dplyr: `mutate()`, `filter()`, `select()`, `group_by()`, `summarize()`
- ggplot2
- Visualizing the pandemic locally and globally
Machine learning
- Underlying concepts
- Support vector machines, k-nearest neighbors (KNN)
- Accuracy and confusion matrix; ROC curves and AUC
Bioinformatics
- _Bioconductor_ packages
- Classes, generics, and methods for representing sequences and ranges
- Virus phylogeny
- Host gene expression
Reproducible communication
- Vignettes
- Documented functions
- Packages
### Feedback & next steps