---
title: 'Sharpening the Tools in Your Data Science Toolbox `r icon::fa("toolbox")`'
author: "Jessica Minnier, PhD<br><span style = 'font-size: 50%;'>OHSU-PSU School of Public Health<br>Knight Cancer Institute Biostatistics Shared Resource<br>Oregon Health & Science University</span>"
date: '[Joint Statistical Meetings](https://ww2.amstat.org/meetings/jsm/2019/onlineprogram/ActivityDetails.cfm?sessionid=218337), July 29, 2019<br><br>`r icon::fa("link")` [bit.ly/jmin-jsm19](https://bit.ly/jmin-jsm19)<br>`r icon::fa("twitter")` [datapointier](https://twitter.com/datapointier)'
output:
  xaringan::moon_reader:
    css: [css/xaringan-themer.css, css/my-theme.css]
    lib_dir: libs
    nature:
      countIncrementalSlides: false
      titleSlideClass: ["left", "middle", "inverse"]
      highlightLines: true
      highlightStyle: solarized-dark
      ratio: "16:9"
    includes:
      in_header: ../libs/header.html
---
layout: true
<div class="my-footer"><span>bit.ly/jmin-jsm19</span></div>
---
```{r setup, include=FALSE}
options(htmltools.dir.version = FALSE)
library(tidyverse)
knitr::opts_chunk$set(
  warning = FALSE,
  message = FALSE,
  fig.width = 10.5,
  fig.height = 4,
  comment = NA,
  rows.print = 16,
  echo = FALSE
)
theme_set(theme_bw(base_size = 24))
```
```{r xaringan-themer, include = FALSE}
# devtools::install_github("gadenbuie/xaringanthemer")
library(xaringanthemer)
mono_light(
  base_color = "#3A6185", #5e97c9" ## OHSU Marquam, #"#125972" ## picked to coordinate with rOpenSci logo
  code_highlight_color = "#c0e8f5",
  link_color = "#38BDDE",
  header_font_google = google_font("Josefin Sans"),
  text_font_google = google_font("Montserrat", "300", "300i"),
  code_font_google = google_font("Droid Mono"),
  text_font_size = "26px",
  outfile = "css/xaringan-themer.css"
  # title_slide_background_image = "img/CAN-RGB-4C-REV.png",
  # title_slide_background_position = "80% 60%",
  # title_slide_background_size = "400px" #"40%"
)
```
class: inverse, middle
# Who am I?
- Use R every day, Git often, Unix sometimes, Python rarely
- Teach intro to R, R Markdown, intro to stats for data science, R Shiny (and Math/Stats)
- Collaborative statistician (mostly applied work, no PhD students, no postdocs, some MS students and staff)
- Involved in organizing: Cascadia R Conf, PDX R Meetup, R-Ladies, BioData Club
---
# Statistics is changing
A non-random sample of buzzwords:
- Big Data
- Data Science
- Machine Learning
- Reproducible Coding
- Version Control
- Interactive Dashboards
---
<center><img src="img/asa_statement_datasci.png" width="80%"/></center>
> Sustained and substantial collaborative effort with researchers with expertise in data organization and in the flow and distribution of computation. <font color="red">Statisticians must engage them, learn from them</font>, teach them, and work with them.
> Statistical <font color="red">education and training must continue to evolve</font>—the next generation of statistical professionals needs a <font color="red">broader skill set</font> and must be more able to engage with database and distributed systems experts.
> The next generation must include more researchers with skills that <font color="red">cross the traditional boundaries of statistics</font>, databases, and distributed systems; there will be an ever-increasing demand for such “multi-lingual” experts.
---
# Programming itself is changing
- Data Science is programming intensive
    + Big data and complex models (machine learning beyond regression)
- Beyond SAS
    + R is easier to learn than ever before (see: tidyverse)
    + R packages are easier to make and distribute
    + R is not "slow" anymore
    + [FDA allows R](https://blog.revolutionanalytics.com/2012/06/fda-r-ok.html) for clinical trials
- Python is most common in data science
    + but not statistics!
---
## Data Science Jobs: Python, SQL, .... R
.pull-left[
<center><img src="img/r4stats_popularity_jobs.png" width="100%"/></center>
[R4stats - Robert A. Muenchen](http://r4stats.com/articles/popularity/)
]
.pull-right[
<center><img src="img/r4stats_popularity_jobs_pct.png" width="100%"/></center>
]
---
## By contrast, scholarly articles: SPSS, R, SAS
.pull-left[
<center><img src="img/r4stats_popularity_articles.png" width="90%"/></center>
]
.pull-right[
<center><img src="img/r4stats_popularity_articles_trend.png" width="98%"/></center>
[R4stats - Robert A. Muenchen](http://r4stats.com/articles/popularity/)
]
---
.pull-left-40[
## How to keep up?
Learning takes time, dedication, courage.
]
.pull-right-60[
<center><img src="img/draw_owl.png" width="100%"/></center>
]
---
## Highlights of my education in programming
.pull-left[
CS courses in college (C, Java):
<img src="img/college_cs2.png" width="100%" align="top"/>
My PhD advisor wrote a *ton* of code and functions, shared it all with me.
<img src="img/library_gradschool.png" width="100%" align="top"/>
]
.pull-right[
Postdoc (learned git/github):
<img src="img/softwarecarp.png" width="100%" align="top"/>
First year Assistant Professor (R Markdown, Shiny were all new & exciting!)
<img src="img/user2014.png" width="100%" align="top"/>
]
---
## Learning while teaching
<img src="img/pres1.png" width="35%" align="top"/><img src="img/pres2.png" width="30%" align="top"/><img src="img/pres3.png" width="35%" align="top"/>
<img src="img/pres4.png" width="28%" align="top"/><img src="img/pres5.png" width="36%" align="top"/><img src="img/pres6.png" width="36%" align="top"/>
[github.com/jminnier/talks_etc](https://github.com/jminnier/talks_etc)
---
## Learning while teaching with new tools
(also: find collaborators better than you at all this - thanks Ted Laderas!)
<img src="img/teach1.png" width="27%" align="top"/><img src="img/teach2.png" width="33%" align="top"/><img src="img/teach3.png" width="33%" align="top"/>
###### [r-bootcamp.netlify.com](https://r-bootcamp.netlify.com/); [minnier.shinyapps.io/ODSI_categoricalData/](https://minnier.shinyapps.io/ODSI_categoricalData); [minnier.shinyapps.io/ODSI_continuousData](https://minnier.shinyapps.io/ODSI_continuousData/)
---
# Software development
.pull-left[
- Unix/shell (e.g. for cluster computing)
- Programming skills
- Version control (e.g. git with GitHub, GitLab, or Bitbucket)
- Unit testing (does your code work?)
- Collaborative coding
- Using databases (e.g. SQL)
- Automation (e.g. make)
]
.pull-right[
<img src="img/softwarecarp.png" width="100%" align="top"/>
https://software-carpentry.org/
<img src="img/whattheyforgot.png" width="100%" align="top"/>
https://whattheyforgot.org/
]
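---
## Unit testing: a minimal sketch

To make "does your code work?" concrete, here is a small base-R illustration. The `standardize()` helper is hypothetical (not from any package or from this talk); the point is only that each expectation about a function gets written down as a check that fails loudly.

```r
# Hypothetical helper (an assumption for illustration): standardize a numeric vector.
standardize <- function(x) {
  if (!is.numeric(x)) stop("x must be numeric")
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

# stopifnot() raises an error if any condition is not TRUE.
z <- standardize(c(2, 4, 6, 8))
stopifnot(
  isTRUE(all.equal(mean(z), 0)),  # result is centered
  isTRUE(all.equal(sd(z), 1))     # result has unit standard deviation
)

# Bad input should fail loudly rather than return nonsense.
res <- tryCatch(standardize("a"), error = function(e) "error")
stopifnot(identical(res, "error"))
```

In a package, checks like these would more typically live in a `tests/` directory and use the testthat package (`test_that()`, `expect_equal()`), but the idea is the same.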
---
# Improve your coding skills
### Similar to learning a new language
- Difficult to do sporadically or when busy, unless you set aside time (make a deadline) or find an immersive experience
- Learn more R? SAS/Stata? Find workshops! (JSM, CSP, SDSS, rstudio::conf, useR!, local R meetups and conferences)
- Learn git/github/software development? Host a [Software Carpentry](https://software-carpentry.org/) workshop
<img src="img/sdss2020.png" width="25%" align="top"/><img src="img/rstudioconf2020.png" width="35%" align="top"/>
<img src="img/rstudioconfworkshop.png" width="35%" align="top"/>
---
layout: false
## Coding in R is more fun these days
- R Markdown $\to$ reproducible documents
    + transformative for collaborative statistical analyses
    + these slides are made with knitr/R Markdown
    + make textbooks, webpages, much more
- Shiny $\to$ webpages to show off methodology/analyses
<center><img src="img/horst_rmarkdown_wizards.png" width="47%" height="50%" align="top"><img src="img/horst_ggplot2_masterpiece.png" width="41%" height="50%" align="top"><a href="https://github.com/allisonhorst/stats-illustrations"><br>Allison Horst</a></center>
---
# Find a community & learn together
.pull-left[
We all have something to learn!
- Meetup groups (R Users, R Ladies, Women Who Code)
- Hackweeks
- Data Science Workshops
- Data competitions (DataFest)
- Workshops at conferences
- Online: [#rstats Twitter](https://twitter.com/hashtag/rstats?src=hash), [RStudio Community](https://community.rstudio.com/),
  [R 4 Data Science Learning community](https://www.rfordatasci.com/), [Tidy Tuesday](https://github.com/rfordatascience/tidytuesday)
- Start your own club!
]
.pull-right[
<img src="img/biodataclub1.png" width="70%" align="top">
https://biodata-club.github.io/
]
---
## It takes time to learn how to learn
.pull-left[
#### R-specific learning checks from Alison Hill's [Big Magic with R: Creative Learning Beyond Fear](https://alison.rbind.io/talk/2018-cascadia-bigmagic/) - watch the [video](https://www.youtube.com/watch?v=seERGgG6uVs&list=PLxHlPKedTUbL7ckoP5Elrdrw64V7o5_h6&index=3&t=0s)!
<img src="img/hill_magic.png" width="95%" align="top">
]
.pull-right[
<img src="img/hill_learn.png" width="100%">
]
---
## Practice, practice, practice
- Learning a tool and then forgetting to use it is the best way to unlearn it
- Write down what you've learned (make a list of commands, write a blog post)
## Give talks
- Sign up to give talks at local meetups
- Teach workshops to students/fellow faculty/coworkers
- Journal/book clubs with your co-workers/team
---
.pull-left[
## Resources $\rightarrow$
- Interactive lessons
- Online courses
- Blog posts
## Ask for help
- Stack Overflow
- RStudio Community
]
.pull-right[
<center><a href="https://github.com/jminnier/awesome-rstats/"><img src="img/jminnier_learnr.png" width="100%" height="50%">github.com/jminnier</a></center>
]
---
.pull-left[
# We don't want to be left behind
- Work on projects outside of your comfort zone
- Grow as statisticians, grow as a field
- Be open to other fields collaborating on "data science"
]
.pull-right[
# Don't forget to have fun!
<center><img src="img/horst_welcome_to_rstats_twitter.png" width="95%" height="50%"><a href="https://github.com/allisonhorst/stats-illustrations"><br>Allison Horst</a></center>
]
---
class: inverse
# Thank you!
.pull-left-60[
Contact me: `r icon::fa("envelope")` minnier-[at]-ohsu.edu, `r icon::fa("twitter")` [datapointier](https://twitter.com/datapointier), `r icon::fa("github")` [jminnier](https://github.com/jminnier/)
Slides available: [bit.ly/jmin-jsm19](https://bit.ly/jmin-jsm19)
Slide code and files available at: [github.com/jminnier/talks_etc](https://github.com/jminnier/talks_etc)
Slides created via the R package [xaringan](https://github.com/yihui/xaringan) by [Yihui Xie](https://twitter.com/xieyihui?lang=en) with the [xaringanthemer](https://github.com/gadenbuie/xaringanthemer) package by [Garrick Aden-Buie](https://github.com/gadenbuie), inspired by slides and slide formatting by [Alison Hill](https://alison.rbind.io/), [Jennifer Thompson](https://jenthompson.me/), and [Chester Ismay](https://ismayc.github.io/)
]
.pull-right-40[
<center><img src="img/horst_r_first_then.png" width="100%" height="50%"><a href="https://github.com/allisonhorst/stats-illustrations"><br>Allison Horst</a></center>
]