This repository has been archived by the owner on Sep 30, 2018. It is now read-only.
2017-10-11 Epicycles of Analysis.Rpres
Epicycles of Analysis
========================================================
author: Nico Buettner
date: 2017-10-11
autosize: true
Matsui, E. & Peng, R. D. *The Art of Data Science*. (Leanpub, 2015).
Data analysis as a circular process
========================================================
incremental: true
- highly iterative
- non-linear
- information is learned at each step
- refine, redo, proceed to the next step
- What differentiates data analysis from a study?
- no formal development or execution of a plan to collect data
- not formally deriving a hypothesis
- Assumption: data already exists
Visual representation
========================================================
incremental: true
![alt text](images/big_picture_eda.gif)
***
1. Setting Expectations
2. Collecting information (data) and comparing the data to your expectations
3. If the two don't match, revising your expectations or fixing the data so that they do
4. Repeating steps 1 to 3 at each of the following epicycles
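Sketch: the expectation loop
========================================================
The three steps can be sketched as a small loop. This is only an illustration under stated assumptions: `run_epicycle`, `check_expectation`, and the `collect` callable are hypothetical names, and the revision step here simply adopts the observed value as the new expectation, whereas in practice you might fix the data instead.

```python
def check_expectation(expected, observed, tolerance):
    """Return True when the observation is within tolerance of the expectation."""
    return abs(observed - expected) <= tolerance


def run_epicycle(expected, collect, tolerance, max_rounds=5):
    """Set an expectation, collect data, compare, and revise until they match.

    `collect` is a hypothetical callable that returns one observed value.
    """
    history = []
    for _ in range(max_rounds):
        observed = collect()                      # step 2: collect information
        matched = check_expectation(expected, observed, tolerance)
        history.append((expected, observed, matched))
        if matched:
            break
        expected = observed                       # step 3: revise the expectation
    return expected, history


# Step 1: we expect about 100, but the (made-up) data keeps saying 120.
final, history = run_epicycle(expected=100, collect=lambda: 120, tolerance=5)
```

The `history` list records each round, so it is possible to see how the expectation changed before it matched the data.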
Epicycles
========================================================
incremental: true
![alt text](images/epicycles.png)
***
1. Stating and refining the question
2. Exploring the data
3. Building formal statistical models
4. Interpreting the results
5. Communicating the results
Tabular representation
========================================================
incremental: true
![alt text](table.jpg)
Example: Asthma prevalence in the U.S.
========================================================
incremental: true
- Is there already an answer to your question? If so, refine the question
- How many people in the United States have asthma that is not currently controlled, and what are the demographic predictors of uncontrolled asthma?
- United States adult population, 18 years and older
- Does the number of rows in the data match the description in the codebook?
- Are your expectations about the age variable in line with your data?
- Model the data
- Interpret the results: age, African American/black race, body mass index, smoking status, low income, gender
- Communicate the findings: feedback, questions
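Sketch: checking expectations against the data
========================================================
The row-count and age checks from the example could look roughly like the following. This is a minimal sketch, not the analysis from the book: the function names, the record layout, and the toy records are all invented for illustration, and the minimum age of 18 is taken from the stated target population.

```python
def check_rows(records, expected_rows):
    """Compare the number of records with the count stated in the codebook."""
    return len(records) == expected_rows


def ages_out_of_range(records, min_age=18):
    """Return records whose age is missing or below the expected minimum."""
    return [r for r in records if r.get("age") is None or r["age"] < min_age]


# Toy records, invented purely for illustration (not real survey data).
records = [
    {"id": 1, "age": 34},
    {"id": 2, "age": 17},
    {"id": 3, "age": None},
    {"id": 4, "age": 61},
]

rows_ok = check_rows(records, expected_rows=4)   # hypothetical codebook count
suspect = ages_out_of_range(records)             # records that violate the expectation
```

Any record returned in `suspect` is a prompt to either revise the expectation or fix the data before moving on to modelling.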