---
title: "nCoV 2019 Sandbox"
output:
  html_document:
    df_print: paged
---
```{r setup, include=FALSE}
#Preamble
require(knitr)
require(tidyverse)
require(gridExtra)
require(rstan)
source("R/DataLoadUtils.r")
source("R/BasicEpiAnalyses.r")
```
# What is this?
The nCoV Sandbox is a running analytic blog we are "writing" as we try to apply some methods we had in the very early stages of development, and some old friends, to the 2019 nCoV outbreak. It is also us trying to run some analyses to get our own handle on, and keep up to date on, the epidemiology of the emerging epidemic.
This is a bit of an exercise in radical transparency, and things are going to start out very messy, but will hopefully get cleaner and more meaningful as things go. The old stuff will (for the moment) remain at the bottom for posterity.
# Analytic Blog
## Brief epidemiologic update and examination of overdispersion (2020-02-09)
One of the big questions about the nCoV-2019 epidemic is how we
reconcile the apparent $R_0$ of the large epidemic in Wuhan with the
apparent lack of onward transmission elsewhere. Jess Metcalf and
I do a rough analysis of the overdispersion or control
that would be needed to reconcile these two observations
<a href="DispersionExploration.html">here</a>.
The attempts at line list data from public sources
are not really keeping up with the pace of the epidemic as
detailed media reports, etc. become less common, so I have been generally
ignoring them. But perhaps it is time to check how
recent updates to the "Kudos" line list have changed the picture, if at all.
```{r, echo=FALSE, message=FALSE, warning=FALSE}
kudos <- readKudos3("data/Kudos Line List-2-9-2020.csv") %>%
mutate(age_cat = cut(age, seq(0,100,10)))
grid.arrange(age_dist_graph(kudos),
epi_curve_reported_cases(kudos),
nrow=2)
```
The source of line list data is starting to shift as countries get
their first few cases, which still make it into media reports, etc. Almost
all early cases in the line list are from China, while the later ones
are from elsewhere.
```{r, warning=FALSE}
ggplot(kudos, aes(x=symptom_onset, fill=as.factor(country))) +
geom_bar()
```
### age distribution of cases and deaths
Not much more data on cases, but worth a brief reprise of the
age distribution comparison with MERS. Automating this in functions as well.
```{r, warning=FALSE, message=FALSE, echo=FALSE}
mers_comp_analysis <- compare_deaths_to_MERS(kudos)
mers_comp_analysis$plt
```
**Figure:** Odds ratio of death versus 50-59 year olds
by age group for MERS-CoV and nCoV-2019. Log-scale.
**Table:** Odds ratio of death by age group for MERS-CoV and nCoV-2019
```{r, echo=FALSE}
kable(mers_comp_analysis$table)
```
The few new cases have not changed the picture much. How does the age distribution of reported cases in China compare with the age distribution
of the national population? Age distribution data from https://www.populationpyramid.net/china/2019/.
```{r, echo=FALSE, warning = FALSE, message=FALSE}
##Load the population pyramid data and collapse into
##10 year age categories
pop_pyramid <- read_csv("data/Pop-pyramid-China-2019.csv") %>%
mutate(mid_age=rowMeans(matrix(as.numeric(str_split_fixed(Age,"-",2)),ncol=2)) ) %>%
mutate(age_cat=cut(mid_age,breaks=c(0,10,20,30,40,50,60,70,1000),
labels=c("0-9","10-19",
"20-29","30-39",
"40-49","50-59",
"60-69","70+"))) %>%
mutate(total=M+F) %>%
select(-Age, -mid_age) %>%
group_by(age_cat) %>%
summarise(M=sum(M), F=sum(F), total=sum(total))
## Now get the kudos data in the same form. Total only for now,
## and use only Chinese cases. right=FALSE makes the bins [0,10), [10,20), ...
## so that they match the "0-9", "10-19", ... labels.
cases_by_age <- kudos %>%
mutate(age_cat=cut(age,breaks=c(0,10,20,30,40,50,60,70,1000),
right=FALSE,
labels=c("0-9","10-19",
"20-29","30-39",
"40-49","50-59",
"60-69","70+"))) %>%
filter(country=="China") %>%
drop_na(age_cat) %>%
select(age_cat) %>%
group_by(age_cat) %>%
summarize(total=n())
##merge data frames
age_comp <- full_join(cases_by_age, select(pop_pyramid, age_cat, total),
by="age_cat") %>%
rename(cases=total.x, population=total.y) %>%
mutate(cases=cases/sum(cases), population=population/sum(population)) %>%
mutate(RR=(cases/population))
##Now compare the distributions
age_comp_plt <- age_comp %>%
select(-RR) %>%
mutate(cases=-cases) %>%
pivot_longer(-age_cat, names_to="group", values_to="p") %>%
ggplot(aes(x=age_cat, y=p, fill=group)) +
geom_bar(stat="identity") +
coord_flip() +
scale_y_continuous(breaks=c(-.2,-.1,0,.1,.2),
limits=c(-.25,.25),
labels =c("20%","10%","0%","10%","20%")) +
geom_text(data=age_comp, aes(x=age_cat, y=.2,
label=round(RR,1), fill=NULL)) +
xlab("age category") + ylab("proportion of group") +
theme_bw() + theme(legend.position = "bottom")
age_comp_plt
```
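
The `RR` label on the plot above is the share of cases in an age group divided by that group's share of the population. As a minimal sketch of that calculation, the block below uses made-up counts in three coarse age groups (the numbers and the `toy` data frame are illustrative assumptions, not taken from the line list or the population pyramid).

```r
# Toy counts (hypothetical, not from the line list or the population pyramid).
toy <- data.frame(
  age_cat    = c("0-29", "30-59", "60+"),
  cases      = c(10, 50, 40),
  population = c(400, 450, 150)
)

# Convert both columns to shares of their own total.
toy$case_share <- toy$cases / sum(toy$cases)
toy$pop_share  <- toy$population / sum(toy$population)

# Ratio of shares: above 1 means the group is over-represented among cases.
toy$RR <- toy$case_share / toy$pop_share
toy
```

A ratio like this mixes true differences in risk with differences in who gets reported, so it is probably best read as a description of the line list rather than as a risk estimate.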
## Reconstructing Past Incidence Using Cumulative Reports (1-28-2020)
Goal is to do a better job of recreating the daily case counts
in each area so we have implied epidemic curves to work with for
some of the more sophisticated stuff (hopefully) to come.
First let's load in the data. Currently using only
confirmed cases (driven a bit by data source),
but unclear how long this will be viable.
```{r, message=FALSE, echo=FALSE, warning=FALSE}
#Load in the JHU CSSE Data
jhucsse <- read_JHUCSSE_cases("2020-01-28 23:59",
append_wiki = TRUE)
##Continue to filter to China for the moment.
##also total the suspected and confirmed cases.
jhucsse_china <- jhucsse %>%
filter(Country_Region%in%c("Mainland China", "Macau", "Hong Kong"))
## Fit a monotonically increasing spline to each area with
## at least 25 cases at any point
tmp <- jhucsse_china%>%filter(Confirmed>=25)
tmp <- unique(tmp$Province_State)
## Look at consistency in exponential growth by area.
analyze <- jhucsse_china %>% drop_na(Confirmed) %>%
filter(Province_State%in%tmp)
##Get the implied daily incidence for each province
##by fitting a monotonically increasing spline and then
##taking the difference (a little less sensitive to
##perturbations in reporting than taking raw differences).
##Making sure only to infer over the support
tmp_dt_seq <- seq(ISOdate(2019,12,1), ISOdate(2020,1,28), "days")
incidence_data<- analyze %>% nest(-Province_State) %>%
mutate(cs=map(data, ~splinefun(x=.$Update, y=.$Confirmed,
method="hyman"))) %>%
mutate(Incidence=map2(cs,data, ~data.frame(Date=tmp_dt_seq[tmp_dt_seq>=min(.y$Update)],
Incidence= diff(c(0, pmax(0,.x(tmp_dt_seq[tmp_dt_seq>=min(.y$Update)]))))))) %>%
unnest(Incidence) %>% select(-data) %>% select(-cs)
inc_plt <- incidence_data%>%filter(Date>"2020-1-1") %>%
ggplot(aes(x=Date, y=Incidence, fill=Province_State)) +
geom_bar(stat="identity", position="stack") +
theme_bw()+theme(legend.position="bottom")
inc_plt
## Now let's look at Hubei a little bit to see how we feel about
##this approach. Compare with just looking at raw differences.
jhucsse_hubei <-jhucsse %>%
filter(Province_State=="Hubei") %>%
filter(Update>="2020-01-22")
compare_points <- data.frame(Date=sort(jhucsse_hubei$Update[-1]),
Incidence=diff(sort(jhucsse_hubei$Confirmed)))
incidence_data%>%filter(Date>"2020-1-1") %>%
filter(Province_State=="Hubei") %>%
ggplot(aes(x=Date, y=Incidence))+
geom_bar(stat="identity", position="stack") +
theme_bw() +
geom_point(data=compare_points,
x=round(compare_points$Date,"day")+
as.difftime(12,unit="hours"),
y=compare_points$Incidence, color="green")
```
Things look a little funny prior to the first, but this
does seem like it should give a rough pseudo epidemic curve for
the purpose of analysis.
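
To make the spline step easier to follow in isolation, here is a minimal sketch on made-up cumulative counts reported on irregular days (`report_day` and `cumulative` are hypothetical). It uses the same base R `splinefun(..., method="hyman")` call as the chunk above, evaluates it on a daily grid inside the observed range, and differences the result.

```r
# Hypothetical cumulative confirmed counts reported on irregular days.
report_day <- c(0, 1, 3, 4, 7)
cumulative <- c(1, 3, 10, 18, 60)

# Monotone (Hyman-filtered) cubic spline through the cumulative reports.
cum_fun <- splinefun(x = report_day, y = cumulative, method = "hyman")

# Evaluate on a daily grid inside the observed range only.
days <- seq(min(report_day), max(report_day), by = 1)
cum_daily <- cum_fun(days)

# Daily incidence is the day-to-day difference; the first day carries
# all cases reported up to that point.
incidence <- diff(c(0, cum_daily))
data.frame(day = days,
           cumulative = round(cum_daily, 1),
           incidence = round(incidence, 1))
```

Because the interpolant is monotone, the implied incidence cannot go negative, and the daily values add back up to the last cumulative report.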
## Basic Epi Summary 1-27-2020
First goal for the day: dig in deeper on the age specific data
and compare with the MERS-CoV data in a bit more detail.
First, as always, load and summarize the most recent Kudos line
list (https://docs.google.com/spreadsheets/d/1jS24DjSPVWa4iuxuD4OAXrE3QeI8c9BC1hSlqr-NMiU/edit#gid=1187587451)
```{r, echo=FALSE, message=FALSE}
source("R/DataLoadUtils.r")
source("R/BasicEpiAnalyses.r")
kudos <- readKudos2("data/Kudos Line List-1-27-2020.csv") %>%
mutate(age_cat = cut(age, seq(0,100,10)))
grid.arrange(age_dist_graph(kudos),
epi_curve_reported_cases(kudos),
nrow=2)
```
Note that we don't have any line list information on the deaths
that occurred before around 1/15 in this line list. Moving forward with this data, comparing with MERS-CoV data from Saudi Arabia
through summer 2014.
```{r, warning=FALSE, message=FALSE, echo=FALSE}
mers_dat <- read_csv("data/MERSDeathPublic.csv",
col_types = cols(age_class=col_factor(levels = c("0-9", "10-19", "20-29", "30-39", "40-49", "50-59", "60-69", "70+"))))
##make this look enough like the kudos data for us to run the same table
mers_dat <- mers_dat %>% rename(death=died) %>%
rename(age_cat=age_class)
mers_OR_tbl <- OR_table_age(mers_dat, combine_CIs = FALSE)
#make the nCoV table look like the MERS one for a more direct
#comparison.
kudos <- kudos %>%
mutate(age_cat=cut(age,breaks=c(0,10,20,30,40,50,60,70,1000),
right=FALSE, #bins are [0,10), [10,20), ... to match the labels
labels=c("0-9","10-19",
"20-29","30-39",
"40-49","50-59",
"60-69","70+")))
##Drop the categories <30 due to lack of data.
age_OR_tbl <- OR_table_age(kudos, combine_CIs = FALSE)
##Make everything under 30 be NA
age_OR_tbl$OR[1:3] <- NA
age_OR_tbl$CI_low[1:3] <- NA
age_OR_tbl$CI_high[1:3] <- NA
##combine to plot
comb_OR_tbl <- bind_rows(nCoV=age_OR_tbl,
MERS=mers_OR_tbl, .id="disease")
comb_OR_tbl$label <- sprintf("%1.2f (%1.2f, %1.2f)",
comb_OR_tbl$OR,
comb_OR_tbl$CI_low,
comb_OR_tbl$CI_high)
comb_OR_tbl$label[comb_OR_tbl$OR==1] <- NA
ggplot(comb_OR_tbl, aes(x=age_cat, y=OR, color=disease, label=label)) +
geom_pointrange(aes(ymin=CI_low, ymax=CI_high),
position = position_dodge2(width = 0.5, padding = 0.5)) +
scale_y_log10() + ylab("OR of death")+
xlab("Age") +
theme_bw()
comb_OR_wide <- comb_OR_tbl %>%
select(disease,age_cat,label) %>%
pivot_wider(names_from=disease, values_from = label)
comb_OR_wide[6,c("nCoV","MERS")] <- "1"
comb_OR_wide$nCoV[1:3] <- "-"
```
**Figure:** Odds ratio of death by age group for MERS-CoV and nCoV-2019. Log-scale.
**Table:** Odds ratio of death by age group for MERS-CoV and nCoV-2019
```{r, echo=FALSE}
kable(comb_OR_wide)
```
**Take aways from OR of death comparison**
- It looks like the pattern of relative mortality is similar
in MERS-CoV and nCoV-2019 even if absolute rates are different.
- The paucity of data on age specific deaths in the current data
set for nCoV-2019 means uncertainty is huge
- Still assuming these are similar in a relative sense seems
reasonable.
### Thought Experiment
What if nCoV symptomatic and death rates were identical to
those of MERS-CoV? How many cases would the current line
list represent? How about the full data if they follow
a similar age distribution?
Using mortality and infection rates from this paper
in AJE on MERS-CoV symptomatic ratios and IFRs
(10.1093/aje/kwv452), and a lot of assumptions:
1. Age distribution of all cases looks like line list cases.
2. Age distribution of deaths looks like line list deaths.
3. Confirmed cases (4,474) are roughly equal to symptomatic cases.
4. There are 107 deaths.
5. The symptomatic ratio is the same as MERS.
6. All line list cases that will die have died.
**Table:** Implied number of cases and needed ratio of IFR
in nCoV and MERS-CoV to reconcile deaths and implied cases.
```{r, echo=FALSE, warning=FALSE, message=FALSE}
sym_fat_ratios <-
read_csv("data/MERSSymptom-FatalityRatios.csv")
total_cases <- sum(age_OR_tbl$alive +
age_OR_tbl$dead)
total_dead <- sum(age_OR_tbl$dead)
implied_inf_table <- age_OR_tbl %>%
inner_join(sym_fat_ratios) %>%
mutate(total=alive+dead) %>%
mutate(prop=total/total_cases)%>%
mutate(prop_dead=dead/total_dead)%>%
mutate(est_total=prop*4474)%>%
mutate(est_dead=prop_dead*107) %>%
mutate(sr_implied=est_total/sym_ratio) %>%
mutate(ifr_implied=est_dead/ifr) %>%
select(age_cat,prop, prop_dead, est_total,
est_dead, sym_ratio, ifr,
sr_implied, ifr_implied) %>%
ungroup() #%>%
with(implied_inf_table, {
tmp<<-data.frame(age_cat="Overall",
prop=1, prop_dead=1,
est_total=sum(est_total),
est_dead=sum(est_dead),
sym_ratio=weighted.mean(sym_ratio, prop),
ifr=weighted.mean(ifr, prop),
sr_implied=sum(sr_implied),
ifr_implied=sum(ifr_implied))
})
implied_inf_table <- implied_inf_table %>%
bind_rows(tmp)%>%
mutate(ifr_reduction = est_dead/(ifr*sr_implied))
kable(implied_inf_table, digits = 2 ,
col.names =c("Age","pr cases","pr deaths","est. cases",
"est. dead",
"MERS symptomatic ratio",
"MERS IFR", "Implied Infections by SR",
"Implied Infections by IFR",
"IFR Ratio to Reconcile"))
```
So, if the symptomatic ratio for nCoV 2019 is similar to what was
implied by the confirmed cases of MERS-CoV (and the other assumptions
hold), the following things are
true:
- There are at least 14,700 nCoV-2019 infections out there as of
1-27-2020. This is likely low, as the 4,474 reported cases
likely undercount the cases that actually exist.
- If this is the case, the IFR for nCoV is likely
1/50th of that of MERS-CoV or lower
(so less than 6 deaths per 1,000 infections).
- The difference is bigger in younger individuals (1/100th or less)
than older ones (1/10th).
**Note: this is interesting, but it is the result of a
thought experiment only!**
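
The arithmetic behind each row of the table is short enough to spell out. The sketch below walks through it for a single age group with made-up inputs (all four input values are hypothetical and chosen only to give round numbers; they are not the MERS-CoV estimates).

```r
# All inputs are hypothetical, for one age group.
est_cases <- 1000   # reported cases, assumed to be symptomatic
est_dead  <- 20     # reported deaths
sym_ratio <- 0.25   # assumed share of infections that are symptomatic
ifr       <- 0.10   # assumed reference infection fatality ratio

# Infections implied by the case count: cases / symptomatic ratio.
inf_by_sr <- est_cases / sym_ratio           # 4000

# Infections implied by deaths if the reference IFR applied unchanged.
inf_by_ifr <- est_dead / ifr                 # 200

# Factor by which the reference IFR must be scaled so that the
# case-implied infections produce the observed deaths.
ifr_ratio <- est_dead / (ifr * inf_by_sr)    # 0.05
```

With these toy numbers the two routes to "implied infections" disagree by a factor of 20, and the last line expresses that gap as the IFR ratio needed to reconcile them.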
## Basic Epi Summary 1-25-2020
Two goals for today:
1. Make functions to automate most of the basic epi report.
2. Add a few new basic analyses.
### Summarizing the line list data
Age distribution and epicurve for cases where we have
individual line list information.
```{r, echo=FALSE, message=FALSE}
source("R/DataLoadUtils.r")
source("R/BasicEpiAnalyses.r")
kudos <- readKudos2("data/Kudos Line List-1-25-2020.csv") %>%
mutate(age_cat = cut(age, seq(0,100,10)))
grid.arrange(age_dist_graph(kudos),
epi_curve_reported_cases(kudos),
nrow=2)
```
Now let's look at some basic information on survival by age group
and gender.
```{r, warning=FALSE, echo=FALSE}
kable(OR_table_age(kudos))
kable(OR_table_gender(kudos))
```
**Take aways from the line list data:**
- There is a huge survival effect by age.
- No apparent effect of gender.
- If the line list data is reflective of when those who died got
sick in the cumulative data, the overall CFR may increase quite a bit in coming weeks.
### Bringing in the cumulative case data.
Now let's start to look at the aggregate cumulative case
data, as that is going to be the most widely available and complete,
and the basis for most of our predictive style analyses.
First we will focus on Mainland China, Hong Kong and
Macau.
```{r, message=FALSE}
jhucsse <- read_JHUCSSE_cases("2020-01-25 23:59", append_wiki = TRUE)
##Filter to China:
jhucsse_china <- jhucsse %>%
filter(Country_Region%in%c("Mainland China", "Macau", "Hong Kong"))
jhucsse_china %>% drop_na(Confirmed) %>%
filter(Update>"2020-01-14") %>%
ggplot(aes(x=Update, y=Confirmed, col=Province_State)) +
geom_line() + scale_y_log10()
```
That is looking at all provinces, so let's narrow it to places that at some
point experience at least 25 confirmed cases and
look vs. a straight log-linear line.
Note that this is not quite right for real exponential growth since we
are looking at the cumulative reports rather than the daily incidence.
```{r}
tmp <- jhucsse_china%>%filter(Confirmed>=25)
tmp <- unique(tmp$Province_State)
## Look at consistency in exponential growth by area.
analyze <- jhucsse_china %>% drop_na(Confirmed) %>%
filter(Update>"2020-01-14") %>%
filter(Province_State%in%tmp)
#Get the slopes for each province.
slopes <- analyze %>% nest(-Province_State) %>%
mutate(slope=map_dbl(data, ~lm(log10(.$Confirmed)~as.Date(.$Update))$coef[2])) %>%
select(-data) %>% mutate(exp_scale=10^(slope))
kable(slopes, digits=2)
#ggplot(slopes, aes(x=Province_State, y=slope)) +
# geom_bar(stat="identity") + coord_flip()
##Plot the exponential growth rate in each against a linear rate.
jhucsse_china %>% drop_na(Confirmed) %>%
filter(Update>"2020-01-14") %>%
filter(Province_State%in%tmp)%>%
ggplot(aes(x=Update, y=Confirmed, col=Province_State)) +
geom_point() + scale_y_log10() + stat_smooth(method="lm", se=FALSE)
```
Leaving it there for the moment due to lack of aggregate data.
**Cumulative analysis preliminary so too early to say much but:**
- Growing everywhere with at least a few cases
- Rates seem very roughly similar
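
To read the `slope` and `exp_scale` columns in the table above, it may help to see the conversion on data where the answer is known. This sketch builds made-up counts that grow by exactly 30% per day (the series and variable names are illustrative assumptions), fits the same kind of `log10` regression, and turns the slope into a daily growth factor and a doubling time.

```r
# Hypothetical cumulative counts that grow by exactly 30% per day.
day <- 0:9
confirmed <- 20 * 1.3^day

# Slope of log10(count) against time, as in the province-level fits.
slope <- unname(coef(lm(log10(confirmed) ~ day))[2])

# Daily multiplicative growth factor and implied doubling time in days.
daily_factor  <- 10^slope
doubling_days <- log10(2) / slope

c(slope = slope, daily_factor = daily_factor, doubling_days = doubling_days)
```

The fit recovers the 1.3 daily factor built into the toy series, so `10^slope` can be read as "cases multiply by this much per day" and `log10(2)/slope` as the number of days it takes to double.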
## Basic Epi Summary 1-24-2020
Simple snapshot as of 2020-01-24 based on a snapshot of line list data
derived from public sources from:
https://docs.google.com/spreadsheets/d/1jS24DjSPVWa4iuxuD4OAXrE3QeI8c9BC1hSlqr-NMiU/edit#gid=1449891965
(AKA the Kudos list).
These are some very basic epi snapshots that should be improved
in the coming days.
First just take a rough look at the age distribution of cases.
Ten year increments.
```{r}
source("R/DataLoadUtils.r")
kudos <- readKudos2("data/Kudos Line List-1-24-2020.csv") %>%
mutate(age_cat = cut(age, seq(0,100,10)))
#Age distribution of cases.
require(ggplot2)
ggplot(drop_na(kudos, age_cat),
aes(x=age_cat, fill=as.factor(death))) +
geom_bar( color="grey") + coord_flip() + xlab("Age Category")
```
Next, are we seeing any obvious differences in mortality
by gender or age?
```{r, echo=FALSE, warning=FALSE}
gender_odds <- kudos%>%group_by(gender, death)%>%
summarize(n())%>%
mutate(death=ifelse(death,"dead","alive"))%>%
pivot_wider(names_from = death,values_from=`n()`)
#%>%
#mutate(OR=dead/alive)
#gender_odds <- gender_odds%>%mutate(OR=OR/gender_odds$OR[1])
gender_odds_mdl <- glm(death~gender, data=kudos,
family=binomial)
gender_odds$OR <- c(1, exp(gender_odds_mdl$coef[2]))
gender_odds$CI <- c("-",
paste(round(exp(confint(gender_odds_mdl)[,2]),2),
collapse = ","))
kable(gender_odds, digits=2)
age_odds <- kudos %>%
drop_na(age_cat)%>%
group_by(age_cat,death)%>%
summarize(n())%>%
mutate(death=ifelse(death,"dead","alive"))%>%
pivot_wider(names_from = death,values_from=`n()`) %>%
replace_na(list(dead=0))
kudos$age_cat <- relevel(kudos$age_cat, ref="(50,60]")
age_odds_mdl <- glm(death~age_cat, data=kudos,
family=binomial)
age_odds$OR <- c(exp(age_odds_mdl$coef[2:5]),1,
exp(age_odds_mdl$coef[6:8]))
tmp <- round(exp(confint(age_odds_mdl)),2)
tmp <- apply(tmp, 1, paste, collapse=",")
age_odds$CI <- c(tmp[2:5],"-",tmp[6:8])
kable(age_odds, digits = 2)
```
Even as sparse as this data is, this is showing some clear
evidence of an age relationship.
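
For readers who want to see where an odds ratio like the ones above comes from, here is a minimal sketch on a made-up 2x2 table (the counts are hypothetical). It computes the odds ratio by hand, checks it against a logistic regression, and adds a Wald interval; note that the chunk above uses `confint()` instead, so the Wald interval is a simpler stand-in here, not the method used in the tables.

```r
# Hypothetical 2x2 counts: deaths and survivors in two age groups.
dead  <- c(ref = 5,  older = 12)
alive <- c(ref = 95, older = 48)

# Odds ratio of death, older group versus the reference group.
or_hand <- unname((dead["older"] / alive["older"]) /
                  (dead["ref"] / alive["ref"]))

# Same estimate from a logistic regression on the grouped counts.
grp <- factor(c("ref", "older"), levels = c("ref", "older"))
fit <- glm(cbind(dead, alive) ~ grp, family = binomial)
or_glm <- unname(exp(coef(fit)[2]))

# Wald 95% interval on the log-odds scale, then exponentiated.
se <- sqrt(sum(1 / c(dead, alive)))
ci <- exp(log(or_hand) + c(-1.96, 1.96) * se)

c(or_hand = or_hand, or_glm = or_glm, ci_low = ci[1], ci_high = ci[2])
```

With counts this small the interval is wide, which is the same pattern seen in the real tables.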
Epidemic curve of line list cases. Not
super informative at this point.
```{r}
ggplot(kudos, aes(x=symptom_onset, fill=as.factor(death))) +
geom_bar()
```
A touch interesting that all deaths are early on. This suggests either (A) surveillance was really biased towards deaths in the early days, or (B) a lot of the later reports have not had time to die.
[Note that there was previously a 1-23-2020 summary
but that was too preliminary even for this]
# Planning Notes/Ideas
- Apply basic framework to data so far, focusing on final size first
- Also do some basic epi summaries
- Here is a place for cool visualizations.
- Focus on province/state level in China (including Hong Kong/Macau SAR)
- Run this as a very open exercise in open science.
- Take snapshots of data...starting with stuff from : https://docs.google.com/spreadsheets/d/1jS24DjSPVWa4iuxuD4OAXrE3QeI8c9BC1hSlqr-NMiU/edit#gid=1449891965.
- Get total case snapshot from https://docs.google.com/spreadsheets/d/169AP3oaJZSMTquxtrkgFYMSp4gTApLTTWqo25qCpjL0/edit#gid=975486030
- Basically take snapshots and then post.
- Longer term...use approach and inference stuff for effect of actions.
- Keep a database of dates of intervention actions, major events.
- 1/22/2020 : Wuhan Quarantine
### Some Tasks
- Download first snapshot, do some basic data cleaning and epi summaries.
- Do some data cleaning and loading
- Make some basic summaries
- Get province level covariates
- population
- population density
- average/high/low
- Average Feb temperature
- Average/high-low Absolute humidity
- Basic demographics (if available)
- [OTHER WEATHER]
- [OTHER INDEXES OF URBANIZATION/ECONOMY]
- Get the most basic model working