Commit 9d40d4c

WIP correlation edits
1 parent 21eee32 commit 9d40d4c

1 file changed: +28 -20 lines changed

slides/day1-afternoon.qmd

Lines changed: 28 additions & 20 deletions
@@ -691,7 +691,11 @@ dataset, which is a snapshot [**as of**]{.primary} May 31, 2022 that contains da

 ```{r head-edf}
 #| echo: false
-edf <- covid_case_death_rates
+edf <- covid_case_death_rates |>
+  # Filter out locations with no deaths recorded:
+  group_by(geo_value) |>
+  filter(!all(death_rate == 0)) |>
+  ungroup()
 head(edf |> as_tibble())
 ```

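A note on the grouped filter in this hunk: the sketch below reproduces the same `group_by()` / `filter(!all(...))` / `ungroup()` pattern on a tiny made-up tibble, to show that it keeps or drops whole locations rather than individual rows. The `toy` data and the `kept` name are hypothetical and are not part of the commit; only the dplyr verbs come from the diff.

```r
library(dplyr)

# Hypothetical toy data: "as" never records a death, "ca" does on some days.
toy <- tibble(
  geo_value  = c("as", "as", "as", "ca", "ca", "ca"),
  death_rate = c(0, 0, 0, 0, 0.2, 0.1)
)

kept <- toy |>
  group_by(geo_value) |>
  # all(death_rate == 0) is evaluated once per group, so the condition is
  # recycled to every row of that location: the group is kept or dropped whole.
  filter(!all(death_rate == 0)) |>
  ungroup()

# "as" is gone; "ca" keeps all three rows, including its zero-death day.
kept
```

Individual zero-valued rows inside a retained location survive, which is what a per-location correlation needs: only locations with no variation in `death_rate` at all are excluded.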
@@ -745,29 +749,33 @@ attr(edf, "metadata")

 ## Features - Correlations at different lags

+Correlation coefficients:
+
+- "Strength" and "direction" of a "relationship" between two variables
+- Normalized measures of
+  - how well (aspects of) one variable might be estimated from another
+  - using particular models and metrics
+  - based on training errors^[More rigorous approaches are covered tomorrow.].
+
+## Features - Correlations at different lags
+
 ```{r corr-lags-ex}
 #| echo: true
-## cor0 <- epi_cor(edf, case_rate, death_rate, cor_by = time_value)
-## cor14 <- epi_cor(edf, case_rate, death_rate, cor_by = time_value, dt1 = -14)
-cor0 <- epi_cor(edf, case_rate, death_rate, cor_by = time_value, method = "kendall")
-cor14 <- epi_cor(edf, case_rate, death_rate, cor_by = time_value, dt1 = -14, method = "kendall")
+epi_cor(edf, case_rate, death_rate, dt1 = -14, cor_by = geo_value, method = "pearson")
 ```

-```{r plot-corr-lags-ex}
-#| fig-align: center
-#| warning: false
-rbind(
-  cor0 |> mutate(lag = 0),
-  cor14 |> mutate(lag = 14)
-) |>
-  mutate(lag = as.factor(lag)) |>
-  ggplot(aes(x = time_value, y = cor)) +
-  geom_hline(yintercept = 0) +
-  geom_line(aes(color = lag)) +
-  scale_color_brewer(palette = "Set1") +
-  scale_x_date(minor_breaks = "month", date_labels = "%b %Y") +
-  labs(x = "Date", y = "Correlation", col = "Lag")
-```
+- For each location (`cor_by = geo_value`),
+- how well might death rates be estimated by case rates from 14 days ago (`case_rate, death_rate, dt1 = -14`),
+- with a linear model and related error measure, and what was the sign of the coefficient (`method = "pearson"`),
+- on this training+evaluation set (`edf`)?
+
+## Features - Correlations at different lags
+
+TODO lag analysis: Pearson by geo, then mean
+
+## Features - Correlations at different lags
+
+TODO lag analysis: Kendall by time, then mean

 ## Features - Compute growth rates

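The new bullets read the Pearson coefficient as "how well a linear model does, plus the sign of its coefficient". A minimal base R sketch of that reading for a single location follows. It is not how `epi_cor()` is implemented; the simulated series and the names `case_lagged`, `r`, and `fit` are assumptions made for illustration, and the 14-day shift is done by hand.

```r
set.seed(1)
n <- 100
lag_days <- 14

# Made-up daily series for one location.
case_rate <- 50 + cumsum(rnorm(n))

# Case rate from 14 days ago, aligned with today's date (NA where unavailable).
case_lagged <- c(rep(NA_real_, lag_days), head(case_rate, -lag_days))

# Simulated death rate that follows lagged cases, plus noise.
death_rate <- 0.02 * case_lagged + rnorm(n, sd = 0.05)

# Pearson correlation between lagged cases and current deaths,
# using only the days where both values are present.
r <- cor(case_lagged, death_rate, use = "complete.obs", method = "pearson")

# Simple linear regression on the same pairs (lm drops NA rows by default).
fit <- lm(death_rate ~ case_lagged)

# r^2 matches the in-sample R^2 of the linear fit, and the sign of r
# matches the sign of the fitted slope.
c(r_squared = r^2, lm_r_squared = summary(fit)$r.squared)
c(sign_r = sign(r), sign_slope = sign(unname(coef(fit)["case_lagged"])))
```

Both quantities are computed on the same data used for fitting, which is the sense in which the slide calls correlation a measure based on training errors.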