@@ -691,7 +691,11 @@ dataset, which is a snapshot [**as of**]{.primary} May 31, 2022 that contains da
``` {r head-edf}
#| echo: false
- edf <- covid_case_death_rates
+ edf <- covid_case_death_rates |>
+   # Filter out locations with no deaths recorded:
+   group_by(geo_value) |>
+   filter(!all(death_rate == 0)) |>
+   ungroup()
head(edf |> as_tibble())
```
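The `group_by()` / `filter(!all(death_rate == 0))` step keeps a location only if at least one of its death rates is nonzero. A minimal sketch of that behavior on a small tibble (the `toy` data and its `geo_value` codes are hypothetical, made up for illustration):

```r
library(dplyr)

# Hypothetical toy data: location "zz" never records a death
toy <- tibble(
  geo_value  = rep(c("aa", "bb", "zz"), each = 3),
  death_rate = c(0, 0.1, 0.2,   0, 0, 0.05,   0, 0, 0)
)

kept <- toy |>
  group_by(geo_value) |>
  # all(death_rate == 0) is TRUE only for locations whose rates are all zero
  filter(!all(death_rate == 0)) |>
  ungroup()

unique(kept$geo_value)  # returns c("aa", "bb")
```

A location whose rates are a mix of zeros and `NA`s is also dropped: `all()` returns `NA` there, and `filter()` discards rows where the condition is `NA`.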
@@ -745,29 +749,33 @@ attr(edf, "metadata")
## Features - Correlations at different lags
+ Correlation coefficients:
+
+ - "Strength" and "direction" of a "relationship" between two variables
+ - Normalized measures of
+   - how well (aspects of) one variable might be estimated from another
+   - using particular models and metrics
+   - based on training errors^[More rigorous approaches are covered tomorrow.]
+
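To make the "training errors" reading concrete for the Pearson case: the squared coefficient equals the in-sample R² of a least-squares line (with intercept) fitted to predict one variable from the other, and its sign matches the sign of the fitted slope. A minimal base-R sketch on simulated data (the data are made up for illustration):

```r
set.seed(1)
x <- rnorm(100)
y <- -0.5 * x + rnorm(100, sd = 0.5)   # negative relationship plus noise

r <- cor(x, y, method = "pearson")

# Least-squares fit of y on x, evaluated on its own training data
fit <- lm(y ~ x)
r2_training <- 1 - sum(residuals(fit)^2) / sum((y - mean(y))^2)

all.equal(r^2, r2_training)           # TRUE: r^2 is the training R^2
sign(r) == sign(coef(fit)[["x"]])     # TRUE: same "direction"

# Kendall's tau instead measures rank agreement (concordant vs. discordant pairs)
cor(x, y, method = "kendall")
```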
+ ## Features - Correlations at different lags
+
``` {r corr-lags-ex}
#| echo: true
- ## cor0 <- epi_cor(edf, case_rate, death_rate, cor_by = time_value)
- ## cor14 <- epi_cor(edf, case_rate, death_rate, cor_by = time_value, dt1 = -14)
- cor0 <- epi_cor(edf, case_rate, death_rate, cor_by = time_value, method = "kendall")
- cor14 <- epi_cor(edf, case_rate, death_rate, cor_by = time_value, dt1 = -14, method = "kendall")
+ epi_cor(edf, case_rate, death_rate, dt1 = -14, cor_by = geo_value, method = "pearson")
```
- ``` {r plot-corr-lags-ex}
- #| fig-align: center
- #| warning: false
- rbind(
- cor0 |> mutate(lag = 0),
- cor14 |> mutate(lag = 14)
- ) |>
- mutate(lag = as.factor(lag)) |>
- ggplot(aes(x = time_value, y = cor)) +
- geom_hline(yintercept = 0) +
- geom_line(aes(color = lag)) +
- scale_color_brewer(palette = "Set1") +
- scale_x_date(minor_breaks = "month", date_labels = "%b %Y") +
- labs(x = "Date", y = "Correlation", col = "Lag")
- ```
+ - For each location (`cor_by = geo_value`),
+ - how well might death rates be estimated from case rates 14 days earlier (`case_rate, death_rate, dt1 = -14`),
+ - using a linear model and its related error measure, and what was the sign of the coefficient (`method = "pearson"`),
+ - on this training+evaluation set (`edf`)?
+
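Roughly, the `epi_cor()` call above pairs each location's `death_rate` on day t with its `case_rate` on day t - 14 and computes one Pearson correlation per location. The sketch below reproduces that idea with plain dplyr on hypothetical toy data, for intuition only; it is not `epi_cor()`'s implementation, and it assumes each location has one row per consecutive day with no gaps, so that shifting by 14 rows means shifting by 14 days.

```r
library(dplyr)

# Hypothetical toy data: 2 locations x 60 consecutive days
set.seed(42)
days <- seq(as.Date("2021-01-01"), by = "day", length.out = 60)
toy <- tibble(
  geo_value  = rep(c("aa", "bb"), each = length(days)),
  time_value = rep(days, times = 2)
) |>
  group_by(geo_value) |>
  mutate(
    case_rate  = 50 + cumsum(rnorm(n())),
    # deaths track cases from 14 days earlier, plus a little noise
    death_rate = 0.01 * dplyr::lag(case_rate, 14) + rnorm(n(), sd = 0.001)
  ) |>
  ungroup()

lagged_cor <- toy |>
  group_by(geo_value) |>                                  # roughly cor_by = geo_value
  arrange(time_value, .by_group = TRUE) |>
  mutate(case_rate_lag14 = dplyr::lag(case_rate, 14)) |>  # roughly dt1 = -14
  summarize(
    cor = cor(case_rate_lag14, death_rate,
              use = "na.or.complete", method = "pearson")
  )

lagged_cor
```

Rows where the shifted case rate (or the toy death rate) is `NA` are dropped pairwise by `use = "na.or.complete"`, so each location contributes 46 day pairs here.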
+ ## Features - Correlations at different lags
+
+ TODO lag analysis: Pearson by geo, then mean
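One way the planned "Pearson by geo, then mean" analysis could look, sketched with dplyr on hypothetical toy data: for each candidate lag, compute a per-location Pearson correlation between lagged case rates and death rates, then average over locations. The lag grid, the `mean_cor_by_geo()` helper, and the toy data are illustrative assumptions, and the sketch again assumes consecutive daily rows per location with no gaps.

```r
library(dplyr)

# Hypothetical toy data: deaths follow cases with a true 21-day lag
set.seed(7)
days <- seq(as.Date("2021-01-01"), by = "day", length.out = 120)
toy <- tibble(
  geo_value  = rep(c("aa", "bb", "cc"), each = length(days)),
  time_value = rep(days, times = 3)
) |>
  group_by(geo_value) |>
  mutate(
    case_rate  = 50 + cumsum(rnorm(n())),
    death_rate = 0.01 * dplyr::lag(case_rate, 21) + rnorm(n(), sd = 0.001)
  ) |>
  ungroup()

# Hypothetical helper: mean over locations of the per-location Pearson correlation
mean_cor_by_geo <- function(df, lag_days) {
  df |>
    group_by(geo_value) |>
    arrange(time_value, .by_group = TRUE) |>
    summarize(cor = cor(dplyr::lag(case_rate, lag_days), death_rate,
                        use = "na.or.complete", method = "pearson")) |>
    summarize(mean_cor = mean(cor)) |>
    pull(mean_cor)
}

lags <- c(7, 14, 21, 28, 35)
lag_summary <- tibble(
  lag      = lags,
  mean_cor = vapply(lags, function(l) mean_cor_by_geo(toy, l), numeric(1))
)
lag_summary
```

On this toy data the mean correlation peaks at the lag built into the simulation (21 days); on real data the curve is typically flatter and noisier.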
+
+ ## Features - Correlations at different lags
+
+ TODO lag analysis: Kendall by time, then mean
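A companion sketch for the cross-sectional "Kendall by time, then mean" version: for each date, compute Kendall's tau between lagged case rates and death rates across locations, then average over dates. The toy data, lag grid, and `mean_kendall_by_time()` helper are hypothetical, and consecutive daily rows per location are assumed.

```r
library(dplyr)

# Hypothetical toy data: 10 locations at different baseline levels,
# deaths follow cases with a true 21-day lag
set.seed(11)
days <- seq(as.Date("2021-01-01"), by = "day", length.out = 120)
geos <- sprintf("g%02d", 1:10)
toy <- tibble(
  geo_value  = rep(geos, each = length(days)),
  time_value = rep(days, times = length(geos)),
  level      = rep(seq(10, 37, by = 3), each = length(days))
) |>
  group_by(geo_value) |>
  mutate(
    case_rate  = level + cumsum(rnorm(n())),
    death_rate = 0.01 * dplyr::lag(case_rate, 21) + rnorm(n(), sd = 1e-7)
  ) |>
  ungroup()

# Hypothetical helper: mean over dates of the cross-location Kendall correlation
mean_kendall_by_time <- function(df, lag_days) {
  df |>
    group_by(geo_value) |>
    arrange(time_value, .by_group = TRUE) |>
    mutate(case_rate_lagged = dplyr::lag(case_rate, lag_days)) |>
    group_by(time_value) |>                        # roughly cor_by = time_value
    summarize(tau = cor(case_rate_lagged, death_rate,
                        use = "na.or.complete", method = "kendall")) |>
    # early dates have no complete pairs, so their tau is NA
    summarize(mean_tau = mean(tau, na.rm = TRUE)) |>
    pull(mean_tau)
}

lags <- c(7, 14, 21, 28, 35)
lag_summary <- tibble(
  lag      = lags,
  mean_tau = vapply(lags, function(l) mean_kendall_by_time(toy, l), numeric(1))
)
lag_summary
```

Kendall's tau only compares orderings across locations on each date, so it is insensitive to monotone rescaling of either variable, unlike the per-location Pearson version.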
## Features - Compute growth rates