This GitHub repository contains code and data to reproduce results in the article:
Robert Kubinec, Luiz Max Carvalho, Joan Barceló, Cindy Cheng, Luca Messerschmidt and Matthew Sean Cottrell. "A Bayesian latent variable model for the optimal identification of disease incidence rates given information constraints." Journal of the Royal Statistical Society Series A: Statistics in Society. 2024. https://doi.org/10.1093/jrsssa/qnae040 .
A brief description of the files is found below. If you have any questions about the information in the repo, please contact Bob Kubinec at [email protected].
First, note that the paper relies on a fitted `cmdstanr` model to reproduce results. These model fits are too big to store on GitHub, but you can access them from this Google Drive folder and place them in the `data` sub-folder to reproduce results without refitting the models (fitting may take up to a few days):
https://drive.google.com/drive/folders/1hVzD_qL1CnOkTkwI6VH1PEgC1LK44RS1?usp=sharing
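After downloading, it can help to confirm that the files actually landed in the `data` sub-folder before knitting the paper. The following is a minimal sketch, not part of the repository: the helper name `list_downloaded_fits` is hypothetical, and the `.rds` extension is an assumption about how the fits are stored, so adjust `pattern` to match the downloaded files.

```r
# Hypothetical helper (not from the repository): list candidate fitted-model
# files in the data sub-folder together with their sizes.
# ASSUMPTION: the fits are stored as .rds files; change `pattern` if not.
list_downloaded_fits <- function(data_dir = "data", pattern = "\\.rds$") {
  if (!dir.exists(data_dir)) {
    stop("Directory not found: ", data_dir)
  }
  files <- list.files(data_dir, pattern = pattern, full.names = TRUE)
  data.frame(
    file = basename(files),
    size_mb = round(file.size(files) / 1024^2, 1),
    stringsAsFactors = FALSE
  )
}

# Only runs when a data/ folder exists in the working directory.
if (dir.exists("data")) {
  print(list_downloaded_fits("data"))
}
```

Very small or missing files in the output would suggest an incomplete download.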
- `kubinec_model_preprint.Rmd`: This file contains the text and embedded R code to reproduce the figures and tables in the paper.
- `kubinec_model_SI.Rmd`: This file contains the text and embedded R code to reproduce the supplementary information.
- `corona_tscs_betab_mix_prior_v2.stan`: This Stan file contains the code to fit the model described in the paper using Stan (specifically, `cmdstan` accessed via the `cmdstanr` package). See code in the `kubinec_model_preprint.Rmd` file to see how to fit the model from R.
- `estimate_beta_priors_v2.stan`: This Stan file calculates the uncertainty of the empirical distributions of the estimates of the expert survey about COVID-19 incidence in the early pandemic period.
- `data/combined.rds`: the combined dataset with COVID-19 cases, tests, Census data, and expert and serology survey data.
- `nyt_data.rds`, `goog_mobile.rds` and `tests.rds`: New York Times (reported cases), Google mobility data and testing data for the time period described in the paper. Note that the paper code can download these from GitHub repositories, but these sources may no longer be available.
- `data/simulation/`: contains masking and Civiqs polls about COVID-19 related fears and behaviors.
- `data/covid_amp_state_policy_data.xlsx`: contains COVID-AMP state-level policy data as described in the paper.
- `count_pol_covidamp.rds`: Aggregated form of COVID-AMP data to the state level as a count of policies.
- `cdc_sample_sizes.csv`: CDC serology surveys.
- `data/consensusForecastsDB.csv`: expert survey of epidemiologists during the early pandemic period.
- `data/rhat_summaries*.rds`: Rhat summaries for different models as reported in the paper.
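
Because the Rhat summaries are described with a wildcard, one way to load them all at once is to translate that glob into a regular expression and read each match. This is a minimal sketch under stated assumptions: the helper name `read_rhat_summaries` is hypothetical, and nothing is assumed about the structure of the stored objects beyond their being readable with `readRDS()`.

```r
# Hypothetical helper (not from the repository): read every file matching
# rhat_summaries*.rds in a folder into a named list.
read_rhat_summaries <- function(data_dir = "data") {
  # glob2rx() converts the shell-style wildcard into an anchored regex.
  rx <- utils::glob2rx("rhat_summaries*.rds")
  paths <- list.files(data_dir, pattern = rx, full.names = TRUE)
  summaries <- lapply(paths, readRDS)
  names(summaries) <- basename(paths)
  summaries
}

# Only runs when a data/ folder exists in the working directory.
if (dir.exists("data")) {
  str(read_rhat_summaries("data"), max.level = 1)
}
```

The result is a list with one element per file, named after the file it came from.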