What happens with smooth.Rt and gaps in data ? #8

Open
tobadia opened this issue Nov 16, 2020 · 0 comments

tobadia commented Nov 16, 2020

From a recent e-mail:

Greetings!

I am Chris Swenson, a data scientist in the United States, working for a multi-state, non-profit hospital system called SSM Health. We have been using the R0 package in R for the past few months, and I have a question about how the smooth.Rt function works.

We’ve been calculating R(0) / R(t) for the specific regions where we have hospitals, to alert the infection control staff to potential surges in COVID-19 cases. I’ve been using the data from Johns Hopkins to supply the county-level regional data as inputs to the estimation. (We’ve had to make some data corrections. For example, some locations appear to skip weekends and then report very high numbers on Mondays.)

It appears that if we’re not careful in handling the data, unexpected results may occur. In some situations the data was not entered correctly, and it appears that the smooth.Rt function excludes any incomplete periods (7-day periods, in our case). Below is some code in which I ran the estimate and smoothing on the full data, then on the data with the last day removed, then on the data with the last week removed, and compared all three estimates. I have also included the final table; since the data set (Germany.1918) ships with the R0 package, you can run the code to compare.

My question is: What happens in the smooth.Rt function when a period is incomplete, for example, the input data only has 5 days instead of 7 in a week? It appears the most recent week of data generates a much smaller estimate after smoothing. Should we be cautious when using the most recent data?
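The arithmetic behind the question can be sketched in base R. The two helpers below are hypothetical and are not part of the R0 package; they make no claim about how smooth.Rt works internally. They only count how many complete 7-day periods a daily series contains and trim the leftover days, so that a partial period is at least visible before estimation. The length of 126 days is an assumption, chosen because it is the length that 18 full weekly estimates would imply.

```r
# Hypothetical helper (not from R0): how many complete periods does a
# series of n_days contain, and how many days are left over at the end?
period_coverage <- function(n_days, period = 7L) {
  stopifnot(n_days >= 0, period > 0)
  list(
    complete_periods = n_days %/% period,
    leftover_days    = n_days %% period
  )
}

# Hypothetical helper (not from R0): drop trailing observations that do
# not fill a whole period.
trim_to_complete_periods <- function(x, period = 7L) {
  keep <- (length(x) %/% period) * period
  x[seq_len(keep)]
}

# Assumed series lengths: full data, one day removed, one week removed.
for (n in c(126, 125, 119)) {
  cov <- period_coverage(n)
  cat(n, "days:", cov$complete_periods, "complete periods,",
      cov$leftover_days, "leftover days\n")
}
```

If a check like this reports leftover days, the most recent period of the input is partial, and it may be worth treating the latest smoothed value as provisional.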

Provided replicable example

library(R0)
data(Germany.1918)

# Generation time distribution: gamma with mean 3 and sd 1.5
mGT <- generation.time("gamma", c(3, 1.5))

# 1) Full data: estimate R(t) with the TD method, then smooth over 7-day periods
TD <- estimate.R(Germany.1918, mGT, begin=1L, end=length(Germany.1918), methods="TD", nsim=100)
TD.weekly <- smooth.Rt(TD$estimates$TD, 7)
init <- TD.weekly$R
print(paste('Original: ', length(init)))

# 2) Same estimation with the last day removed
test <- Germany.1918[1:(length(Germany.1918) - 1)]
TD <- estimate.R(test, mGT, begin=1L, end=length(test), methods="TD", nsim=100)
TD.weekly <- smooth.Rt(TD$estimates$TD, 7)
new <- TD.weekly$R
print(paste('One Less Row: ', length(new)))

# 3) Same estimation with the last week removed
test2 <- Germany.1918[1:(length(Germany.1918) - 7)]
TD <- estimate.R(test2, mGT, begin=1L, end=length(test2), methods="TD", nsim=100)
TD.weekly <- smooth.Rt(TD$estimates$TD, 7)
new2 <- TD.weekly$R
print(paste('One Less Week: ', length(new2)))

# Compare the three sets of weekly estimates side by side.
# The final week of the full-data estimate is dropped so the lengths match.
df_init <- head(init, -1)
df_new <- as.data.frame(new)
df_new2 <- as.data.frame(new2)
df_compare <- cbind(df_init, df_new, df_new2)

Comparison of outputs

Week  Full Data  Data Missing 1 Day  Data Missing 1 Week
1     1.8784     1.8784              1.8784
2     1.5810     1.5810              1.5810
3     1.3569     1.3569              1.3569
4     1.1316     1.1316              1.1316
5     0.9615     0.9615              0.9615
6     0.8119     0.8119              0.8119
7     0.8045     0.8045              0.8045
8     0.8396     0.8396              0.8396
9     0.8543     0.8543              0.8543
10    0.8258     0.8258              0.8258
11    0.8544     0.8544              0.8544
12    0.9776     0.9776              0.9776
13    0.9517     0.9517              0.9517
14    0.9273     0.9273              0.9273
15    0.9635     0.9635              0.9635
16    0.9509     0.9509              0.9481
17    0.9827     0.9843              0.4994
18    0.5844

Note: I manually added week 18 from the original estimation for comparison.
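To make the pattern in the table easier to see, the sketch below takes the last four rows as reported above and, for each column, divides the final available estimate by the one before it. The column names and the helper last_to_previous are illustrative assumptions; the numbers are copied from the table, not recomputed with R0.

```r
# Weeks 15 to 18 of the comparison table (NA where no estimate was reported).
tail_weeks <- data.frame(
  week       = 15:18,
  full       = c(0.9635, 0.9509, 0.9827, 0.5844),
  minus_day  = c(0.9635, 0.9509, 0.9843, NA),
  minus_week = c(0.9635, 0.9481, 0.4994, NA)
)

# Hypothetical helper: ratio of the last available estimate to the
# estimate for the preceding week.
last_to_previous <- function(x) {
  x <- x[!is.na(x)]
  n <- length(x)
  x[n] / x[n - 1]
}

ratios <- sapply(tail_weeks[-1], last_to_previous)
print(round(ratios, 3))
```

With these values the ratio is below 0.6 for the full data and for the data missing one week, where the final estimate falls in the week in which the input series ends, and slightly above 1 for the data missing one day, where the input extends six days past the last reported week.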
