Two possible errors in code #5

Open
bakuninpr opened this issue Jun 3, 2018 · 9 comments

@bakuninpr

To get the number of deaths in 2016, line 45 of excess_est reads

mutate(Sep = Sep*(1/3)) %>% sum

But to get the deaths in 2016 from September 20 to December 31, 11 days of September should be considered, not 10. Thus the line should be

mutate(Sep = Sep*(11/30)) %>% sum

Also, all 102/365 should be 103/365. The general conclusion of the article will not change, but point estimates and interval bounds will be shifted down by 70.
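
The two day counts in question can be checked with base R. This is a minimal sketch, assuming the 2016 comparison window runs from September 20 through December 31; the variable names are illustrative and do not come from the repository.

```r
# Comparison window in 2016 (assumed endpoints: Sept 20 and Dec 31).
start <- as.Date("2016-09-20")
end   <- as.Date("2016-12-31")

# Subtracting dates counts the gaps between days, so the first day is left out.
days_exclusive <- as.integer(end - start)

# Enumerating the calendar days includes September 20 itself.
days_inclusive <- length(seq(start, end, by = "day"))

# Same comparison for the September portion only.
sep_end <- as.Date("2016-09-30")
sep_exclusive <- as.integer(sep_end - start)
sep_inclusive <- length(seq(start, sep_end, by = "day"))

print(c(days_exclusive, days_inclusive))  # 102 103
print(c(sep_exclusive, sep_inclusive))    # 10 11
```

Whether the window should include September 20 is the substantive question; the sketch only shows that the two conventions differ by exactly one day.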

@davidkane9

Nice catch!

@mkiang
Contributor

mkiang commented Jun 3, 2018

Hi @bakuninpr, by including Sept 20, you are essentially including 0 in your counting. We can test this using R:

library(lubridate)
ymd("2017-12-31") - ymd("2017-09-20")

The above code returns this in the console:

> library(lubridate)
> ymd("2017-12-31") - ymd("2017-09-20")
Time difference of 102 days

We are counting 24-hour intervals starting Sept 20 (after the hurricane). The same logic applies to the 11/30 you mention vs. the 1/3 we used:

> library(lubridate)
> ymd("2017-09-30") - ymd("2017-09-20")
Time difference of 10 days

Hope that clears that up.

@mkiang mkiang closed this as completed Jun 3, 2018
@bakuninpr
Author

Thank you Mathew for your response. The storm made landfall early on September 20th. The survey asks respondents whether people died before or after Hurricane Maria (supplement I, 3b). Table S6 summarizes deaths before September 20 and after September 20. Since the storm arrived on September 20, I assume (as respondents would) that 'after Hurricane Maria' meant 'during or after', that is, including September 20. So to get the difference in the number of deaths relative to the previous year, you shouldn't count 24-hour periods starting September 20, 2016. Otherwise you miss the deaths that happened on that first day and inflate the estimated HM death toll.
Here's an example where we want to count 'deaths' from the second day on (obviously it should be 16).

x <- c(4, 4, 4, 4, 4)

sum(x) * 4/5  # how it should be
[1] 16
sum(x) * 3/5  # how the NEJM code does it
[1] 12

Again, this will inflate the difference by eliminating the deaths representative of September 20, 2016.
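
The same point can be sketched with monthly data. The counts below are hypothetical placeholders, not the values used in the paper, and `baseline` is an illustrative helper rather than a function from the repository; the sketch only shows how the September fraction feeds into the 2016 baseline.

```r
# Hypothetical monthly death counts for Sept-Dec 2016 (NOT the paper's data).
monthly <- c(Sep = 3000, Oct = 3100, Nov = 3000, Dec = 3100)

# Illustrative helper: scale September by the assumed fraction, then sum.
baseline <- function(counts, sep_fraction) {
  counts[["Sep"]] <- counts[["Sep"]] * sep_fraction
  sum(counts)
}

b_10 <- baseline(monthly, 10 / 30)  # excludes September 20
b_11 <- baseline(monthly, 11 / 30)  # includes September 20

# The two baselines differ by one thirtieth of the September count.
print(b_11 - b_10)
```

A smaller baseline leaves a larger excess once it is subtracted from the post-hurricane estimate, which is the inflation described above.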

@mkiang mkiang reopened this Jun 3, 2018
@mkiang
Contributor

mkiang commented Jun 3, 2018

Hi @bakuninpr, I've asked somebody more senior to comment on this because I'm not sure I follow. I don't presume to know whether the respondents understood the question to mean "during or after", but the paper uses "after the hurricane" throughout. Perhaps another author will have more insight.

@rafalab
Collaborator

rafalab commented Jun 3, 2018

@bakuninpr,

Thanks for your comment.

According to NOAA, the hurricane hit Vieques at 5 AM, the east coast of Puerto Rico at 6:15 AM, and the northwest coast at around 2 PM. So the number of days of "exposure" is somewhere between 102 and 103. As @mkiang points out, we decided to count whole days and used 102. You make a reasonable argument for using 103. For now, we will keep the code as is so that it produces plots and tables that match those in the paper. Your comment will serve as a public record that shows how the estimate changes if one uses 103.

Thanks again for posting this.

@davidkane9

the hurricane hit Vieques at 5 AM

So, the leading winds hit hours before that. Assume that someone died on Vieques at 4:00 AM from the storm. Assume that this person's household was surveyed and that the death was reported. Then, obviously, you would count this death as being caused by Maria! You have to!

So, if storm-related deaths on Sept 20 would have been included in the 38, then you have to include Sept 20 from 2016 to calculate excess deaths.

Now, if the data broke down the deaths from September 20, 2016 by the hour, you could do an apples-to-apples comparison by excluding deaths from the very early morning hours. But you only (AFAIK) have monthly data.

@rafalab
Collaborator

rafalab commented Jun 3, 2018

As stated before, your comment will serve as a public record that shows how the estimate changes if one uses 103 instead of 102. But note that we do not annotate deaths as being "caused by Maria", nor do we tally them as being "storm-related". In the survey, we asked whether the death happened before or after the hurricane. When computing the after-the-hurricane rate, one needs to divide by the number of days that, on average, people surveyed consider to be after the hurricane. This number is between 102 and 103.
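
For a sense of scale, the sensitivity of a per-day rate to this choice can be sketched as follows. The death count is a hypothetical placeholder and the per-person scaling is omitted; the ratio of the two rates does not depend on either.

```r
# Hypothetical number of deaths reported after the hurricane (placeholder).
deaths <- 40

# Deaths per exposure day under each candidate denominator.
rate_102 <- deaths / 102
rate_103 <- deaths / 103

# The ratio is 103/102 regardless of the death count chosen above.
ratio <- rate_102 / rate_103
print(round((ratio - 1) * 100, 2))  # percent difference between the two rates
```

Under these assumptions, dividing by 102 rather than 103 raises the post-hurricane rate by just under 1 percent.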

@davidkane9

we do not annotate deaths as being "caused by Maria"

Huh? Page 8, Figure 4 B is titled "Reported Causes of Death", is subtitled "Cause of Death" and includes the category "Directly related to hurricane". Clearly, you have determined that those deaths were "caused by Maria". Or am I missing something?

@rafalab
Collaborator

rafalab commented Jun 4, 2018

This discussion relates to computing the estimates of rate and excess number. I should have clarified: When computing the rate and excess number, we do not annotate deaths as being "caused by Maria".
