pmb-7684/New-York-City-Leading-Causes-of-Death

NYC Leading Causes of Death

Paula Bailey 2022-10-14

Background

Looking at my profile, one may believe two things about me: 1. I love large cities like Chicago and New York. 2. I’m only interested in health data sets. I haven’t been to either city since I was a child; however, both cities have a wealth of open data. The city of Chicago has its own portal, and I just discovered New York City’s NYC OpenData. This site also has an API, so I plan to practice accessing APIs in the near future. As for the second point, it’s more likely because I can find more data on the subjects of health and crime.

This data comes from the NYC OpenData website. It covers the leading causes of death in New York City by sex and ethnicity since 2007. The cause of death is derived from the NYC death certificate, which is issued for every death that occurs in the city.

After I complete this project, I will be completing it again in Python. Currently, I am in school and R is the language of choice. But I realize it is important to know Python… practice… practice…

Ask

This is not a formal project, so there is not a detailed business task (or statement). I will use the data set to explore, complete some summary statistics and visualizations.

I have the following questions. What was the leading cause of death for women in NYC? What about the top 10 causes? Does race change the top cause for women? Basically, is it different for Black, White, Hispanic, and Asian women?

Prepare

This csv file is named New_York_City_leading_Causes_of_Death. It contains 1,272 rows and 7 features, where each row is a cause of death. It is hosted on the NYC OpenData platform, where it is updated annually by the Department of Health and Mental Hygiene (DOHMH). It was last updated on February 8, 2022. The data is from 2007 to 2019.

The csv has the following features:

column name              | description                                  | definitions
ï..Year                  | Year of Death                                | year
leading_cause            | the unique cause of death                    | Leading Cause
Sex                      | Sex of decedent                              |
race/ethnicity           | Race of decedent                             |
Deaths                   | Number of People who died due to this cause  |
death_rate               | Death rate within sex and race category      |
age_adjusted_death_rate  | Age adjusted death rate within sex and race  | Age Adjusted Death Rate

Based on the information from the website all of the variables are in plain text. So, we will need to verify the types of each column after we import the data.

The data and metadata can be found here.

After downloading the file, I renamed it to preserve the original file. During the data cleaning (process), any changes to the data will be saved under a file named ChangeLog.doc

Process

library(tidyverse)

First, we will read in the csv file.

nycCauses <- read.csv("New_York_City_Leading_Causes_of_Death_pmb.csv", na.strings=c("",".","NA"))
head(nycCauses)
##   ï..Year
## 1    2019
## 2    2019
## 3    2019
## 4    2019
## 5    2019
## 6    2019
##                                                                                                                       Leading.Cause
## 1                                                                                    Diseases of Heart (I00-I09, I11, I13, I20-I51)
## 2                                                                                             Malignant Neoplasms (Cancer: C00-C97)
## 3 Mental and Behavioral Disorders due to Accidental Poisoning and Other Psychoactive Substance Use (F11-F16, F18-F19, X40-X42, X44)
## 4                                                                                                       Diabetes Mellitus (E10-E14)
## 5                                                                                           Influenza (Flu) and Pneumonia (J09-J18)
## 6                                                                                         Cerebrovascular Disease (Stroke: I60-I69)
##    Sex Race.Ethnicity Deaths Death.Rate Age.Adjusted.Death.Rate
## 1 Male       Hispanic   1603  136.80299               176.78329
## 2 Male       Hispanic   1164   99.33792               121.58177
## 3 Male       Hispanic    423   36.09960                35.70790
## 4 Male       Hispanic    245   20.90875                25.40934
## 5 Male       Hispanic    182   15.53222                19.85821
## 6 Male       Hispanic    182   15.53222                19.55838

Next, the structure of the file.

str(nycCauses)
## 'data.frame':    1272 obs. of  7 variables:
##  $ ï..Year                : int  2019 2019 2019 2019 2019 2019 2019 2019 2019 2019 ...
##  $ Leading.Cause          : chr  "Diseases of Heart (I00-I09, I11, I13, I20-I51)" "Malignant Neoplasms (Cancer: C00-C97)" "Mental and Behavioral Disorders due to Accidental Poisoning and Other Psychoactive Substance Use (F11-F16, F18-"| __truncated__ "Diabetes Mellitus (E10-E14)" ...
##  $ Sex                    : chr  "Male" "Male" "Male" "Male" ...
##  $ Race.Ethnicity         : chr  "Hispanic" "Hispanic" "Hispanic" "Hispanic" ...
##  $ Deaths                 : int  1603 1164 423 245 182 182 174 171 170 135 ...
##  $ Death.Rate             : num  136.8 99.3 36.1 20.9 15.5 ...
##  $ Age.Adjusted.Death.Rate: num  176.8 121.6 35.7 25.4 19.9 ...

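As mentioned above, the website lists every variable as plain text. On my first import, everything except year came in as character, including Deaths, Death.Rate, and Age.Adjusted.Death.Rate. With na.strings set in the read.csv() call above, str() already shows Deaths as integer and the two rates as numeric.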

I think we should make the following data changes:

  • Leading Cause, Sex, and Race.Ethnicity from CHR to FACTOR
  • Deaths from CHR to INTEGER
  • death rates and age adjusted death rates from CHR to DOUBLE
  • Change ï..Year to year (column name)

After our data type changes, our data sets will contain 3 continuous features, 1 discrete feature and 3 categorical features.

Rename column ï..Year

nycCauses <- nycCauses %>% rename(year=ï..Year)
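A first column name with a stray prefix like this is what a UTF-8 byte order mark (BOM) at the start of a CSV usually turns into when read.csv() does not expect it. Assuming that is the cause here (the raw file is not inspected in this sketch), the following shows one way to read such a file so the first column arrives as Year without a rename. The temporary file and its contents are made up for illustration.

```r
# Build a tiny CSV that starts with a UTF-8 BOM (file contents are made up)
tmp <- tempfile(fileext = ".csv")
con <- file(tmp, open = "wb")
writeBin(as.raw(c(0xEF, 0xBB, 0xBF)), con)             # the three BOM bytes
writeBin(charToRaw("Year,Deaths\n2019,1603\n"), con)   # header and one row
close(con)

# fileEncoding = "UTF-8-BOM" asks R to drop the BOM while reading
bom_safe <- read.csv(tmp, fileEncoding = "UTF-8-BOM")
names(bom_safe)

unlink(tmp)
```

If the real file does carry a BOM, passing the same fileEncoding argument to the original read.csv() call would make the rename step unnecessary; if it does not, the rename remains the simplest fix.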

Before changing data types, let’s look for any NAs.

During this process, I ran into the warning “NAs introduced by coercion”, which indicates that some values could not be converted to numbers (the data set has missing information stored as empty or “.” entries, not NA). Hmm, that is interesting. I was not expecting to see NAs. I resolved this issue by adding na.strings=c("",".","NA") during import.

colSums(is.na(nycCauses))
##                    year           Leading.Cause                     Sex          Race.Ethnicity 
##                       0                       0                       0                       0 
##                  Deaths              Death.Rate Age.Adjusted.Death.Rate 
##                     138                     453                     453
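To see why na.strings matters at import time, here is a small sketch on made-up CSV text (not the NYC file). It assumes, as the na.strings fix suggests, that missing values appear as "." or as empty fields.

```r
# Made-up CSV text: "." and empty fields stand in for missing values
csv_text <- "Year,Deaths,Death.Rate
2019,1603,136.8
2019,.,.
2019,245,"

# Default import: the "." entries keep the columns from being read as numbers
raw_import <- read.csv(text = csv_text)

# Import with na.strings: "." and "" become NA, so the columns stay numeric
clean_import <- read.csv(text = csv_text, na.strings = c("", ".", "NA"))

class(raw_import$Deaths)
class(clean_import$Deaths)
colSums(is.na(clean_import))
```

Declaring the missing-value markers up front means no later as.integer() or as.double() call has to coerce text, which is where the coercion warning comes from.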

After reviewing the data set, we can see which decedents have NAs.

  • Decedents with “Other” as race do not have a death rate or an age adjusted death rate.
  • The 138 rows with missing deaths are a combination of “Other Race” and “Not Stated” as race.

Based on the above, I will create a data set with the rows that are missing deaths removed. We will name that data frame nycCauses_Deaths. I will try to complete most of the analysis with that data set. Recall the original data set has 1,272 observations and now we have 1,134. The difference is 138.

nycCauses_Deaths <- nycCauses %>% drop_na(Deaths)

Change the variables Leading.Cause, Sex, and Race.Ethnicity to factors. Change Deaths to integer, and Death.Rate and Age.Adjusted.Death.Rate to double.

# update to factors
nycCauses_Deaths$Leading.Cause = factor(nycCauses_Deaths$Leading.Cause)
nycCauses_Deaths$Sex = factor(nycCauses_Deaths$Sex)
nycCauses_Deaths$Race.Ethnicity = factor(nycCauses_Deaths$Race.Ethnicity)

# update to integer and double
nycCauses_Deaths$Deaths = as.integer(nycCauses_Deaths$Deaths)
nycCauses_Deaths$Death.Rate = as.double(nycCauses_Deaths$Death.Rate)
nycCauses_Deaths$Age.Adjusted.Death.Rate = as.double(nycCauses_Deaths$Age.Adjusted.Death.Rate)

We will also create a second data set named nycCauses_noNA, where we remove all NAs, which will have 819 observations.

nycCauses_noNA   <- nycCauses_Deaths %>% drop_na(Death.Rate, Age.Adjusted.Death.Rate)
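The two filtering steps can be sanity-checked by comparing row counts with NA counts. The sketch below does this in base R on a made-up data frame (the values are not from the NYC file), then repeats the arithmetic with the counts reported in this write-up.

```r
# Made-up rows: NA marks a missing value
toy <- data.frame(
  Deaths     = c(10, NA, 5, 8),
  Death.Rate = c(1.2, NA, NA, 0.9)
)

# Step 1: keep rows that have a death count
toy_deaths <- toy[!is.na(toy$Deaths), ]

# Step 2: from those, keep rows that also have a death rate
toy_noNA <- toy_deaths[!is.na(toy_deaths$Death.Rate), ]

# Rows dropped at each step should equal the NA count seen at that step
nrow(toy) - sum(is.na(toy$Deaths)) == nrow(toy_deaths)
nrow(toy_deaths) - sum(is.na(toy_deaths$Death.Rate)) == nrow(toy_noNA)

# The same arithmetic with the counts reported in this write-up
c(after_deaths = 1272 - 138, after_rates = 1134 - 315)
```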

Let’s confirm our changes

str(nycCauses_Deaths)
## 'data.frame':    1134 obs. of  7 variables:
##  $ year                   : int  2019 2019 2019 2019 2019 2019 2019 2019 2019 2019 ...
##  $ Leading.Cause          : Factor w/ 32 levels "Accidents Except Drug Poisoning (V01-X39, X43, X45-X59, Y85-Y86)",..: 18 25 26 17 21 9 1 12 14 19 ...
##  $ Sex                    : Factor w/ 4 levels "F","Female","M",..: 4 4 4 4 4 4 4 4 4 4 ...
##  $ Race.Ethnicity         : Factor w/ 8 levels "Asian and Pacific Islander",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ Deaths                 : int  1603 1164 423 245 182 182 174 171 170 135 ...
##  $ Death.Rate             : num  136.8 99.3 36.1 20.9 15.5 ...
##  $ Age.Adjusted.Death.Rate: num  176.8 121.6 35.7 25.4 19.9 ...
nycCauses_Deaths %>% select(Deaths, Death.Rate, Age.Adjusted.Death.Rate) %>% summary()
##      Deaths         Death.Rate     Age.Adjusted.Death.Rate
##  Min.   :   1.0   Min.   :  2.40   Min.   :  2.50         
##  1st Qu.:  29.0   1st Qu.: 11.95   1st Qu.: 12.00         
##  Median : 136.5   Median : 18.50   Median : 20.00         
##  Mean   : 422.9   Mean   : 53.52   Mean   : 53.21         
##  3rd Qu.: 291.5   3rd Qu.: 66.07   3rd Qu.: 77.90         
##  Max.   :7050.0   Max.   :491.40   Max.   :414.59         
##                   NA's   :315      NA's   :315
nycCauses_noNA %>% select(Deaths, Death.Rate, Age.Adjusted.Death.Rate) %>% summary()
##      Deaths         Death.Rate     Age.Adjusted.Death.Rate
##  Min.   :   5.0   Min.   :  2.40   Min.   :  2.50         
##  1st Qu.: 102.0   1st Qu.: 11.95   1st Qu.: 12.00         
##  Median : 207.0   Median : 18.50   Median : 20.00         
##  Mean   : 577.3   Mean   : 53.52   Mean   : 53.21         
##  3rd Qu.: 472.5   3rd Qu.: 66.07   3rd Qu.: 77.90         
##  Max.   :7050.0   Max.   :491.40   Max.   :414.59

Let’s look at the distinct values for year and for our character variables: Leading.Cause, Sex and Race.Ethnicity. I noticed in the str() above that Sex has four levels listed. I want to make sure those levels are unique.

n_distinct(nycCauses_Deaths$year)
## [1] 9
n_distinct(nycCauses_Deaths$Sex)
## [1] 4
n_distinct(nycCauses_Deaths$Leading.Cause)
## [1] 32
n_distinct(nycCauses_Deaths$Race.Ethnicity)
## [1] 8
unique(nycCauses_Deaths[["Sex"]])
## [1] Male   Female M      F     
## Levels: F Female M Male

Wow… As we can see, it’s a good thing that we looked. We have Male, M, Female and F. So, we will change the levels F and M to Female and Male. Since we already created nycCauses_noNA, we will update that data frame as well.

levels(nycCauses_Deaths$Sex)[levels(nycCauses_Deaths$Sex)=='M'] <- 'Male'
levels(nycCauses_Deaths$Sex)[levels(nycCauses_Deaths$Sex)=='F'] <- 'Female'

levels(nycCauses_noNA$Sex)[levels(nycCauses_noNA$Sex)=='M'] <- 'Male'
levels(nycCauses_noNA$Sex)[levels(nycCauses_noNA$Sex)=='F'] <- 'Female'
unique(nycCauses_Deaths[["year"]])
## [1] 2019 2014 2013 2007 2009 2008 2011 2012 2010

We notice that we are missing data from 2015 - 2018. I’m not sure why we are missing data. The data dictionary in the Cause_of_Death_121412.csv file states years from 2007 - 2016, and the title gives the impression we will have data from 2007 onward. We have a discrepancy, but we will carry on.
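The gap can also be found programmatically instead of by eye. A minimal sketch, using the years printed by unique() above and assuming the expected span is 2007 to 2019:

```r
# Years returned by unique() above
years_present <- c(2019, 2014, 2013, 2007, 2009, 2008, 2011, 2012, 2010)

# Years in the assumed 2007-2019 span that never appear in the data
missing_years <- setdiff(2007:2019, years_present)
missing_years
```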

Look at factors for ethnicity.

unique(nycCauses_Deaths[["Race.Ethnicity"]])
## [1] Hispanic                   Asian and Pacific Islander Non-Hispanic White        
## [4] Non-Hispanic Black         Other Race/ Ethnicity      Not Stated/Unknown        
## [7] White Non-Hispanic         Black Non-Hispanic        
## 8 Levels: Asian and Pacific Islander Black Non-Hispanic Hispanic Non-Hispanic Black ... White Non-Hispanic

As a best practice, it is good to check all levels of factor features. Above we have both Black Non-Hispanic and Non-Hispanic Black, as well as White Non-Hispanic and Non-Hispanic White. We need to address this so it will not affect our work.

levels(nycCauses_Deaths$Race.Ethnicity)[levels(nycCauses_Deaths$Race.Ethnicity)=='Non-Hispanic Black'] <- 'Black Non-Hispanic'
levels(nycCauses_Deaths$Race.Ethnicity)[levels(nycCauses_Deaths$Race.Ethnicity)=='Non-Hispanic White'] <- 'White Non-Hispanic'


levels(nycCauses_noNA$Race.Ethnicity)[levels(nycCauses_noNA$Race.Ethnicity)=='Non-Hispanic Black'] <- 'Black Non-Hispanic'
levels(nycCauses_noNA$Race.Ethnicity)[levels(nycCauses_noNA$Race.Ethnicity)=='Non-Hispanic White'] <- 'White Non-Hispanic'
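Each level fix above is written out once per data frame. As a sketch of how the repetition could be reduced, the helper below merges duplicate spellings in any factor. merge_levels() is a hypothetical name introduced here, not part of the original analysis, and the example vector is made up.

```r
# Hypothetical helper: rename factor levels according to a lookup,
# merging any levels that end up with the same name
merge_levels <- function(x, mapping) {
  # mapping: named character vector, names = old level, values = new level
  lv  <- levels(x)
  hit <- lv %in% names(mapping)
  lv[hit] <- unname(mapping[lv[hit]])
  levels(x) <- lv   # assigning duplicated names merges those levels
  x
}

# Made-up example with the four spellings seen in the Sex column
sex <- factor(c("Male", "Female", "M", "F"))
sex_clean <- merge_levels(sex, c(M = "Male", F = "Female"))
levels(sex_clean)
```

The same call with a race/ethnicity lookup, applied to each data frame in turn, would cover the four assignments above. Cleaning the levels before creating nycCauses_noNA would also avoid having to repeat the fix on the second data frame.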

Check to confirm our changes

unique(nycCauses_Deaths[["Sex"]])
## [1] Male   Female
## Levels: Female Male
unique(nycCauses_Deaths[["Race.Ethnicity"]])
## [1] Hispanic                   Asian and Pacific Islander White Non-Hispanic        
## [4] Black Non-Hispanic         Other Race/ Ethnicity      Not Stated/Unknown        
## 6 Levels: Asian and Pacific Islander Black Non-Hispanic Hispanic White Non-Hispanic ... Other Race/ Ethnicity

Now our data sets seem to be clean. As mentioned above we will use nycCauses_Deaths and nycCauses_noNA. We will mainly use nycCauses_Deaths, because we were able to maintain the most data.

Analyze

Question 1 - For black women who died in 2019, how many women died from heart disease or diabetes?

nycCauses_Deaths %>% 
  group_by(year, Leading.Cause, Sex, Race.Ethnicity) %>% 
  summarise(mean = mean(Deaths)) %>%
  filter(year == 2019 & Sex == "Female" & grepl('Heart|Diabetes', Leading.Cause) 
         & Race.Ethnicity =='Black Non-Hispanic')
## # A tibble: 2 x 5
## # Groups:   year, Leading.Cause, Sex [2]
##    year Leading.Cause                                  Sex    Race.Ethnicity      mean
##   <int> <fct>                                          <fct>  <fct>              <dbl>
## 1  2019 Diabetes Mellitus (E10-E14)                    Female Black Non-Hispanic   383
## 2  2019 Diseases of Heart (I00-I09, I11, I13, I20-I51) Female Black Non-Hispanic  2483

In 2019, on average 383 black women died from diabetes and 2,483 died from heart disease.

Question 2 - In 2019, what are the top causes of death for women?

  nycCauses_Deaths %>% 
        group_by(year, Leading.Cause, Sex) %>% 
        summarise(average = mean(Deaths)) %>%
        filter(year == 2019 & Sex == "Female") %>%
        arrange(desc(average)) %>%
        select(Leading.Cause, average) %>%
        head(10)
## # A tibble: 10 x 3
## # Groups:   year, Leading.Cause [10]
##     year Leading.Cause                                                                                average
##    <int> <fct>                                                                                          <dbl>
##  1  2019 Diseases of Heart (I00-I09, I11, I13, I20-I51)                                                1266. 
##  2  2019 Malignant Neoplasms (Cancer: C00-C97)                                                          898. 
##  3  2019 All Other Causes                                                                               804. 
##  4  2019 Chronic Lower Respiratory Diseases (J40-J47)                                                   159  
##  5  2019 Cerebrovascular Disease (Stroke: I60-I69)                                                      155. 
##  6  2019 Alzheimer's Disease (G30)                                                                      133. 
##  7  2019 Diabetes Mellitus (E10-E14)                                                                    129. 
##  8  2019 Influenza (Flu) and Pneumonia (J09-J18)                                                        110. 
##  9  2019 Essential Hypertension and Renal Diseases (I10, I12)                                           107. 
## 10  2019 Mental and Behavioral Disorders due to Accidental Poisoning and Other Psychoactive Substanc~    59.7

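The top cause of death for women is again heart disease, with an average of about 1,266 deaths. Note that this average is taken across the race/ethnicity rows for 2019, so it is not the total number of women who died.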
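Because the grouping here drops Race.Ethnicity, mean(Deaths) averages over the race/ethnicity rows within each cause, while sum(Deaths) would give the total. The sketch below shows the difference in base R on made-up counts (not the NYC figures).

```r
# Made-up counts: two causes, three race/ethnicity rows each
toy <- data.frame(
  Leading.Cause  = c("Heart", "Heart", "Heart", "Cancer", "Cancer", "Cancer"),
  Race.Ethnicity = c("A", "B", "C", "A", "B", "C"),
  Deaths         = c(100, 40, 10, 60, 30, 30)
)

# Average per race/ethnicity row, which is what mean(Deaths) reports
avg_by_cause <- tapply(toy$Deaths, toy$Leading.Cause, mean)

# Total deaths per cause, which is what sum(Deaths) would report
total_by_cause <- tapply(toy$Deaths, toy$Leading.Cause, sum)

avg_by_cause
total_by_cause
```

The ranking of causes can differ between the two only when causes have different numbers of rows, but the magnitudes always mean different things, so it helps to say which one is being reported.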

Question 3 - Question two leads us to ask: is heart disease the leading cause of death for every race/ethnicity in NYC in 2019? We will look at Black, White, Hispanic, Asian and Pacific Islander, and Other Race women.

nycCauses_Deaths %>% 
      group_by(year, Race.Ethnicity,Leading.Cause) %>% 
      filter(year == 2019 & Sex == "Female" & Race.Ethnicity == 'Black Non-Hispanic') %>%
      summarise(average = mean(Deaths)) %>%
      arrange(desc(average)) %>%
      select(Race.Ethnicity, Leading.Cause, average) 
## `summarise()` has grouped output by 'year', 'Race.Ethnicity'. You can override using the `.groups` argument.
## Adding missing grouping variables: `year`

## # A tibble: 11 x 4
## # Groups:   year, Race.Ethnicity [1]
##     year Race.Ethnicity     Leading.Cause                                                             average
##    <int> <fct>              <fct>                                                                       <dbl>
##  1  2019 Black Non-Hispanic Diseases of Heart (I00-I09, I11, I13, I20-I51)                               2483
##  2  2019 Black Non-Hispanic Malignant Neoplasms (Cancer: C00-C97)                                        1753
##  3  2019 Black Non-Hispanic All Other Causes                                                             1578
##  4  2019 Black Non-Hispanic Diabetes Mellitus (E10-E14)                                                   383
##  5  2019 Black Non-Hispanic Cerebrovascular Disease (Stroke: I60-I69)                                     316
##  6  2019 Black Non-Hispanic Essential Hypertension and Renal Diseases (I10, I12)                          290
##  7  2019 Black Non-Hispanic Chronic Lower Respiratory Diseases (J40-J47)                                  242
##  8  2019 Black Non-Hispanic Influenza (Flu) and Pneumonia (J09-J18)                                       227
##  9  2019 Black Non-Hispanic Alzheimer's Disease (G30)                                                     171
## 10  2019 Black Non-Hispanic Mental and Behavioral Disorders due to Accidental Poisoning and Other Ps~     120
## 11  2019 Black Non-Hispanic Nephritis, Nephrotic Syndrome and Nephrisis (N00-N07, N17-N19, N25-N27)        96

The top cause of death for Black females in 2019 was Diseases of Heart (I00-I09, I11, I13, I20-I51) with an average of 2,483 women dying. The second highest is Malignant Neoplasms (Cancer: C00-C97) with 1,753 cases.

nycCauses_Deaths %>% 
      group_by(year, Race.Ethnicity,Leading.Cause) %>% 
      filter(year == 2019 & Sex == "Female" & Race.Ethnicity == 'White Non-Hispanic') %>%
      summarise(average = mean(Deaths)) %>%
      arrange(desc(average)) %>%
      select(Race.Ethnicity, Leading.Cause, average) 
## `summarise()` has grouped output by 'year', 'Race.Ethnicity'. You can override using the `.groups` argument.
## Adding missing grouping variables: `year`

## # A tibble: 11 x 4
## # Groups:   year, Race.Ethnicity [1]
##     year Race.Ethnicity     Leading.Cause                                                             average
##    <int> <fct>              <fct>                                                                       <dbl>
##  1  2019 White Non-Hispanic Diseases of Heart (I00-I09, I11, I13, I20-I51)                               4001
##  2  2019 White Non-Hispanic Malignant Neoplasms (Cancer: C00-C97)                                        2655
##  3  2019 White Non-Hispanic All Other Causes                                                             2264
##  4  2019 White Non-Hispanic Chronic Lower Respiratory Diseases (J40-J47)                                  480
##  5  2019 White Non-Hispanic Cerebrovascular Disease (Stroke: I60-I69)                                     390
##  6  2019 White Non-Hispanic Alzheimer's Disease (G30)                                                     341
##  7  2019 White Non-Hispanic Influenza (Flu) and Pneumonia (J09-J18)                                       308
##  8  2019 White Non-Hispanic Essential Hypertension and Renal Diseases (I10, I12)                          228
##  9  2019 White Non-Hispanic Diabetes Mellitus (E10-E14)                                                   197
## 10  2019 White Non-Hispanic Accidents Except Drug Poisoning (V01-X39, X43, X45-X59, Y85-Y86)              167
## 11  2019 White Non-Hispanic Mental and Behavioral Disorders due to Accidental Poisoning and Other Ps~     125

For Non-Hispanic White women, the number one cause is Diseases of Heart with an average of 4,001 deaths. Among named causes, this is followed by Cancer and Chronic Lower Respiratory Diseases.

nycCauses_Deaths %>% 
      group_by(year, Race.Ethnicity,Leading.Cause) %>% 
      filter(year == 2019 & Sex == "Female" & Race.Ethnicity == 'Hispanic') %>%
      summarise(average = mean(Deaths)) %>%
      arrange(desc(average)) %>%
      select(Race.Ethnicity, Leading.Cause, average) 
## `summarise()` has grouped output by 'year', 'Race.Ethnicity'. You can override using the `.groups` argument.
## Adding missing grouping variables: `year`

## # A tibble: 11 x 4
## # Groups:   year, Race.Ethnicity [1]
##     year Race.Ethnicity Leading.Cause                                                                 average
##    <int> <fct>          <fct>                                                                           <dbl>
##  1  2019 Hispanic       Diseases of Heart (I00-I09, I11, I13, I20-I51)                                   1574
##  2  2019 Hispanic       All Other Causes                                                                 1279
##  3  2019 Hispanic       Malignant Neoplasms (Cancer: C00-C97)                                            1182
##  4  2019 Hispanic       Cerebrovascular Disease (Stroke: I60-I69)                                         249
##  5  2019 Hispanic       Alzheimer's Disease (G30)                                                         231
##  6  2019 Hispanic       Diabetes Mellitus (E10-E14)                                                       214
##  7  2019 Hispanic       Chronic Lower Respiratory Diseases (J40-J47)                                      178
##  8  2019 Hispanic       Influenza (Flu) and Pneumonia (J09-J18)                                           156
##  9  2019 Hispanic       Essential Hypertension and Renal Diseases (I10, I12)                              151
## 10  2019 Hispanic       Mental and Behavioral Disorders due to Accidental Poisoning and Other Psycho~     104
## 11  2019 Hispanic       Accidents Except Drug Poisoning (V01-X39, X43, X45-X59, Y85-Y86)                   98

For Hispanic women, the number one cause of death is Diseases of Heart with an average of 1,574 deaths. Among named causes, this is followed by Cancer and Cerebrovascular Disease.

nycCauses_Deaths %>% 
      group_by(year, Race.Ethnicity,Leading.Cause) %>% 
      filter(year == 2019 & Sex == "Female" & Race.Ethnicity == 'Asian and Pacific Islander') %>%
      summarise(average = mean(Deaths)) %>%
      arrange(desc(average)) %>%
      select(Race.Ethnicity, Leading.Cause, average) 
## `summarise()` has grouped output by 'year', 'Race.Ethnicity'. You can override using the `.groups` argument.
## Adding missing grouping variables: `year`

## # A tibble: 11 x 4
## # Groups:   year, Race.Ethnicity [1]
##     year Race.Ethnicity             Leading.Cause                                                    average
##    <int> <fct>                      <fct>                                                              <dbl>
##  1  2019 Asian and Pacific Islander Diseases of Heart (I00-I09, I11, I13, I20-I51)                       623
##  2  2019 Asian and Pacific Islander Malignant Neoplasms (Cancer: C00-C97)                                582
##  3  2019 Asian and Pacific Islander All Other Causes                                                     401
##  4  2019 Asian and Pacific Islander Cerebrovascular Disease (Stroke: I60-I69)                            110
##  5  2019 Asian and Pacific Islander Diabetes Mellitus (E10-E14)                                           78
##  6  2019 Asian and Pacific Islander Influenza (Flu) and Pneumonia (J09-J18)                               67
##  7  2019 Asian and Pacific Islander Essential Hypertension and Renal Diseases (I10, I12)                  66
##  8  2019 Asian and Pacific Islander Alzheimer's Disease (G30)                                             50
##  9  2019 Asian and Pacific Islander Accidents Except Drug Poisoning (V01-X39, X43, X45-X59, Y85-Y86)      40
## 10  2019 Asian and Pacific Islander Chronic Lower Respiratory Diseases (J40-J47)                          40
## 11  2019 Asian and Pacific Islander Intentional Self-Harm (Suicide: U03, X60-X84, Y87.0)                  40

For Asian and Pacific Islander women, the number one cause of death is Diseases of Heart with an average of 623 deaths. Among named causes, this is followed by Cancer and Cerebrovascular Disease.

nycCauses_Deaths %>% 
      group_by(year, Race.Ethnicity,Leading.Cause) %>% 
      filter(year == 2019 & Sex == "Female" & Race.Ethnicity == 'Other Race/ Ethnicity') %>%
      summarise(average = mean(Deaths)) %>%
      arrange(desc(average)) %>%
      select(Race.Ethnicity, Leading.Cause, average) 
## `summarise()` has grouped output by 'year', 'Race.Ethnicity'. You can override using the `.groups` argument.
## Adding missing grouping variables: `year`

## # A tibble: 23 x 4
## # Groups:   year, Race.Ethnicity [1]
##     year Race.Ethnicity        Leading.Cause                                                    average
##    <int> <fct>                 <fct>                                                              <dbl>
##  1  2019 Other Race/ Ethnicity Diseases of Heart (I00-I09, I11, I13, I20-I51)                      34.5
##  2  2019 Other Race/ Ethnicity Malignant Neoplasms (Cancer: C00-C97)                               26  
##  3  2019 Other Race/ Ethnicity All Other Causes                                                    22.5
##  4  2019 Other Race/ Ethnicity Diabetes Mellitus (E10-E14)                                          7.5
##  5  2019 Other Race/ Ethnicity Cerebrovascular Disease (Stroke: I60-I69)                            5.5
##  6  2019 Other Race/ Ethnicity Essential Hypertension and Renal Diseases (I10, I12)                 3.5
##  7  2019 Other Race/ Ethnicity Influenza (Flu) and Pneumonia (J09-J18)                              3.5
##  8  2019 Other Race/ Ethnicity Alzheimer's Disease (G30)                                            3  
##  9  2019 Other Race/ Ethnicity Assault (Homicide: U01-U02, Y87.1, X85-Y09)                          3  
## 10  2019 Other Race/ Ethnicity Accidents Except Drug Poisoning (V01-X39, X43, X45-X59, Y85-Y86)     2  
## # ... with 13 more rows

For women who identified as other race, Diseases of Heart is also the top cause of death, with an average of 34.5 deaths. Among named causes, this is followed by Cancer and Diabetes Mellitus.
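The five blocks above differ only in the Race.Ethnicity value passed to filter(). As a sketch of how they could be folded into one function, top_causes() below is a hypothetical helper (not part of the original analysis) shown on made-up data. It filters first and then groups only by Leading.Cause, which is a slightly different order from the code above but returns the same kind of ranking.

```r
library(dplyr)

# Made-up data for illustration only
toy <- tibble(
  year           = 2019L,
  Sex            = "Female",
  Race.Ethnicity = c("A", "A", "B", "B"),
  Leading.Cause  = c("Heart", "Cancer", "Heart", "Cancer"),
  Deaths         = c(10L, 7L, 3L, 5L)
)

# Hypothetical helper: rank causes for one year, sex and race/ethnicity
top_causes <- function(data, race, sex = "Female", yr = 2019) {
  data %>%
    filter(year == yr, Sex == sex, Race.Ethnicity == race) %>%
    group_by(Leading.Cause) %>%
    summarise(average = mean(Deaths), .groups = "drop") %>%
    arrange(desc(average))
}

top_b <- top_causes(toy, "B")
top_b
```

With the real data, a call such as top_causes(nycCauses_Deaths, "Hispanic") would stand in for one of the blocks above. Setting .groups = "drop" also avoids the grouping messages printed with each result.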

Share

This plot shows that heart disease has been trending down since 2007 as the cause of death for women. However, it still seems to be the number one cause of death for women.

nycCauses_Deaths %>% 
        filter(Sex == "Female" & Leading.Cause == 'Diseases of Heart (I00-I09, I11, I13, I20-I51)') %>%
  
       ggplot( aes(x=year, y = Deaths)) + 
        theme(axis.text.x=element_text(angle=50, hjust=1)) +
        geom_bar(stat = "identity")
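With stat = "identity", geom_bar() stacks every row that shares a year, so each bar's height is the sum of Deaths across the race/ethnicity rows for that year. To check the trend numerically, the same totals can be computed directly. A minimal base R sketch on made-up numbers (not the NYC figures):

```r
# Made-up rows: two race/ethnicity rows per year
toy <- data.frame(
  year   = c(2007, 2007, 2008, 2008),
  Deaths = c(50, 30, 45, 25)
)

# One total per year, matching the height of each stacked bar
yearly <- aggregate(Deaths ~ year, data = toy, FUN = sum)
yearly
```

Printing a table like this next to the plot makes it easy to state the size of the decline rather than reading it off the bars.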

I have added images of two additional plots that are contained in a file named Tableau_images.doc. Those plots show heart disease has been declining since 2007, but continues to be the number one cause of death for men and women.

Conclusion

As mentioned, I plan to begin the project again using Python; however, this time I will look at the causes of death for men. I will also be using the data to complete plots (charts) in Tableau. Again, another way to practice.

In the meantime, we can see that heart disease has been the leading cause of death for women in the NYC area. This holds regardless of their ethnicity.
