
A repo used to train regression models on community health data to predict average life expectancy. 2023 County Health Rankings National Data is sourced from the University of Wisconsin Population Health Institute and can be found at: https://www.countyhealthrankings.org/explore-health-rankings/rankings-data-documentation


bonneraj/life_expectancy_regression


Life Expectancy Community Health Factor Modeling

Project Overview:

The goal of this project is to build a full modeling pipeline that predicts the life expectancy of United States counties from county health data. The project uses Pandas for data manipulation and Scikit-Learn for data pre-processing and modeling.
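As a rough illustration of what such a pipeline can look like, the sketch below chains imputation, scaling, and a linear regression with Scikit-Learn. It is not the code in this repo: the synthetic data, the coefficients used to generate it, the choice of LinearRegression, and the step names are all assumptions made for the example. Only the three feature names come from the dataset description.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: values are random, not taken from the real dataset.
rng = np.random.default_rng(0)
n = 200
features = pd.DataFrame({
    "percent_adults_smoking": rng.uniform(5, 40, n),
    "percent_physically_inactive": rng.uniform(10, 45, n),
    "median_household_income": rng.uniform(30_000, 120_000, n),
})

# Hypothetical linear relationship plus noise, only so the model has a signal to fit.
life_expectancy = (
    85
    - 0.2 * features["percent_adults_smoking"]
    - 0.1 * features["percent_physically_inactive"]
    + 0.00003 * features["median_household_income"]
    + rng.normal(0, 0.5, n)
)

# Blank out a few values to show why an imputation step is useful.
features.loc[features.index[:10], "percent_adults_smoking"] = np.nan

X_train, X_test, y_train, y_test = train_test_split(
    features, life_expectancy, test_size=0.25, random_state=0
)

# Pre-processing and the regressor live in one object, so the imputer and
# scaler are fit on the training rows only and reused at prediction time.
model = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
    ("regress", LinearRegression()),
])
model.fit(X_train, y_train)

r2 = model.score(X_test, y_test)  # coefficient of determination on held-out rows
print(f"R^2 on held-out rows: {r2:.3f}")
```

Wrapping the pre-processing steps in a Pipeline keeps statistics such as medians and means from leaking from the held-out rows into training.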

Getting Started:

  1. In a bash terminal, run 'source ./setup.sh' to install all dependencies.

Running the Pipeline, Unit Tests, and Coverage

  1. In a bash terminal, run any of the following commands to execute the pipeline, the unit tests, or the tests with coverage:

       python main.py
       python -m pytest
       python -m pytest --cov=src

Dataset Overview:

The 2023 County Health Rankings National Data used here is sourced from the University of Wisconsin Population Health Institute and can be found at: https://www.countyhealthrankings.org/explore-health-rankings/rankings-data-documentation

The original dataset from the link above was adapted outside of this repo as a starting point for this pipeline and can be found at 'input/community_health_factors.csv'. The original Excel-based document was edited to remove extraneous formatting and to combine multiple tabs of data before being saved as a CSV.

Identifying Features

FIPS identifier, state, county

Health Measures of Interest

percent_fair_or_poor_health, avg_phys_unhealthy_days, avg_mentally_unhealthy_days, percent_adults_smoking, percent_adults_w_obesity, percent_physically_inactive, percent_w_access_to_excercise, percent_excessive_drinking, percent_unemployed, income_ratio, age_adj_death_rate, child_mortality_rate, infant_mortality_rate, percent_frequent_physical_distress, percent_frequent_mental_distress, percent_adults_w_diabetes, percent_food_insecure, percent_limited_access_to_healthy_foods, percent_insufficient_sleep, spending_per_pupil, school_funding_adequacy, gender_pay_gap, median_household_income, percent_enrolled_in_free_or_reduced_lunch, homicide_rate, age_adj_suicide_rate, percent_voter_turnout, percent_homeowners, percent_households_w_severe_cost_burden, percent_under_18, percent_over_65, percent_rural

Health Outcomes of Interest

life expectancy, age-adjusted death rate, suicide rate (age-adjusted), frequent physical distress, frequent mental distress
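The column groups above suggest a simple way to split the CSV into identifiers, predictors, and a target with Pandas. The sketch below is an assumption-laden illustration, not the repo's loader. The identifier column names ('fips', 'state', 'county'), the target name 'life_expectancy', and the helper split_columns are hypothetical, and the inline rows are made-up placeholders standing in for 'input/community_health_factors.csv'. Whether other outcome columns should be excluded from the predictors is a modeling decision this README does not specify, so it is shown as an optional argument.

```python
import io

import pandas as pd

# Made-up placeholder rows standing in for 'input/community_health_factors.csv'.
SAMPLE_CSV = """fips,state,county,percent_adults_smoking,percent_food_insecure,age_adj_death_rate,life_expectancy
00001,State A,County X,18.0,13.0,420.0,76.0
00002,State A,County Y,16.0,11.0,360.0,78.0
00003,State B,County Z,25.0,17.0,530.0,73.0
"""

ID_COLS = ["fips", "state", "county"]    # assumed identifier column names
TARGET = "life_expectancy"               # assumed target column name
OTHER_OUTCOMES = ["age_adj_death_rate"]  # outcome also listed among the measures


def split_columns(df, id_cols, target, exclude=()):
    """Return (identifiers, predictors, target) from one wide table."""
    ids = df[id_cols]
    y = df[target]
    X = df.drop(columns=[*id_cols, target, *exclude])
    return ids, X, y


# Reading FIPS codes as strings preserves their leading zeros.
df = pd.read_csv(io.StringIO(SAMPLE_CSV), dtype={"fips": str})
ids, X, y = split_columns(df, ID_COLS, TARGET, exclude=OTHER_OUTCOMES)

print(list(X.columns))
```

To run this against the real file, replace io.StringIO(SAMPLE_CSV) with the CSV path and adjust the assumed column names to match the file's header.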
