Skip to content

This repository contains my analysis and documentation for the 2022 SPARCS (Statewide Planning and Research Cooperative System) dataset. It includes loading a portion of de-identified data, performing basic descriptive statistics and creating visualizations (healthcare trends, patient demographics, and hospital performance metrics).

Notifications You must be signed in to change notification settings

raqssoriano/sparcs_descriptive_2022

Repository files navigation

HHA507 || Module 3 || Python Assignment: Descriptive Analytics on SPARCS 2022 Data

Project Title: SPARCS Descriptive 2022


📌 This repository contains my analysis and documentation for the 2022 SPARCS (Statewide Planning and Research Cooperative System) dataset. It includes loading a portion of de-identified data, performing basic descriptive statistics and creating visualizations. This allows me to explore healthcare trends, patient demographics, and hospital performance metrics.👩🏻‍💻🏥📊

📌 Please refer to sparcsdescriptive2022.ipynb, which I created using Google Colab. It contains scripts/codes and provides easy access to data visualizations associated with the scripts and results.

📌 Almost same scripts/codes can also be found in sparcs2022.py, which I created using Python in Visual Studio Code


Steps Taken to Perform the Analysis

(1) The De-identified Data:

  1. I downloaded the dataset from health.data.ny.gov

  2. I used pandas to load a subset of data into a Python environment using my Visual Studio Code. I was able to explore data which includes Age Group, Gender, Length of Stay, Total Charges, Total Costs, and Types of Admission.

(2) Exploring Categorical Variables:

  • The distributions and trends within the following are explored:
    • Age Group
    • Gender
    • Type of Admission
    • Length of Stay
    • Total Charges
    • Total Costs
    • Type of Admission

(3) The Basic Descriptive Statistics:

  • To understand the typical patterns and variations in Length of Stay, Total Charges, and Total Costs, I calculated the following:
    • Mean
    • Median
    • Standard Deviation
    • Min/Max
    • Percentiles (25th, 50th, 75th)
    • Quartiles

(4) Data Visualization:

  • Histogram of Length of Stay: It visually represents the distribution of hospital stay durations.

  • Boxplot for Total Charges: It helps identify potential outliers in the healthcare costs.

  • Bar plot for Type of Admission: It provides insights into the relative frequencies of different reasons of admission.

(5) Handling Missing Data:

  • Check for missing data/values.
  • Remove rows with missing values.
  • Fill missing data with the mean.
  • Verify if filling the missing data is successful.

(6) Summary Report and Insights:

  • (a) Average Length of Stay: 5.72 days

  • (b) Total Cost Variation:

    • ★Total Cost Variation (Age Group)★

      • It's true that cost of care tends to increase with age, particularly between 50 and 69. This increase is largely due to the higher prevalence of chronic conditions in this age group, such as hypertension, heart disease, diabetes, pneumonia, and COPD (Chronic Obstructive Pulmonary Disease). This conditions often need more frequent medical intervention, including emergency room visits, urgent care visits, and outpatient appointments. Additionally, individuals/patients with multiple chronic illnesses may require hospitalization which may significantly contribute to higher costs of treatment. In contrast, the 0-17 age group typically experiences fewer chronic illnesses, resulting in lower overall healthcare costs.

    • ★Total Cost Variation (Type of Admission)★

      • TRAUMA admissionshighest total cost; it involves serious and life-threatening injuries, which require specialized care and may include include advanced imaging such as MRI, ultrasound, and xrays. It often involves the use of operating rooms (OR) and surgical equipment. This can result in longer stays depending on the severity of the injuries sustained by patients.

      • NEWBORN admissionslowest total cost; healthy newborns may require shorter stay in the hospital, which is likely why it gets the lowest total cost, as most can be discharged once deemed safe.

  • (c) Noticeable Trends:

    • EMERGENCY ADMISSIONS (1,394,328) - This is the highest volume of admission rates among the list within the health dataset. This could mean that acute care requires special attention, particularly in New York. It is essential to note the age group and the location or area of the contributing admission rates, as this could provide valuable insights on where the private/public health officials should allocate most of the financial support. Additionally, this data is crucial for hospitals or health systems to prepare in allocating beds, staffing, and other necessary resources to better meet the needs of the public or patients. Also, we shouldn't forget that race/ethnicity may contribute to these admission rates. Although I didn't perform a data visualization in this category, studies have shown that racial/ethnic minorities often face disparities in healthcare access and outcomes. When given the time, I will update this repository and add more insights.

    • TRAUMA ADMISSIONS (7,312) - This is the second lowest admission rate in New York within this health dataset. As a former ED nurse, I am not surprised this is one of the lowest, as not all hospitals are considered level 1 trauma centers or may not even be trauma centers. This means patients who needed specialized care due to life-threatening injuries may require transfer to trauma center hospitals.

About

This repository contains my analysis and documentation for the 2022 SPARCS (Statewide Planning and Research Cooperative System) dataset. It includes loading a portion of de-identified data, performing basic descriptive statistics and creating visualizations (healthcare trends, patient demographics, and hospital performance metrics).

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published