This project analyzes and visualizes dengue case data in the Philippines from 2016 to 2020. By applying K-Means clustering techniques, the project aims to identify patterns and trends in dengue cases and deaths across different regions. Understanding these patterns is crucial for preventing the spread of dengue and addressing the healthcare needs of the community effectively.
- Dengue-analysis-clustering: The main Jupyter notebook containing all steps of the analysis, including data loading, preprocessing, exploratory data analysis (EDA), clustering, and evaluation.
- ph_dengue_cases2016-2020.csv: The dataset used for the analysis (include this file only if sharing the dataset is permitted).
- Data Loading and Initial Inspection:
  - Load the dataset and inspect its structure and contents.
- Data Formatting and Cleaning:
  - Check for null values and data types of each column.
  - Convert the 'Month' column to a categorical datatype and encode the 'Region' column.
- Descriptive Statistics:
  - Generate summary statistics to understand the dataset's distribution.
- Trend Visualization:
  - Visualize trends in dengue cases and deaths over the years to identify patterns and anomalies.
- Clustering Analysis:
  - Apply K-Means clustering to group regions based on dengue cases and deaths.
  - Use the Elbow Method to determine the optimal number of clusters.
- Cluster Visualization:
  - Visualize the clusters to identify regional patterns.
- Model Evaluation:
  - Evaluate the performance of the clustering using the Calinski-Harabasz Index.
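
The steps above can be sketched in a few lines of pandas and scikit-learn. This is a minimal illustration, not the notebook's actual code. Only the 'Month' and 'Region' column names come from this README; `Dengue_Cases`, `Dengue_Deaths`, the helper function names, the feature standardization, and the small synthetic `demo` table are assumptions made for the example. With the real data, `demo` would be replaced by `pd.read_csv("ph_dengue_cases2016-2020.csv")`, and the column names adjusted to match the file.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import calinski_harabasz_score
from sklearn.preprocessing import StandardScaler

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]


def prepare(df):
    """Make 'Month' an ordered categorical and add an integer code for 'Region'."""
    out = df.copy()
    out["Month"] = pd.Categorical(out["Month"], categories=MONTHS, ordered=True)
    out["Region_code"] = out["Region"].astype("category").cat.codes
    return out


def regional_totals(df):
    """Sum cases and deaths per region (column names are assumed)."""
    return df.groupby("Region")[["Dengue_Cases", "Dengue_Deaths"]].sum()


def elbow_inertias(X, k_values, seed=0):
    """Fit K-Means for each k and return the inertia used by the Elbow Method."""
    return {
        k: KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X).inertia_
        for k in k_values
    }


def cluster_regions(totals, k, seed=0):
    """Cluster regions and score the result with the Calinski-Harabasz Index."""
    # Standardizing is an assumption: it keeps case counts from dominating deaths.
    X = StandardScaler().fit_transform(totals)
    model = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
    score = calinski_harabasz_score(X, model.labels_)
    return totals.assign(cluster=model.labels_), score


# Synthetic stand-in for the CSV: three low-burden and three high-burden regions.
demo = pd.DataFrame({
    "Month": ["January", "February"] * 6,
    "Year": [2016] * 12,
    "Region": [name for name in "ABCDEF" for _ in range(2)],
    "Dengue_Cases": [100, 120, 90, 110, 105, 99,
                     1000, 1100, 950, 1050, 980, 1020],
    "Dengue_Deaths": [1, 0, 1, 1, 0, 2, 10, 12, 9, 11, 10, 13],
})

null_counts = demo.isna().sum()   # null check per column
prepared = prepare(demo)          # formatting and encoding
summary = prepared.describe()     # descriptive statistics
totals = regional_totals(prepared)

# Elbow Method: look for the k after which inertia stops dropping sharply.
scaled = StandardScaler().fit_transform(totals)
inertias = elbow_inertias(scaled, range(1, 5))

labelled, score = cluster_regions(totals, k=2)
print(labelled)
print("Calinski-Harabasz Index:", round(score, 2))
```

A higher Calinski-Harabasz Index indicates denser, better-separated clusters, so it is most useful for comparing several candidate values of k on the same data. Plotting `inertias` against k, or a scatter of `labelled` colored by `cluster`, would cover the visualization steps.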
The analysis revealed distinct groupings of regions based on dengue cases and deaths. Regions with the highest cases and deaths were identified, indicating the need for extensive public health measures in these areas. The clustering results provide valuable insights for designing targeted interventions to combat dengue.
This project demonstrates the application of data analysis and clustering techniques to understand the distribution and impact of dengue cases in the Philippines. The insights derived from the analysis can help public health officials allocate resources more effectively and design targeted interventions to reduce the incidence of dengue.
- Python 3.x
- Jupyter Notebook
- Libraries: pandas, numpy, matplotlib, seaborn, scikit-learn
- Clone this repository:
git clone https://github.com/yourusername/dengue-cases-analysis.git