yourarnav/CS699-SW-Lab

Contributors 🧑‍💻

  • Anuj Attri (23M0808) 👨‍🎓
  • Arnav Attri (23M0811) 👨‍🎓

Welcome to 🧠 Strokes Uncovered: Data Analysis, Visualization, and Predictive Insights – your gateway to unraveling the mysteries behind strokes! 🧐

Introduction 🚀

Strokes Uncovered: Data Analysis, Visualization, and Predictive Insights applies exploratory data analysis (EDA), data visualization, and predictive modeling to a publicly available stroke dataset. The EDA relies on histograms, scatter plots, bar charts, and heatmaps. Stroke is a major global health concern: according to the World Health Organization (WHO), it accounts for approximately 11% of deaths worldwide. The primary objective of this project is to better understand the risk factors that influence stroke, and thereby to support more effective prevention and earlier intervention.

About the Dataset πŸ“Š

The dataset contains more than 5000 records and 10 input features: age, hypertension, heart disease, marital status, work type, residence type, average glucose level, BMI, smoking status, and gender.
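As a first look at data of this shape, the sketch below parses a few CSV rows and reports the row count, the share of stroke cases, and the number of missing BMI values. It uses only the standard library so that it runs outside the project environment. The column names (ever_married, bmi, stroke), the "N/A" marker for missing values, and the sample rows are assumptions made for illustration and may not match stroke.csv exactly.

```python
import csv
import io

# Hypothetical sample rows. The real data lives in stroke.csv, and its exact
# column names and missing-value marker are assumptions here.
SAMPLE_CSV = """gender,age,hypertension,heart_disease,ever_married,bmi,stroke
Male,67,0,1,Yes,36.6,1
Female,61,0,0,Yes,N/A,1
Female,35,0,0,No,22.4,0
Male,52,1,0,Yes,30.1,0
"""


def summarize(csv_file):
    """Return row count, stroke rate, and number of missing BMI values."""
    rows = list(csv.DictReader(csv_file))
    strokes = sum(int(row["stroke"]) for row in rows)
    missing_bmi = sum(1 for row in rows if row["bmi"] in ("", "N/A"))
    return {
        "rows": len(rows),
        "stroke_rate": strokes / len(rows),
        "missing_bmi": missing_bmi,
    }


summary = summarize(io.StringIO(SAMPLE_CSV))
print(summary)
```

To run the same summary on the real file, open it with open("stroke.csv", newline="") and pass the file object to summarize, adjusting the column names if they differ.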

Objectives πŸ“‹

  1. Data Analysis: The project will start with a comprehensive data analysis to uncover insights into attribute distributions and relationships. It will address specific questions and hypotheses using statistical methods such as descriptive statistics and hypothesis testing:

    • What is the gender distribution in the dataset, and does it impact stroke likelihood?
    • Is there a correlation between patient age and stroke risk?
    • Does residence type significantly influence stroke risk?
    • Are married individuals more or less likely to experience strokes compared to unmarried individuals in the dataset?
  2. Data Visualization: The project will create informative visualizations using Python's Matplotlib and Plotly libraries to effectively convey dataset characteristics, relationships, and trends:

    • Age, glucose levels, and BMI will be depicted through histograms and density plots.
    • Gender distribution and marital status will be visualized using bar charts.
    • Scatter plots will explore attribute relationships with stroke risk.
    • An interactive Plotly visualization will offer dynamic dataset exploration.
  3. Stroke Prediction Model: In this phase of the project, we will build a predictive model using machine learning algorithms. The model's purpose is to identify individuals at higher risk of stroke from the attributes in the dataset.
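One of the questions listed under Data Analysis, whether married and unmarried individuals differ in stroke frequency, can be framed as a test of independence on a 2x2 table. The sketch below computes Pearson's chi-square statistic by hand, without a continuity correction, and derives the p-value for one degree of freedom from the complementary error function. The counts are made up for illustration and are not taken from the dataset, and this is not necessarily the test used in the project notebook.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value for a 2x2 table.

    Uses 1 degree of freedom and applies no continuity correction.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if marital status and stroke were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected

    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts: rows are married / unmarried,
# columns are stroke / no stroke.
counts = [[30, 170], [10, 190]]
statistic, p_value = chi_square_2x2(counts)
print(f"chi-square = {statistic:.3f}, p = {p_value:.5f}")
```

A small p-value indicates an association in the sample, not a causal effect. Marital status is likely to be correlated with age, so a model that includes both attributes gives a fairer picture.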

Methods and Tools 🛠️

To fulfill project requirements, the following tools and technologies will be employed:

  • Python: Python and its various libraries will be used for data analysis, visualization, and model construction.
  • HTML Integration with Python (Flask): We will integrate HTML with Python using the Flask web framework. In this integration, HTML will serve as the front-end interface, while Flask will function as the back-end framework, enabling the creation of an interactive web application. Users can engage with our project through this interface.
  • LaTeX Integration: LaTeX will play a crucial role in creating a comprehensive and structured report to document our findings and project details.
  • Pyplot: Matplotlib's pyplot interface will be used to create data visualizations, including bar charts, line plots, scatter plots, and histograms, that present our insights and analysis results.
  • PostgreSQL (Optional): PostgreSQL will be employed for data storage and retrieval, particularly if the dataset size or database management complexity requires it.

Project Documentation 📖

Thorough documentation, including code comments, explanations, dataset sources, data pre-processing details, and model evaluation results, will be carefully drafted.

Conclusion and Impact 🌟

The project's primary objective is to provide insight into stroke risk factors by identifying underlying trends and patterns in the data. Through data analysis, visualization, and predictive modeling, we aim to help healthcare professionals and policymakers develop targeted prevention strategies, promote healthier lifestyles, and allocate resources more effectively to reduce the burden of strokes on society.


Files in this GitHub Repository

  • code.ipynb: Jupyter Notebook containing the project code.
  • environment.yml: Environment file for recreating the project's Python environment.
  • README.md: You're reading it right now! 😉
  • requirements.txt: Python package requirements.
  • stroke.csv: The dataset used for analysis.
  • Setup steps.txt: Detailed terminal steps for running the project.
  • CS699 Roadmap: Additional information about the project's roadmap.
  • CS699 Proposal: From where it all began. 😊

Setup 🛠️

Follow these steps in your terminal to set up the environment:

  1. Navigate to the Project Folder:

    • Open your terminal and use the cd command to go to the folder where you cloned or downloaded this repository.
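  2. Activate the Conda Environment:

    • Activate the Conda environment using the following command. If the environment does not exist yet, it can usually be created first with conda env create -f environment.yml, assuming environment.yml defines lightgbm-env:
      conda activate lightgbm-env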
  3. Launch Jupyter Notebook:

    • Start Jupyter Notebook by running:
      jupyter notebook
  4. Deactivate Conda Environment:

    • Once you're done working with the Jupyter (ipynb) notebook, deactivate the Conda environment using:
      conda deactivate

This setup gives you the environment required to work on the project. Enjoy your data exploration and analysis!
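Before opening the notebook, it can help to confirm that the activated environment provides the expected packages. The sketch below checks whether each name can be found for import, without importing it. The package list is a guess based on the tools named in this README and on the environment name; requirements.txt and environment.yml remain the authoritative lists.

```python
import importlib.util

# Hypothetical package list; the authoritative list is in requirements.txt.
EXPECTED_PACKAGES = ["pandas", "matplotlib", "plotly", "lightgbm"]


def missing_packages(names):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(EXPECTED_PACKAGES)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All expected packages are importable.")
```

Run it from the activated environment. A non-empty list suggests the environment was not created from environment.yml or that a different environment is active.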

Feel free to explore, contribute, and uncover the secrets of strokes with us! 🕵️‍♀️🔍

License: This project is licensed under the MIT License.

About

🚀 CS 699 Project (2023) Setup IX: At the end of Readme.md
