Machine learning project aimed at predicting new COVID-19 cases using historical COVID-19 and mobility data. The project involves data fetching, migration, preprocessing, exploratory data analysis (EDA), feature engineering, data splitting, model training, and evaluation.

datpham0412/covid19-prediction-model

🦠 Covid 19 Prediction Model


📋 Project Description

The Covid 19 Prediction Model predicts the spread of Covid-19 from historical data using machine learning models. It combines multiple data sources, including Covid-19 case data and mobility data, to forecast new cases and show trends in the pandemic. The project aims to help policymakers, healthcare professionals, and the general public understand and respond to the Covid-19 crisis.

🛠 Technologies Used


  • Python: Core programming language for data processing and model training.
  • C++: For efficient data processing and handling large datasets.
  • SQLite: Database management for storing and querying data.
  • Pandas: Data manipulation and analysis.
  • Scikit-learn: Machine learning library for building predictive models.
  • Matplotlib & Seaborn: Data visualization.
  • CMake: Cross-platform build system.
  • Google Test: Unit testing framework for C++.
  • Dill: For model serialization in Python.
  • Jupyter Notebook: For interactive data analysis and visualization.

📚 Features

  • Fetch and preprocess Covid-19 and mobility data from multiple sources.
  • Integrate and clean data, ensuring consistency and accuracy.
  • Create various date-based, lag, and rolling average features to enhance model performance.
  • Train and evaluate machine learning models to predict new Covid-19 cases.
  • Visualize actual vs. predicted cases, residuals, and other key metrics to interpret model performance.
  • Generate detailed reports and visualizations for data exploration and model results.
  • Support for user-defined country data extraction and analysis.

🚀 Installation and Running the Project

Prerequisites

  • Ensure you have git installed for cloning repositories.
  • Ensure you have CMake installed and added to your system's PATH.

Steps

  1. Clone the Repository:

    git clone https://github.com/datpham0412/covid19-prediction-model.git
    cd covid19-prediction-model
  2. Install CMake:

    • Download CMake from the official CMake website (cmake.org).
    • Add the CMake binary path (e.g., C:\Program Files\CMake\bin) to your environment variables.
  3. Clone SQLiteCpp:

    cd external
    git clone https://github.com/SRombauts/SQLiteCpp.git
  4. Modify SQLiteCpp CMakeLists.txt:

    • Open CMakeLists.txt in the external/SQLiteCpp folder.
    • Change line 388 (the exact line number may differ between SQLiteCpp versions) from:
      option(SQLITECPP_RUN_CPPLINT "Run cpplint.py tool for Google C++ StyleGuide." ON)
      to:
      option(SQLITECPP_RUN_CPPLINT "Run cpplint.py tool for Google C++ StyleGuide." OFF)
  5. Build the Project:

    cd ..
    mkdir build
    cd build
    cmake ..
    cmake --build . --config Release
  6. Run the Application:

    cd Release
    Covid19_Prediction.exe

Python Dependencies

Install the required Python libraries (sqlite3 is part of the Python standard library and is not installed with pip):

pip install pandas numpy scikit-learn matplotlib seaborn dill joblib notebook
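
Because SQLite is used for storing and querying data, a quick check that the built-in sqlite3 module works can look like the following. This is a minimal sketch, not the project's actual schema: the table name `covid_cases`, its columns, and the sample rows are assumptions made up for illustration.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical table; the project's real schema may differ.
conn.execute(
    "CREATE TABLE covid_cases (date TEXT PRIMARY KEY, new_cases INTEGER)"
)

# Illustrative rows only, not real data.
rows = [("2021-01-01", 10), ("2021-01-02", 12), ("2021-01-03", 14)]
conn.executemany("INSERT INTO covid_cases VALUES (?, ?)", rows)
conn.commit()

# Aggregate query: total cases across all stored days.
total = conn.execute("SELECT SUM(new_cases) FROM covid_cases").fetchone()[0]

# Parameterized query: rows on or after a given date, oldest first.
recent = conn.execute(
    "SELECT date, new_cases FROM covid_cases WHERE date >= ? ORDER BY date",
    ("2021-01-02",),
).fetchall()

conn.close()
```

Dates stored as ISO 8601 text sort correctly with plain string comparison, which is why the `date >= ?` filter works here.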

Running the scripts

  1. Fetch Data
python scripts/fetch_data.py

This script fetches COVID-19 and mobility data. Note that this may take 10 to 20 minutes.

  2. Migrate Data
python scripts/migrate_data.py

This script migrates COVID-19 and mobility data for a specified country from the raw datasets to processed CSV files.
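
The country extraction described above can be sketched with pandas. This is only an illustration under stated assumptions: the column names `location`, `date`, and `new_cases`, the helper `extract_country`, and the output path are hypothetical, and the real raw datasets may be laid out differently.

```python
import io

import pandas as pd

# Hypothetical raw data standing in for the downloaded dataset.
RAW_CSV = """location,date,new_cases
Australia,2021-01-02,12
Vietnam,2021-01-01,3
Australia,2021-01-01,10
"""


def extract_country(raw: pd.DataFrame, country: str) -> pd.DataFrame:
    """Return only the rows for one country, sorted by date."""
    # Case-insensitive match so "australia" and "Australia" both work.
    mask = raw["location"].str.casefold() == country.casefold()
    return raw.loc[mask].sort_values("date").reset_index(drop=True)


raw = pd.read_csv(io.StringIO(RAW_CSV), parse_dates=["date"])
australia = extract_country(raw, "australia")

# Writing the processed file would then be one call (hypothetical path):
# australia.to_csv("covid_australia.csv", index=False)
```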

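  3. Build the project
mkdir build
cd build
cmake ..
cmake --build . --config Release
cd Release
Covid19_Prediction.exe

Run these commands from the project root to configure, build, and run the C++ project. Skip mkdir build if the directory already exists from the installation steps, and return to the project root before running the remaining scripts.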

  4. Process Data
python scripts/data_processing.py

This script processes the COVID-19 and mobility data for a specific country provided by the user.
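
One plausible shape for this processing step is to join the two sources on date and fill small gaps. The sketch below assumes hypothetical column names (`new_cases`, `workplaces_change`) and uses linear interpolation as an example cleaning choice; the actual script may clean the data differently.

```python
import pandas as pd

# Illustrative frames only; values and column names are made up.
covid = pd.DataFrame({
    "date": pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-03"]),
    "new_cases": [10.0, None, 14.0],  # one missing day
})
mobility = pd.DataFrame({
    "date": pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-03"]),
    "workplaces_change": [-20.0, -18.0, -25.0],
})

# Keep only dates present in both sources, in chronological order.
merged = covid.merge(mobility, on="date", how="inner").sort_values("date")

# Fill the gap by linear interpolation between neighbouring days
# (an assumption for this sketch, not necessarily the project's method).
merged["new_cases"] = merged["new_cases"].interpolate()

# Drop any rows that still contain missing values.
merged = merged.dropna().reset_index(drop=True)
```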

  5. Perform EDA
python scripts/eda_visualization.py

This script performs Exploratory Data Analysis on the processed data.
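
A typical first EDA step is a correlation matrix over the numeric columns. The sketch below uses made-up values and hypothetical column names, so the resulting correlations say nothing about the real data.

```python
import pandas as pd

# Synthetic, deliberately linear data for illustration only.
df = pd.DataFrame({
    "new_cases": [10, 12, 14, 16],
    "new_deaths": [1, 2, 3, 4],
    "workplaces_change": [-10, -12, -14, -16],
})

# Pearson correlation between every pair of columns.
corr = df.corr()

cases_vs_deaths = corr.loc["new_cases", "new_deaths"]
cases_vs_mobility = corr.loc["new_cases", "workplaces_change"]

# With seaborn installed, the matrix could be drawn as a heatmap:
# import seaborn as sns
# sns.heatmap(corr, annot=True)
```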

  6. Feature Engineering
python scripts/feature_engineering.py

This script performs feature engineering on the processed data.
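
The Features section mentions date-based, lag, and rolling average features. A minimal pandas sketch of those three kinds follows; the column names, the lag of one day, and the three-day window are assumptions chosen for illustration rather than the script's actual settings.

```python
import pandas as pd

# Synthetic daily series; 2021-01-01 was a Friday.
df = pd.DataFrame({
    "date": pd.date_range("2021-01-01", periods=5, freq="D"),
    "new_cases": [10, 20, 30, 40, 50],
})

# Date-based features (Monday is 0, so Friday is 4).
df["day_of_week"] = df["date"].dt.dayofweek
df["month"] = df["date"].dt.month

# Lag feature: yesterday's case count.
df["new_cases_lag_1"] = df["new_cases"].shift(1)

# Rolling average of the previous three days. Shifting first keeps
# the current day's value out of its own feature.
df["new_cases_roll_3"] = df["new_cases"].shift(1).rolling(window=3).mean()

# The first rows lack enough history, so they are dropped.
features = df.dropna().reset_index(drop=True)
```

Shifting before rolling is a design choice in this sketch: it avoids leaking the target value into the features used to predict it.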

  7. Split Data
python scripts/split_data.py

This script splits the data into training and testing sets.
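
For time series such as daily case counts, one common approach is a chronological split, where the test set is the most recent slice. The sketch below assumes that approach and a hypothetical 20% test fraction; the script itself may split the data differently (for example, randomly).

```python
import pandas as pd


def chronological_split(df: pd.DataFrame, test_fraction: float = 0.2):
    """Split into (train, test) with the latest rows held out for testing."""
    ordered = df.sort_values("date").reset_index(drop=True)
    n_test = int(round(len(ordered) * test_fraction))
    cut = len(ordered) - n_test
    return ordered.iloc[:cut], ordered.iloc[cut:]


# Synthetic ten-day series for illustration.
df = pd.DataFrame({
    "date": pd.date_range("2021-01-01", periods=10, freq="D"),
    "new_cases": list(range(10)),
})

train, test = chronological_split(df)
```

Holding out the latest dates means the model is always evaluated on days it has not seen and that come after its training period.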

  8. Model Training
python scripts/model_training.py

This script trains the machine learning model.
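
The README does not say which model the script trains, so the following is only a sketch of the general scikit-learn fit and predict pattern, using `LinearRegression` as a stand-in. The feature layout (a lag value and a day-of-week value) and all numbers are assumptions made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic training data: two hypothetical features per row.
X_train = np.array([
    [10.0, 1.0],
    [20.0, 2.0],
    [30.0, 1.0],
    [40.0, 3.0],
])

# The target is an exact linear function of the features, so the
# fitted model should reproduce it almost perfectly.
y_train = 2.0 * X_train[:, 0] + 5.0 * X_train[:, 1] + 1.0

# Stand-in model; the project's actual estimator may differ.
model = LinearRegression()
model.fit(X_train, y_train)

# Predict for one unseen row.
prediction = model.predict(np.array([[50.0, 2.0]]))
```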

  9. Model Evaluation
python scripts/model_evaluation.py

This script evaluates the performance of the trained model.
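
Common regression metrics for this kind of evaluation are MAE, RMSE, and R², along with the residuals mentioned in the Features section. The sketch below computes them with scikit-learn on made-up numbers; which metrics the script actually reports is an assumption.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Illustrative values only, not model output.
y_true = np.array([100.0, 120.0, 140.0, 160.0])
y_pred = np.array([110.0, 115.0, 150.0, 155.0])

# Mean absolute error: average size of the miss, in cases.
mae = mean_absolute_error(y_true, y_pred)

# Root mean squared error: penalizes large misses more heavily.
rmse = float(np.sqrt(mean_squared_error(y_true, y_pred)))

# R^2: share of the variance in y_true explained by the predictions.
r2 = r2_score(y_true, y_pred)

# Residuals (actual minus predicted), useful for residual plots.
residuals = y_true - y_pred
```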

  10. Interpret Predictions
cd notebooks
jupyter notebook

Open interpret_predictions.ipynb in Jupyter Notebook to visualize and interpret the model's predictions.

📷 Screenshots

[Screenshots: correlation heatmap, new cases and new deaths over time, correlation scatter plot, distribution of new cases, and Jupyter Notebook views 1 to 4]

📜 License

This project is licensed under the MIT License. See the LICENSE file for details.

📞 Contact

For any inquiries, please contact [email protected].

Made with ❤️ by Dat Pham
