NHL Game Classification - Game Win/Loss for 2023/24 Season

This project aims to classify NHL games as either a win or a loss by team, for the 2023/24 season.

Introduction

In this project, we analyze historical NHL game data and build classification models to predict the outcome of games in the 2023/24 season. The models classify each game as either a win or a loss based on various features.

  • Data is obtained using the NHL API

Run Locally

1.) Create and Activate the Virtual Environment

  • Open a terminal and navigate to the teams directory:
    cd .\examples\teams\
  • Create the virtual environment
    python -m venv env
    (or, if your system uses the python3 command: python3 -m venv env)
  • Activate the virtual environment
    .\env\Scripts\activate
  • You should see (env) in your terminal prompt

2.) Install the Packages

  • Navigate back to the root directory
    cd ..
    cd ..
  • Install the pip packages from the requirements.txt
    pip install -r requirements.txt
    
    

Steps Taken

  1. Created and ran init.py which:

  2. Created and ran model.py, which creates three Classification models:

    • Random Forest (RFC)
    • Support Vector Machines (SVM/SVC)
    • Multilayer Perceptron Classifier (MLP)
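As a rough sketch, the three model types could be set up as follows. This assumes scikit-learn, which the project description does not name explicitly. The `build_models` helper and every hyperparameter shown are illustrative placeholders, not the project's actual settings.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_models(random_state=42):
    """Return one untrained estimator per model type (hypothetical helper)."""
    return {
        # Tree ensembles do not need feature scaling.
        "RFC": RandomForestClassifier(n_estimators=200, random_state=random_state),
        # SVMs are sensitive to feature scale, so standardize first.
        "SVM": make_pipeline(
            StandardScaler(),
            SVC(kernel="rbf", random_state=random_state),
        ),
        # Neural networks also train more reliably on standardized inputs.
        "MLP": make_pipeline(
            StandardScaler(),
            MLPClassifier(
                hidden_layer_sizes=(64, 32),  # placeholder architecture
                max_iter=500,
                random_state=random_state,
            ),
        ),
    }
```

Each entry exposes the usual `fit` and `predict` methods, so the same training loop can be reused for all three models.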

    For each model:

    • Imported the NHL_teams_historical_stats_20132014_to_20222023.csv
    • Dropped columns that are not needed and those that cause data leakage.
    • Updated PHX to ARI as they have changed their team name (special case).
    • Created a new feature column for days since last game for each team.
    • Created a new feature column for the current win streak for each team.
    • Created a new feature column for the games played for each team.
    • Saved the cleaned data to a csv: NHL_teams_historical_stats_20132014_to_20222023.csv_cleaned.csv
    • One-hot encoded 'homeRoad', 'opponentTeamAbbrev', 'teamFullName'
    • Filled in missing or N/A values with 0.
    • Split the data into train and test sets.
    • Created the 3 classifier models (RFC, SVM, MLP).
    • Ran the test set, which prints model metrics to the console.
    • Saved the prediction model and normalization values for each model to a .joblib file in the respective output folder (allows it to be used in the future without retraining).
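The feature-engineering and encoding steps above might look roughly like the sketch below. It assumes pandas and a frame with one row per team per game. The columns `gameDate` and `wins` and the helper names are hypothetical; only `homeRoad`, `opponentTeamAbbrev`, and `teamFullName` come from the list above. The sketch computes the streak and games played over the whole frame, whereas the project may reset them each season.

```python
import pandas as pd


def _streak_before(wins: pd.Series) -> pd.Series:
    """Win streak a team carries INTO each game (excludes that game's result)."""
    streak, out = 0, []
    for won in wins:
        out.append(streak)
        streak = streak + 1 if won == 1 else 0
    return pd.Series(out, index=wins.index)


def add_team_features(games: pd.DataFrame) -> pd.DataFrame:
    """Add rest days, games played, and incoming win streak per team."""
    df = games.copy()
    # PHX was renamed to ARI, so treat both abbreviations as the same team.
    df["opponentTeamAbbrev"] = df["opponentTeamAbbrev"].replace({"PHX": "ARI"})
    df["gameDate"] = pd.to_datetime(df["gameDate"])  # assumed column name
    df = df.sort_values(["teamFullName", "gameDate"]).reset_index(drop=True)

    # Days since the team's previous game (NaN for a team's first game).
    df["daysSinceLastGame"] = df.groupby("teamFullName")["gameDate"].diff().dt.days
    # Number of games the team had already played before this one.
    df["gamesPlayed"] = df.groupby("teamFullName").cumcount()
    # Using the streak *before* the game avoids leaking the game's own result.
    df["winStreak"] = df.groupby("teamFullName")["wins"].transform(_streak_before)
    return df


def encode_and_fill(df: pd.DataFrame) -> pd.DataFrame:
    """One-hot encode the categorical columns and replace missing values with 0."""
    # Dropping the raw date here is an assumption; it is not a numeric feature.
    df = df.drop(columns=["gameDate"], errors="ignore")
    encoded = pd.get_dummies(
        df, columns=["homeRoad", "opponentTeamAbbrev", "teamFullName"]
    )
    return encoded.fillna(0)
```

Computing the streak from earlier games only is one way to keep the target (`wins` in this sketch) from leaking into the features.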
  3. Created and ran test.py which:

    • Tests the model against:
      • current season completed games (up to Feb 26, 2024).
      • remaining games for the current season (Feb 27 - Apr 18, 2024).
  4. Results are output to csv/test
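To illustrate how a saved model can be reused without retraining, here is a minimal sketch of the train, save, load, and predict cycle. It assumes scikit-learn, joblib, and pandas. The function names, the `rfc.joblib` file name, the `predictedWin` column, and the bundle layout are all hypothetical; the project's own scripts may organize this differently.

```python
from pathlib import Path

import joblib
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def train_and_save(X: pd.DataFrame, y: pd.Series, out_dir) -> float:
    """Fit a scaler and a model, save both to one .joblib file, return test accuracy."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )
    # Fit normalization on the training split only to avoid test-set leakage.
    scaler = StandardScaler().fit(X_train)
    model = RandomForestClassifier(n_estimators=100, random_state=42)
    model.fit(scaler.transform(X_train), y_train)
    accuracy = model.score(scaler.transform(X_test), y_test)

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Store the column order too, so later inputs can be aligned to it.
    bundle = {"model": model, "scaler": scaler, "columns": list(X.columns)}
    joblib.dump(bundle, out_dir / "rfc.joblib")
    return accuracy


def predict_to_csv(games: pd.DataFrame, model_path, csv_path) -> pd.DataFrame:
    """Load a saved bundle, predict win/loss for each row, and write a CSV."""
    bundle = joblib.load(model_path)
    # Align to the training columns; one-hot columns unseen here become 0.
    X = games.reindex(columns=bundle["columns"], fill_value=0)
    preds = bundle["model"].predict(bundle["scaler"].transform(X))

    result = games.copy()
    result["predictedWin"] = preds
    csv_path = Path(csv_path)
    csv_path.parent.mkdir(parents=True, exist_ok=True)
    result.to_csv(csv_path, index=False)
    return result
```

Saving the scaler together with the model means new games are normalized with the same values that were used during training.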