An independently researched and implemented machine learning project in the Master of Computer Science program at Wilfrid Laurier University.
I took historical Major League Baseball data from a Kaggle dataset (as determined by the project requirements) and ran experiments to predict future batting success using two approaches: a Hall of Fame approach and an OPS approach.
Kaggle dataset source: https://www.kaggle.com/datasets/darinhawley/mlb-batting-stats-by-game-19012021 (external link)
The project, as submitted, was too large to be included on GitHub. As a result, the project .zip file (which includes the data files described in the Jupyter notebooks on GitHub) is available for download here (external link).
The complete set of project deliverables includes:
- a proposal document
- the Jupyter notebook(s) and source data (external link)
- a final report document, and
- a video presentation (external link)
======================
- Local Jupyter Notebook server is installed and running
- harr2890_project.zip is extracted locally, with structure intact, where notebooks can be run
- Can create/write to the data subfolder within the harr2890_project folder
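The checks above can be sketched in Python as a quick pre-flight test. This is a minimal sketch, not part of the submitted project: the function name check_prerequisites is hypothetical, and it only checks that a jupyter executable is on the PATH, not that a server is actually running.

```python
import os
import shutil
from pathlib import Path


def check_prerequisites(project_dir):
    """Return a list of problems found; an empty list means all checks passed.

    Hypothetical helper for illustration; not part of the original project.
    """
    problems = []
    project = Path(project_dir)

    # Jupyter should be installed and discoverable on PATH.
    if shutil.which("jupyter") is None:
        problems.append("jupyter executable not found on PATH")

    # The archive should already be extracted with its structure intact.
    if not project.is_dir():
        problems.append(f"project folder not found: {project}")
        return problems

    # The notebooks need to create and write files under the data subfolder.
    data_dir = project / "data"
    try:
        data_dir.mkdir(exist_ok=True)
    except OSError as exc:
        problems.append(f"cannot create {data_dir}: {exc}")
    else:
        if not os.access(data_dir, os.W_OK):
            problems.append(f"cannot write to {data_dir}")
    return problems
```

Calling check_prerequisites("harr2890_project") from the folder where the archive was extracted returns an empty list when nothing is wrong, or a list of readable messages otherwise.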
Please run the notebooks in sequential step order.
Step 1 (harr2890_project_step1_data_prep)
- General data preprocessing; must be run before all other steps
Step 2 (harr2890_project_step2_hof_data_prep)
- Hall of Fame Approach preprocessing; must be run before Step 3
Step 3 (harr2890_project_step3_hof_modelling)
- Hall of Fame Approach modelling (selection and evaluation)
Step 4 (harr2890_project_step4_ops_data_prep)
- OPS Approach preprocessing; must be run before Step 5
Step 5 (harr2890_project_step5_ops_modelling)
- OPS Approach modelling (selection and evaluation)
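The step order above can also be run unattended rather than opening each notebook by hand. The following is a minimal sketch under assumptions: the notebook files are assumed to carry an .ipynb extension and to sit at the top level of the harr2890_project folder, the function names are hypothetical, and execution relies on the standard jupyter nbconvert command with its --execute option.

```python
import subprocess

# Notebook names as listed in the steps; the ".ipynb" extension is an assumption.
NOTEBOOKS = [
    "harr2890_project_step1_data_prep",
    "harr2890_project_step2_hof_data_prep",
    "harr2890_project_step3_hof_modelling",
    "harr2890_project_step4_ops_data_prep",
    "harr2890_project_step5_ops_modelling",
]


def build_commands(notebooks=NOTEBOOKS):
    """Build one nbconvert command per notebook, preserving step order."""
    return [
        ["jupyter", "nbconvert", "--to", "notebook",
         "--execute", "--inplace", f"{name}.ipynb"]
        for name in notebooks
    ]


def run_all(project_dir="harr2890_project"):
    """Execute each notebook in order, stopping at the first failure."""
    for command in build_commands():
        # check=True raises CalledProcessError so later steps never run
        # against missing or partial preprocessing output.
        subprocess.run(command, cwd=project_dir, check=True)
```

Calling run_all() executes Step 1 through Step 5 in sequence. Stopping at the first failure matters here because Steps 2 through 5 depend on the output of Step 1, Step 3 depends on Step 2, and Step 5 depends on Step 4.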