This project experiments with creating a cricket simulator using machine learning.
Take a look at https://medium.com/@tejalnarkar/cricket-simulation-engine-using-machine-learning-a2758933b0a7 for the details.
To train the random forest classifier, we needed ball-by-ball match data. We also needed player statistics, which we calculated from the match data. All of this data came from https://cricsheet.org/
Disclaimer: This is a proof of concept so please excuse the lack of comments and the hacky code. Contributions / improvements welcome.
All the data files are in the folder "data". Each match is an individual file in JSON format. All the data processing and feature prep scripts are in the folder data_prep.
The odi_data_explore.py file processes the data in the data folder and generates a CSV file with all ball-by-ball data.
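As a rough illustration of this flattening step, here is a minimal sketch that turns one match into ball-by-ball rows and writes them as CSV. It assumes the layout of Cricsheet's current JSON format (innings, overs, deliveries); the files in this repo may be laid out differently. The sample match, the player names, the `flatten` helper, and the output column names are all hypothetical and are not taken from odi_data_explore.py.

```python
import csv
import io

# Tiny hypothetical match, shaped like Cricsheet's current JSON format.
match = {
    "info": {"teams": ["India", "Australia"]},
    "innings": [
        {"team": "India", "overs": [
            {"over": 0, "deliveries": [
                {"batter": "A", "bowler": "X",
                 "runs": {"batter": 4, "extras": 0, "total": 4}},
                {"batter": "A", "bowler": "X",
                 "runs": {"batter": 0, "extras": 0, "total": 0},
                 "wickets": [{"player_out": "A", "kind": "bowled"}]},
            ]},
        ]},
    ],
}

def flatten(match):
    """Return one dict per delivery (hypothetical column names)."""
    rows = []
    for innings_no, innings in enumerate(match["innings"], start=1):
        for over in innings.get("overs", []):
            for ball_no, d in enumerate(over["deliveries"], start=1):
                rows.append({
                    "innings": innings_no,
                    "batting_team": innings["team"],
                    "over": over["over"],
                    "ball": ball_no,
                    "batter": d["batter"],
                    "bowler": d["bowler"],
                    "runs_batter": d["runs"]["batter"],
                    "runs_total": d["runs"]["total"],
                    # 1 if any wicket fell on this delivery, else 0
                    "wicket": int(bool(d.get("wickets"))),
                })
    return rows

rows = flatten(match)

# Write to an in-memory buffer; a real script would open a file instead.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
```

In practice the same function would be applied to every JSON file in the data folder and the rows concatenated into a single CSV.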
The odi_data_explore.py also generates the player statistics file which calculates the batting and bowling averages and strike rates for all players.
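The statistics mentioned above follow the usual cricket definitions: batting average is runs per dismissal, batting strike rate is runs per 100 balls faced, bowling average is runs conceded per wicket, and bowling strike rate is balls bowled per wicket. The sketch below computes them from a handful of made-up deliveries. The row fields and the `player_stats` helper are assumptions for illustration, and the sketch deliberately ignores extras and credits every wicket to the striker and the bowler, which a real implementation would need to handle more carefully.

```python
from collections import defaultdict

# Hypothetical ball-by-ball rows; field names are assumptions, not the
# schema of the CSV generated by this project.
balls = [
    {"batter": "A", "bowler": "X", "runs": 4, "wicket": False},
    {"batter": "A", "bowler": "X", "runs": 1, "wicket": False},
    {"batter": "B", "bowler": "X", "runs": 0, "wicket": True},
    {"batter": "A", "bowler": "Y", "runs": 6, "wicket": False},
    {"batter": "A", "bowler": "Y", "runs": 0, "wicket": True},
    {"batter": "C", "bowler": "Y", "runs": 2, "wicket": False},
]

def ratio(numerator, denominator):
    """Return numerator / denominator, or None when undefined."""
    return numerator / denominator if denominator else None

def player_stats(rows):
    bat = defaultdict(lambda: {"runs": 0, "balls": 0, "outs": 0})
    bowl = defaultdict(lambda: {"runs": 0, "balls": 0, "wickets": 0})
    for r in rows:
        b, w = bat[r["batter"]], bowl[r["bowler"]]
        b["runs"] += r["runs"]
        b["balls"] += 1
        w["runs"] += r["runs"]
        w["balls"] += 1
        if r["wicket"]:
            # Simplification: the striker is out and the bowler gets credit.
            b["outs"] += 1
            w["wickets"] += 1
    batting = {
        p: {"average": ratio(s["runs"], s["outs"]),
            "strike_rate": ratio(100 * s["runs"], s["balls"])}
        for p, s in bat.items()
    }
    bowling = {
        p: {"average": ratio(s["runs"], s["wickets"]),
            "strike_rate": ratio(s["balls"], s["wickets"])}
        for p, s in bowl.items()
    }
    return batting, bowling

batting, bowling = player_stats(balls)
```

A batter who has never been dismissed has no defined average, so the sketch returns `None` in that case rather than dividing by zero.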
The team_prep.py generates the player lineups for each country/team.
The feature_prep.py file creates the features that are fed into the model.
The models folder has the model training script.
The random forest classifier is trained in engine_training.py.
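For readers unfamiliar with this kind of model, here is a minimal sketch of training a random forest to predict the outcome of a ball, using scikit-learn's `RandomForestClassifier`. The feature columns, the toy data, and the hyperparameters are assumptions made up for illustration; they are not the features or settings used in engine_training.py.

```python
from sklearn.ensemble import RandomForestClassifier

# Hypothetical features per ball:
# [batter strike rate, bowler average, over number, wickets fallen]
X = [
    [85.0, 30.0, 1, 0],
    [85.0, 30.0, 2, 0],
    [120.0, 25.0, 45, 5],
    [120.0, 40.0, 48, 6],
    [60.0, 22.0, 10, 2],
    [60.0, 22.0, 11, 2],
    [95.0, 35.0, 30, 3],
    [95.0, 35.0, 31, 3],
]
# Ball outcomes as class labels: runs scored, or "W" for a wicket.
y = ["0", "1", "6", "4", "0", "W", "1", "2"]

model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X, y)

# Class probabilities for one new ball; a simulator can sample from these
# rather than always taking the most likely outcome.
probs = model.predict_proba([[90.0, 28.0, 10, 1]])[0]
outcome_probs = dict(zip(model.classes_, probs))
```

Sampling from the predicted probabilities, instead of calling `predict`, is what lets repeated simulations of the same match produce different scorecards.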
The match_simulator.py has logic for using the stored model to simulate the match. It also has the logic for the match and maintains the state of the match.
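The general shape of such a simulation loop can be sketched as follows. This is not the code in match_simulator.py: the `outcome_probabilities` function is a hypothetical stand-in for the stored model's predicted probabilities, the fixed probabilities are invented, the opposing team name is a placeholder, and the match state is reduced to runs, wickets, and balls for a 50-over innings.

```python
import random

OUTCOMES = ["0", "1", "2", "3", "4", "6", "W"]

def outcome_probabilities(state):
    """Hypothetical stand-in for the stored model's class probabilities.

    A real simulator would build features from `state` and query the model.
    """
    return [0.40, 0.30, 0.08, 0.01, 0.12, 0.05, 0.04]

def simulate_innings(rng, target=None, max_balls=300, max_wickets=10):
    """Play one innings ball by ball and return the final state."""
    state = {"runs": 0, "wickets": 0, "balls": 0}
    while state["balls"] < max_balls and state["wickets"] < max_wickets:
        if target is not None and state["runs"] >= target:
            break  # chase completed
        outcome = rng.choices(OUTCOMES, weights=outcome_probabilities(state))[0]
        state["balls"] += 1
        if outcome == "W":
            state["wickets"] += 1
        else:
            state["runs"] += int(outcome)
    return state

def simulate_match(team_a, team_b, seed=None):
    """team_a bats first; returns a result string."""
    rng = random.Random(seed)
    first = simulate_innings(rng)
    second = simulate_innings(rng, target=first["runs"] + 1)
    if second["runs"] > first["runs"]:
        return f"{team_b} won by {10 - second['wickets']} wickets"
    if second["runs"] < first["runs"]:
        return f"{team_a} won by {first['runs'] - second['runs']} runs"
    return "Match tied"

result = simulate_match("India", "Australia", seed=1)
print(result)
```

Passing a seed makes a run reproducible, which is handy when debugging; leaving it as `None` gives a different match each time.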
python data_prep/odi_data_prep.py
python data_prep/feature_prep.py
python data_prep/team_prep.py
python models/engine_training.py
python match_simulator.py
India won by 111 runs