This is the code that I used to generate predictions about the 2022 FIFA World Cup, being hosted in Qatar later this year. It uses simple machine learning algorithms such as logistic regression and support vector machines (SVM), and provides clean visualizations.

ritikgshah/2022FIFA-World-Cup-Predictor

2022FIFA-World-Cup-Predictor

Tools used: Python, Scikit-Learn, Tabulate, Pandas, NumPy, Jupyter, Web Scraping, Data Wrangling

Project Description: High-profile sports tournaments capture the attention of millions of people who eagerly wait for their team to win. Since football is nothing short of a religion in most parts of the world, the highest level of international competition, the FIFA World Cup, has the power to make whole nations wonder which team will take the cup home this time. I am one such person, wondering which team will go all the way in the next edition of the World Cup, which will be played in 2022.

I modeled each match by assigning the two teams to one of two classes, home team or away team, and predicting which of them would win. I divided the teams playing the World Cup into groups of 4 as drawn by FIFA; within each group, every team plays every other team once. In the group stage, I assigned each team 3 points for a win, 1 point for a draw, and 0 points for a loss. The two teams with the most points in each group progress to the knockout stage. In the knockout stage, whichever team wins its game progresses to the next round, while the losing team is eliminated. I simulated the result of each game by picking as the winner the team with the highest probability of winning according to the data. The data consisted of the performance of each team playing the 2022 World Cup in international matches from the year 2000 onwards, together with the latest FIFA men's rankings.

The model predicted England and Brazil reaching the final, with Brazil being crowned champions. These predictions are consistent with those generated by many reputable betting and sports analytics companies, all of which see Brazil taking the cup home.
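The group and knockout rules described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code from the notebook: `simulate_group`, `play_knockout`, `toy_predict`, and the `strength` table are hypothetical names and made-up values, and the toy predictor stands in for the trained model. The bracket here simply pairs adjacent teams and does not reproduce FIFA's actual pairing of group winners and runners-up.

```python
from itertools import combinations


def simulate_group(teams, predict):
    """Play a round-robin group and return (team, points) sorted by points.

    `predict(a, b)` is a hypothetical callback that returns the winning
    team's name, or None for a draw.
    """
    points = {team: 0 for team in teams}
    for a, b in combinations(teams, 2):  # every team meets every other once
        winner = predict(a, b)
        if winner is None:  # draw: 1 point each
            points[a] += 1
            points[b] += 1
        else:  # win: 3 points, loss: 0 points
            points[winner] += 3
    # sorted() is stable, so teams level on points keep their input order.
    return sorted(points.items(), key=lambda item: item[1], reverse=True)


def play_knockout(teams, predict):
    """Pair adjacent teams each round until a single winner remains.

    Assumes the number of teams is a power of two and that `predict`
    never returns None (no draws in the knockout stage).
    """
    while len(teams) > 1:
        teams = [predict(teams[i], teams[i + 1]) for i in range(0, len(teams), 2)]
    return teams[0]


# Made-up strengths standing in for the model's win probabilities.
strength = {"A": 4, "B": 3, "C": 2, "D": 1}


def toy_predict(a, b):
    if strength[a] == strength[b]:
        return None
    return a if strength[a] > strength[b] else b


standings = simulate_group(["A", "B", "C", "D"], toy_predict)
qualifiers = [team for team, _ in standings[:2]]  # top two progress
champion = play_knockout(["A", "D", "B", "C"], toy_predict)
```

With these toy strengths, team A wins all three group games and finishes on 9 points, and teams A and B qualify.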

2022 FIFA WC predictor.ipynb is a Jupyter notebook containing all the Python code used to generate the predictions. The elimination-style tournament bracket has visualization issues on GitHub, but when this file and all supporting files are downloaded and run on your local machine, these issues go away and the bracket looks much cleaner. The knockout-stage bracket looks like this when I run it on my computer:

[Image: screenshot of the knockout-stage bracket (Screen Shot 2022-07-12 at 11 17 01 PM)]

Sometimes when the program is run, it shows Iran going through to the knockout stage, while other times it shows the USA going through, since both are predicted to score the same number of points in the same group. I do not have a tie-breaker system in place, but this should not matter: the Netherlands is predicted to defeat whichever of Iran or the USA the program sends through to the knockouts.
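The project itself has no tie-breaker, but if a stable result were wanted, one possible approach is to sort by points first and by FIFA ranking position second. The sketch below is an assumption, not part of the notebook: `rank_group` is a hypothetical helper, and the points and ranking positions are placeholder values chosen only to show the tie, not real predictions or real rankings.

```python
def rank_group(points, fifa_rank):
    """Order teams by points (descending), then by ranking position (ascending).

    Both arguments are dicts keyed by team name. A lower `fifa_rank` value
    means a better-ranked team. This secondary key is a hypothetical
    tie-breaker, not the rule FIFA applies in the real tournament.
    """
    return sorted(points, key=lambda team: (-points[team], fifa_rank[team]))


# Placeholder values for illustration only.
points = {"England": 7, "Iran": 4, "USA": 4, "Wales": 1}
fifa_rank = {"England": 1, "USA": 2, "Wales": 3, "Iran": 4}

order = rank_group(points, fifa_rank)
qualifiers = order[:2]
```

Because the secondary key breaks the tie, the same two teams qualify no matter what order the teams are stored in.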

Web Scraping_no run.ipynb is a Jupyter notebook containing the web-scraping code that pulled different rankings from the web for use in my predictions.

fifa-world-cup-2022-fixtures.csv has the fixture data for the 2022 world cup as set by FIFA.

new_rankings.csv has the latest FIFA men's world rankings that will be used to make predictions for the world cup.

results.csv has the results of all international matches played. This data was taken from Kaggle.
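As a rough sketch of how match results like these could be narrowed to games from 2000 onwards involving World Cup teams, the example below uses only the standard library. The column names (`date`, `home_team`, `away_team`, `home_score`, `away_score`) are an assumption about the file layout, `load_matches` is a hypothetical helper, and the sample rows are made up so the block runs without the real file.

```python
import csv
import io
from datetime import date

# Made-up rows in an assumed column layout, for illustration only.
SAMPLE = """date,home_team,away_team,home_score,away_score
1998-06-10,Team A,Team B,2,1
2004-03-31,Team B,Team C,0,0
2019-11-15,Team C,Team A,1,3
2021-09-02,Team D,Team E,2,2
"""


def load_matches(handle, teams, start_year=2000):
    """Return matches from `start_year` onwards that involve any of `teams`."""
    matches = []
    for row in csv.DictReader(handle):
        if date.fromisoformat(row["date"]).year < start_year:
            continue  # too old
        if row["home_team"] not in teams and row["away_team"] not in teams:
            continue  # neither side is a team of interest
        home_goals = int(row["home_score"])
        away_goals = int(row["away_score"])
        if home_goals > away_goals:
            result = "home"
        elif home_goals < away_goals:
            result = "away"
        else:
            result = "draw"
        matches.append(
            {"home": row["home_team"], "away": row["away_team"], "result": result}
        )
    return matches


matches = load_matches(io.StringIO(SAMPLE), teams={"Team A", "Team B", "Team C"})
```

To try this on the real file, the `io.StringIO(SAMPLE)` argument could be swapped for `open("results.csv", newline="")`, assuming the columns match.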

2022 FIFA world cup predictor paper.pdf has a detailed write-up of my thought process as I approached this question. It has introduction, methods, discussion, results, and conclusion sections, which explain the models I use, why I use them, and what I found by using them.

Note: This project was coded on Google Colab, so you might see some Colab-specific code.
