This soccer database comes from Kaggle and is well suited for data analysis and machine learning. It contains data on soccer matches, players, and teams from several European countries from 2008 to 2016. The dataset is quite extensive; more information can be found on its Kaggle page.
(Image is from a copyright-free website: https://www.pexels.com/royalty-free-images/.)
- The data is stored in a SQLite database. We can open the database file with software such as DB Browser for SQLite;
- This dataset is good practice for SQL joins. Make sure to look at how the different tables relate to each other;
- Some column titles should be self-explanatory, and others we’ll have to look up on Kaggle.
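The points above about opening the SQLite file and joining related tables can be sketched in Python with the standard `sqlite3` module. This is a minimal sketch, not the project's actual code: it builds a small in-memory database so that it runs without the Kaggle file, and the table names, column names, and rows (`Country`, `League`, `country_id`, "Country A", and so on) are illustrative assumptions rather than values taken from the dataset.

```python
import sqlite3

# In-memory stand-in so the sketch runs on its own. With the real data,
# pass the path of the downloaded SQLite file to sqlite3.connect() instead.
conn = sqlite3.connect(":memory:")

# Hypothetical tables and rows, used only to illustrate a foreign-key join.
conn.executescript("""
    CREATE TABLE Country (
        id   INTEGER PRIMARY KEY,
        name TEXT
    );
    CREATE TABLE League (
        id         INTEGER PRIMARY KEY,
        country_id INTEGER REFERENCES Country(id),
        name       TEXT
    );
    INSERT INTO Country VALUES (1, 'Country A');
    INSERT INTO Country VALUES (2, 'Country B');
    INSERT INTO League VALUES (10, 1, 'League A');
    INSERT INTO League VALUES (20, 2, 'League B');
""")


def list_tables(connection):
    """Return the names of all tables in the database, sorted by name."""
    rows = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [row[0] for row in rows]


def leagues_with_country(connection):
    """Join each league to its country through the country_id foreign key."""
    query = """
        SELECT League.name, Country.name
        FROM League
        JOIN Country ON Country.id = League.country_id
        ORDER BY League.name
    """
    return connection.execute(query).fetchall()


tables = list_tables(conn)
pairs = leagues_with_country(conn)
print(tables)
print(pairs)
```

Listing the tables first is a quick way to see what is available before looking up column meanings on Kaggle; the same queries can also be run interactively in DB Browser for SQLite.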
Table of Contents |
---|
Prerequisites 🔍📜 |
Design 📐 |
Conclusions 📌 |
License 🔖 |
- Python 3.6.3
- Jupyter Notebook
- Anaconda-Navigator
- SQLite database
- DB Browser for SQLite
Step One - Choose Data Set
Click this link to download the corresponding data.
Step Two - Get Organized
This project will eventually contain:
- The report communicating any findings;
- Any Python code used during the analysis;
- The data set;
Step Three - Analyze
Brainstorm some questions that could be answered using the data set, then start answering them. We mainly focus on the relationships between multiple variables.
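One simple way to look at the relationship between two numeric variables is the Pearson correlation coefficient. The sketch below is an assumption about how such a check could be done, not the analysis actually used in this project; the helper `pearson` and the sample values are hypothetical and do not come from the dataset.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (>= 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Made-up numbers purely for illustration (not taken from the dataset).
variable_a = [60, 65, 70, 75, 80]
variable_b = [58, 66, 69, 77, 79]
print(round(pearson(variable_a, variable_b), 3))
```

A value near 1 or -1 suggests a strong linear relationship, while a value near 0 suggests little linear relationship. A scatter plot is still worth drawing, since the coefficient only captures linear trends.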
In the current study, a substantial amount of analysis was carried out. Detailed instructions were given before each step, and interpretations were provided afterwards. The two datasets included 115347 and 183978 records of European soccer match information, respectively, ranging from 2008 to 2016. With this much data, the analysis is more reliable than a small-scale analysis would be. The main limitation of the study was that the original data from the website was not well organized, as many tables were connected through foreign-key-to-foreign-key relations. More importantly, there was no key pairing match and player information. As a result, deeper analysis, such as the impact of player attributes on matches, was not possible.