This repo contains the datasets, Python scripts and Jupyter Notebooks used for the Group 6 Project, part of the CFGDegree (Data stream) in Spring 2023.
The aim of this project is to examine how the mood of music streamed in the UK relates to the UK COVID lockdown timeline and to NHS mental health services referrals.
- Information
- Tools and Technologies Used
- Requirements
- Files
- Spotify API
- MySQL database
- How to Run
- Project status
- Authors and Acknowledgment
For a detailed report on the questions we are trying to answer and our approach to this analysis, please see Group 6 Project Report.pdf in the Project Report folder.
Please make sure you have installed the following Python packages before running the scripts and notebooks:
- Pandas
- NumPy
- Seaborn
- Matplotlib
- Scikit-learn
- Spotipy
- xlrd
- mysql-connector-python
- beautifulsoup4
- requests
- plotly
- joblib
- tabulate
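As a quick check before running anything, the sketch below reports which of the packages above cannot be found in the current environment. The import names are assumptions based on how these packages are normally imported (for example, Scikit-learn is imported as `sklearn` and beautifulsoup4 as `bs4`), and `find_missing` is a hypothetical helper, not part of this repo.

```python
import importlib.util

# Import names for the packages listed above. These can differ from the
# package names (scikit-learn -> sklearn, beautifulsoup4 -> bs4).
REQUIRED_MODULES = [
    "pandas", "numpy", "seaborn", "matplotlib", "sklearn", "spotipy",
    "xlrd", "mysql.connector", "bs4", "requests", "plotly", "joblib",
    "tabulate",
]


def find_missing(modules):
    """Return the module names that cannot be located in this environment."""
    missing = []
    for name in modules:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised when a parent package (e.g. "mysql") is not installed.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    print("Missing modules:", ", ".join(missing) if missing else "none")
```

Any module reported as missing can then be installed with your usual package manager.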
├── Project Report
│   ├── Group 6 Project Report.pdf
│   └── Group 6 Project Presentation.pdf
├── Jupyter Notebooks
│   ├── 01_data_preprocessing_notebooks
│   │   ├── 01_Spotify data mood allocation.ipynb
│   │   └── 02_ML_spotify.ipynb
│   ├── 02_data_cleaning_notebooks
│   │   ├── 01_charts_and_moods_cleaning.ipynb
│   │   ├── 02_cleaning_NHS_data_attempts_1&3.ipynb
│   │   └── 03_NHS_manual_data_cleaning_notebook.ipynb
│   ├── 03_data_analysis_and_visualisation_notebooks
│   │   ├── 01_popular_tracks_data_analysis_final_Vilma.ipynb
│   │   ├── 02_Mood_Data_Visualisations_Isobel.ipynb
│   │   ├── 03_EDA.ipynb
│   │   └── 04_Statistical_Exploration_MH_data.ipynb
│   └── Images
├── Python code for NHS data cleaning
│   └── NHS_data_cleaning_code_2.py
├── Python codes for web scraping & API
│   ├── 01_get_historical_charts.py
│   ├── 02_get_unique_isrc.py
│   ├── 03_get_spotify_ids.py
│   ├── 04_get_audio_features.py
│   └── config.py
├── SQL database
│   ├── config.py
│   ├── create_database.py
│   ├── database_queries.ipynb
│   └── EER diagram.png
└── Datasets
To connect to the Spotify API using your own client ID and client secret, you will need to create an account on the Spotify Developer Sign-up page.
The Spotify Authorisation page explains how to create your credentials.
Then enter your credentials wherever you see a CLIENT ID and CLIENT SECRET variable, or in the config file if one is available.
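For illustration, here is a minimal sketch of authenticating with Spotipy using the client credentials flow. It is not the code used in this repo: the helper names are hypothetical, and reading the credentials from the `SPOTIPY_CLIENT_ID` and `SPOTIPY_CLIENT_SECRET` environment variables is an assumption made here so that no secrets sit in the source; the scripts in this repo expect them in variables or in the config file instead.

```python
import os


def load_credentials(env=os.environ):
    """Read the Spotify client ID and secret from a mapping of variables.

    The variable names are an assumption for this sketch.
    """
    client_id = env.get("SPOTIPY_CLIENT_ID", "")
    client_secret = env.get("SPOTIPY_CLIENT_SECRET", "")
    if not client_id or not client_secret:
        raise ValueError("Spotify client ID and client secret must both be set.")
    return client_id, client_secret


def make_spotify_client(client_id, client_secret):
    """Build a Spotipy client authenticated with the client credentials flow."""
    # Imported here so load_credentials works even without spotipy installed.
    import spotipy
    from spotipy.oauth2 import SpotifyClientCredentials

    auth = SpotifyClientCredentials(client_id=client_id, client_secret=client_secret)
    return spotipy.Spotify(auth_manager=auth)


if __name__ == "__main__":
    sp = make_spotify_client(*load_credentials())
    print("Spotify client created:", type(sp).__name__)
```

The client credentials flow is enough for public catalogue data such as track metadata; it does not give access to any user's private data.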
To connect to the MySQL database, please use your own root password and replace 'PASSWORD' in the config file in the SQL database folder.
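As a rough sketch of what the connection step might look like with mysql-connector-python, the example below refuses to connect while the 'PASSWORD' placeholder is still in place. The function names, the `localhost` host and the `root` user are assumptions for illustration; check config.py and create_database.py for the actual settings.

```python
PLACEHOLDER = "PASSWORD"


def connection_settings(password, host="localhost", user="root"):
    """Return connection keyword arguments, rejecting the unedited placeholder."""
    if not password or password == PLACEHOLDER:
        raise ValueError(
            "Replace the 'PASSWORD' placeholder with your own MySQL root password."
        )
    return {"host": host, "user": user, "password": password}


def connect(password):
    """Open a connection to a local MySQL server (host and user are assumed)."""
    # Imported here so connection_settings works without the connector installed.
    import mysql.connector

    return mysql.connector.connect(**connection_settings(password))
```

Calling `connect("your-root-password")` returns a connection object whose `cursor()` can then be used to run queries.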
To get started, please clone the repository to your local machine. Run through the Jupyter Notebooks saved in Jupyter Notebooks/03_data_analysis_and_visualisation_notebooks to see the analysis and visualisation of the data shown in the report.
All data used in this project is saved in the Datasets folder.
Other folders in this repository, which were used to create the datasets, are as follows:
- Jupyter Notebooks/01_data_preprocessing_notebooks contains the notebooks used to classify music moods using Machine Learning.
- Jupyter Notebooks/02_data_cleaning_notebooks contains the notebooks used to clean the NHS data and popular track moods data.
- Python code for NHS data cleaning contains the Python scripts used to clean the NHS data.
- Python codes for web scraping & API contains the Python scripts used to scrape the data from the Official Charts website and get audio features from the Spotify API.
- SQL database contains the Python script used to create the database and the Jupyter Notebook used to generate some example queries. Please note that the SQL database was not used in this project. It was created to show the process of creating a database and how to query the data.
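To give a feel for the audio features step in the Python codes for web scraping & API folder, here is a hedged sketch of fetching features in batches with Spotipy. It is not the repo's 04_get_audio_features.py: `chunked` and `fetch_audio_features` are hypothetical helpers, and the batch size of 100 reflects the per-request ID limit of Spotify's audio features endpoint as we understand it.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_audio_features(sp, track_ids, batch_size=100):
    """Collect audio features for many tracks, one batch per API request.

    `sp` is any object with Spotipy's `audio_features(tracks=...)` method.
    Tracks the API cannot resolve come back as None and are skipped.
    """
    features = []
    for batch in chunked(track_ids, batch_size):
        result = sp.audio_features(tracks=batch)
        features.extend(item for item in result if item is not None)
    return features
```

Each returned item is a dictionary of per-track values (such as `valence` and `energy`), which can be loaded into a Pandas DataFrame for the later cleaning and mood classification steps.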
This project is now complete. However, to build further on the work that has already been done, here are some ideas for expanding this project in future:
- Increase our data sizes: more data points would help establish findings more conclusively.
- Include more mood classes: four mood classes are not really enough to describe all music tracks.