This is the repo for Group 6's project for the CFGDegree (Data stream) in Spring 2023.

wenjia-knight/CFG-Data-2-Group-6-Project

CFG-Data-2-Group-6-Project

This repo contains the datasets, Python scripts and Jupyter Notebooks used for the Group 6 project as part of the CFGDegree (Data stream) in Spring 2023.

The aim of this project is to examine how the mood of music streamed in the UK relates to the UK COVID lockdown timeline and to NHS mental health services referrals.


Information

For a detailed report on the questions we are trying to answer and our approach to this analysis, please see Group 6 Project Report.pdf in the Project Report folder.

Tools and Technologies Used

Python, Pandas, NumPy, Matplotlib, Plotly, scikit-learn, PyCharm, Jupyter Notebook, Colab, MySQL, Spotify, GitHub, Google Drive, Markdown, Slack, Zoom, Notion

Requirements

Please make sure you have installed the following Python packages before running the scripts and notebooks:

  • Pandas
  • NumPy
  • Seaborn
  • Matplotlib
  • Scikit-learn
  • Spotipy
  • xlrd
  • mysql-connector-python
  • beautifulsoup4
  • requests
  • plotly
  • joblib
  • tabulate
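
As a quick check before running anything, the list above can be turned into a small script that reports which packages are missing. This is a minimal sketch, not part of the repository: the import names on the right-hand side are the usual ones for these packages, and the `find_missing` helper is a hypothetical name chosen for illustration.

```python
import importlib.util

# Maps each pip package name to the module name used in `import` statements.
# (Assumption: these are the usual import names for the packages listed above.)
REQUIRED = {
    "pandas": "pandas",
    "numpy": "numpy",
    "seaborn": "seaborn",
    "matplotlib": "matplotlib",
    "scikit-learn": "sklearn",
    "spotipy": "spotipy",
    "xlrd": "xlrd",
    "mysql-connector-python": "mysql.connector",
    "beautifulsoup4": "bs4",
    "requests": "requests",
    "plotly": "plotly",
    "joblib": "joblib",
    "tabulate": "tabulate",
}


def find_missing(required=REQUIRED):
    """Return the pip names of packages whose modules cannot be found."""
    missing = []
    for pip_name, module_name in required.items():
        try:
            found = importlib.util.find_spec(module_name) is not None
        except (ImportError, ValueError):
            # Raised when a parent package (e.g. `mysql`) is not installed.
            found = False
        if not found:
            missing.append(pip_name)
    return missing


missing = find_missing()
if missing:
    print("Missing packages, try: pip install " + " ".join(missing))
else:
    print("All required packages are installed.")
```

Any package reported as missing can then be installed with `pip` using the name shown.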

Files

├── Project Report 
  └── Group 6 Project Report.pdf
  └── Group 6 Project Presentation.pdf

├── Jupyter Notebooks 
  └── 01_data_preprocessing_notebooks
    └── 01_Spotify data mood allocation.ipynb
    └── 02_ML_spotify.ipynb
  └── 02_data_cleaning_notebooks
    └── 01_charts_and_moods_cleaning.ipynb
    └── 02_cleaning_NHS_data_attempts_1&3.ipynb
    └── 03_NHS_manual_data_cleaning_notebook.ipynb
  └── 03_data_analysis_and_visualisation_notebooks 
    └── 01_popular_tracks_data_analysis_final_Vilma.ipynb
    └── 02_Mood_Data_Visualisations_Isobel.ipynb
    └── 03_EDA.ipynb
    └── 04_Statistical_Exploration_MH_data.ipynb
  └── Images

├── Python code for NHS data cleaning
  └── NHS_data_cleaning_code_2.py

├── Python codes for web scraping & API
  └── 01_get_historical_charts.py
  └── 02_get_unique_isrc.py
  └── 03_get_spotify_ids.py
  └── 04_get_audio_features.py
  └── config.py

├── SQL database
  └── config.py
  └── create_database.py
  └── database_queries.ipynb
  └── EER diagram.png

├── Datasets

Spotify API

To connect to the Spotify API with your own client ID and client secret, you will need to create an account on the Spotify Developer site (Spotify Developer Sign-up).
The Spotify Authorisation page explains how to set up your authorisation.

Then enter your credentials wherever a script or notebook defines a CLIENT ID and CLIENT SECRET variable, or in the config file where one is available.
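
For readers who would rather not paste credentials into the scripts, the sketch below shows one possible alternative: reading them from environment variables and passing them to Spotipy's client credentials flow. It is not how the repository's scripts are written. The environment variable names `SPOTIFY_CLIENT_ID` and `SPOTIFY_CLIENT_SECRET` and both helper functions are hypothetical names chosen for this example.

```python
import os


def load_spotify_credentials(env=os.environ):
    """Read the client ID and secret from the environment.

    The variable names below are assumptions made for this sketch.
    """
    client_id = env.get("SPOTIFY_CLIENT_ID")
    client_secret = env.get("SPOTIFY_CLIENT_SECRET")
    if not client_id or not client_secret:
        raise RuntimeError(
            "Set SPOTIFY_CLIENT_ID and SPOTIFY_CLIENT_SECRET before running."
        )
    return client_id, client_secret


def make_spotify_client(client_id, client_secret):
    """Build a Spotipy client using the client credentials flow."""
    # Imported here so the credential check above works without Spotipy.
    import spotipy
    from spotipy.oauth2 import SpotifyClientCredentials

    auth = SpotifyClientCredentials(
        client_id=client_id, client_secret=client_secret
    )
    return spotipy.Spotify(auth_manager=auth)
```

With both variables set, `make_spotify_client(*load_spotify_credentials())` returns a client object that can be used for API requests.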

MySQL database

To connect to the MySQL database, replace 'PASSWORD' in the config file in the SQL database folder with your own root password.
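
A minimal sketch of what the connection step might look like with mysql-connector-python is shown below. It assumes a local server (`localhost`) and the `root` user; the helper names are hypothetical and the repository's own config.py and create_database.py may be organised differently.

```python
def build_connection_args(password, host="localhost", user="root"):
    """Assemble the keyword arguments for mysql.connector.connect().

    The host and user defaults are assumptions for a local installation.
    """
    if password == "PASSWORD":
        # Guard against running with the unedited placeholder.
        raise ValueError("Replace 'PASSWORD' with your own root password.")
    return {"host": host, "user": user, "password": password}


def connect(password):
    """Open a connection to the MySQL server."""
    # Imported here so the argument check works without the connector.
    import mysql.connector

    return mysql.connector.connect(**build_connection_args(password))
```

Calling `connect("your-root-password")` returns a connection object; remember to call its `close()` method when finished.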

How to Run

To get started, please clone the repository to your local machine. Run through the Jupyter Notebooks saved in Jupyter Notebooks/03_data_analysis_and_visualisation_notebooks to see the analysis and visualisation of the data shown in the report. All data used in this project is saved in the Datasets folder.

Other folders in this repository are as follows (used to create the datasets):

  • Jupyter Notebooks/01_data_preprocessing_notebooks contains the notebooks used to classify music moods using Machine Learning.
  • Jupyter Notebooks/02_data_cleaning_notebooks contains the notebooks used to clean the NHS data and popular track moods data.
  • Python code for NHS data cleaning contains the Python scripts used to clean the NHS data.
  • Python codes for web scraping & API contains the Python scripts used to scrape the data from the Official Charts website and get audio features from the Spotify API.
  • SQL database contains the Python script used to create the database and the Jupyter Notebook used to run some example queries. Please note that the SQL database was not used in the analysis for this project. It was created to show the process of creating a database and how to query the data.

Project status

This project is now complete. However, here are some ideas for building on the work already done and expanding the project in future:

  • Increase the size of our datasets. More data points would help establish the findings more conclusively.
  • Include more mood classes. Four mood classes are not enough to describe all music tracks in detail.

Authors and Acknowledgment

Autumn
Ami
Isobel
Isha
Vilma
Wenjia
