
I aim to automate playlist creation for Moosic, a startup known for manual curation, using Machine Learning, while addressing skepticism about the ability of audio features to capture playlist "mood."


Cintia0528/Data_Science-Unsupervised_Machine_Learning


Unsupervised Machine Learning

Goal

To evaluate whether Machine Learning can be used to automatise playlist creation.

Overview

Moosic is a small startup that creates playlists curated manually by music experts. Their listeners love the personal touch, which they achieve by capturing the "mood" or "vibe".

Board: Believes that they need at least a degree of automatisation, as music experts are not able to keep up with the demand. Currently the whole creation process is done manually.

Music Experts: Are skeptical that audio features on their own are enough to capture the "mood", which is so subjective that only a human can judge it.

Context

Moosic wants the data science team to use a dataset that has been collected from the Spotify API and contains the audio features (tempo, energy, danceability…) for a few thousand songs. After using a basic clustering algorithm such as K-Means to divide the dataset into a few clusters, the data team shall answer the following two questions:

  1. Are Spotify’s audio features able to identify “similar songs”, as defined by humanly detectable criteria?
  2. Is K-Means a good method to create playlists?

Task:

  • Import list of 5000 songs collected from Spotify API
  • Use a basic clustering algorithm, e.g. K-Means, to divide the dataset into clusters
  • Validate clusters, export clusters (playlists) to Spotify and listen to some of the songs
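The scale-and-cluster step from the task list could look roughly like the sketch below. It is a minimal illustration, not the repository's code: it assumes scikit-learn and pandas, uses synthetic values in place of the Spotify export (whose file name and columns are not shown here), and picks `n_clusters=5` arbitrarily.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the Spotify export; the real dataset is not reproduced here.
rng = np.random.default_rng(0)
songs = pd.DataFrame({
    "tempo": rng.uniform(60, 200, size=500),       # beats per minute
    "energy": rng.uniform(0, 1, size=500),
    "danceability": rng.uniform(0, 1, size=500),
})

# Scale every feature to [0, 1] so tempo does not dominate the distances.
scaled = MinMaxScaler().fit_transform(songs)

# n_clusters=5 is an arbitrary, illustrative choice.
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0)
songs["cluster"] = kmeans.fit_predict(scaled)

# Cluster sizes give a quick check for unevenly sized clusters.
cluster_sizes = songs["cluster"].value_counts().sort_index()
print(cluster_sizes)
```

Each cluster label can then be treated as a candidate playlist for the listening check.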

Challenges:

  • Difficult to evaluate the results without listening to each playlist
  • No tangible way to measure accuracy
  • Unevenly sized clusters
  • Subjective - what is a good playlist?

Solutions:

  • Must be visualized, so we can see the overlaps and the outliers
  • Limit the number of features to 3 (or multiples of 3) so they can be visualized in a 3D scatterplot
  • Find a balance between the score used to choose K (elbow method, silhouette score) and the business objectives
  • Instead of replacing music experts, ML does the "heavy lifting" and they fine-tune the results
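A 3D scatterplot of three features, coloured by cluster, is one way to make overlaps and outliers visible. The sketch below assumes matplotlib and scikit-learn and uses random points rather than real songs. The axis names are taken from the feature list above, while the output file name `clusters_3d.png` is made up for the example.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import KMeans

# Random points in the unit cube stand in for three scaled audio features.
rng = np.random.default_rng(1)
features = rng.uniform(0, 1, size=(300, 3))
labels = KMeans(n_clusters=4, n_init=10, random_state=1).fit_predict(features)

fig = plt.figure(figsize=(6, 5))
ax = fig.add_subplot(projection="3d")
# One colour per cluster, so overlapping regions and stray points stand out.
ax.scatter(features[:, 0], features[:, 1], features[:, 2],
           c=labels, cmap="tab10", s=12)
ax.set_xlabel("energy")
ax.set_ylabel("danceability")
ax.set_zlabel("tempo (scaled)")
fig.savefig("clusters_3d.png", dpi=100)  # hypothetical output file name
```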

Approach

  1. Evaluate the dataset and do basic cleaning, e.g. handle missing or corrupted values and correct the data types
  2. Exploration of audio features
  3. Decide which features to drop, and which features to use
  4. K-Means clustering
  5. Evaluation of clusters
  6. Sub-clustering
  7. Evaluation of final clusters
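The sub-clustering step is not described in detail here, so the following is only one plausible reading: run K-Means once, then run it again on the largest cluster to break it into smaller groups. The data is synthetic, and the cluster counts (4, then 3) are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for three scaled audio features.
rng = np.random.default_rng(2)
X = rng.uniform(0, 1, size=(600, 3))

# First pass: coarse clusters.
labels = KMeans(n_clusters=4, n_init=10, random_state=2).fit_predict(X)

# Pick the largest cluster as the candidate for splitting.
largest = np.bincount(labels).argmax()
mask = labels == largest

# Second pass: cluster only the songs inside the largest cluster.
sub_labels = KMeans(n_clusters=3, n_init=10, random_state=2).fit_predict(X[mask])

# Sub-cluster 0 keeps the parent id; the other sub-clusters get fresh ids.
final_labels = labels.copy()
next_id = labels.max() + 1
final_labels[mask] = np.where(sub_labels == 0, largest, next_id + sub_labels - 1)

print(np.bincount(final_labels))  # sizes of the final clusters
```

Splitting only the oversized cluster keeps the other playlists unchanged while addressing the uneven sizes.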

Deliverables

A 5-minute PowerPoint presentation to the Board of Directors (found here) that summarizes the findings and suggests a course of action. The Python code is found here.

Skills & Tools

  1. Data Cleaning & Quality Assurance
  2. Data Preprocessing: Scaling
  3. K-Means Clustering
  4. Elbow Method and Silhouette Score
  5. Data Visualization (3D Scatterplot)
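To illustrate the elbow method and the silhouette score, the sketch below fits K-Means for several values of K and records both the inertia (plotted against K for the elbow) and the silhouette score. It assumes scikit-learn and uses three artificial, well-separated blobs, so the numbers say nothing about the real Spotify data.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Three artificial, well-separated blobs in a 3-feature space.
rng = np.random.default_rng(3)
centers = np.array([[0.2, 0.2, 0.2], [0.8, 0.8, 0.2], [0.5, 0.2, 0.9]])
X = np.vstack([c + rng.normal(scale=0.05, size=(100, 3)) for c in centers])

inertias, silhouettes = {}, {}
for k in range(2, 8):
    model = KMeans(n_clusters=k, n_init=10, random_state=3).fit(X)
    inertias[k] = model.inertia_                         # within-cluster sum of squares
    silhouettes[k] = silhouette_score(X, model.labels_)  # between -1 and 1, higher is better

# The K with the highest silhouette score is one candidate; the elbow is
# read off a plot of inertia against K.
best_k = max(silhouettes, key=silhouettes.get)
print(best_k, silhouettes[best_k])
```

On real audio features the scores are usually far less clear-cut, which is where the business objectives come in when choosing K.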
