This project analyzes data on Spotify tracks from 1921 to 2020 using Jupyter Notebook and Python data science tools.
The Spotify dataset (titled data.csv) consists of 160,000+ tracks from 1921-2020 available on Spotify as of June 2020. Collected by Kaggle user and Turkish data scientist Yamaç Eren Ay, the data was retrieved and tabulated from the Spotify Web API. Each row in the dataset corresponds to a track, with variables such as the title, artist, and year located in their respective columns. Aside from these fundamental variables, musical elements of each track, such as the tempo, danceability, and key, were likewise extracted; these values were generated by Spotify's algorithms based on a range of technical parameters.
- Studying the correlations between the variables in the Spotify data.
- Tracing the evolution of different musical elements through the years.
- Examining the divide between explicit and non-explicit songs through the years.
- Determining whether there is a significant difference in popularity between explicit and non-explicit songs.
- Finding the most frequent emotions in Spotify tracks and analyzing their musical elements based on the track's mode and key.
- Determining the classifications of the Spotify tracks through K-Means Clustering.
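
As a rough illustration of the first few items above, here is a minimal pandas sketch. It is not the notebook's actual code: it runs on a small made-up table rather than data.csv, and the column names (`year`, `explicit`, `popularity`, `danceability`, `tempo`) are assumptions about how the dataset is laid out.

```python
import pandas as pd

# Made-up stand-in rows; the notebook itself would load the Kaggle file instead,
# e.g. with pd.read_csv("data.csv"). Column names are assumed, not confirmed here.
tracks = pd.DataFrame({
    "year": [1950, 1950, 1990, 1990, 2020, 2020],
    "explicit": [0, 0, 0, 1, 1, 1],
    "popularity": [10, 12, 40, 55, 70, 80],
    "danceability": [0.30, 0.35, 0.50, 0.65, 0.70, 0.80],
    "tempo": [90.0, 100.0, 110.0, 120.0, 125.0, 130.0],
})

# Pairwise Pearson correlations between all numeric columns
corr = tracks.corr(method="pearson")

# Average of each musical element per year, to see how it evolves over time
yearly = tracks.groupby("year")[["danceability", "tempo"]].mean()

# Fraction of tracks flagged explicit in each year
explicit_share = tracks.groupby("year")["explicit"].mean()

# Mean popularity of non-explicit (0) versus explicit (1) tracks
popularity_by_explicit = tracks.groupby("explicit")["popularity"].mean()

print(corr.round(2))
print(yearly)
print(explicit_share)
print(popularity_by_explicit)
```

Comparing group means like this only describes the gap; deciding whether the popularity difference is statistically significant would still need a hypothesis test on the full dataset.
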
- `Spotify Data.ipynb` is the main notebook where the data is imported for EDA and FII.
- `data.csv` is the dataset downloaded from Kaggle.
- `spotify_eda.html` is the HTML file for the comprehensive EDA done using the Pandas Profiling module.
- This is in partial fulfillment of the course Statistical Modelling and Simulation (CSMODEL).