CulturalAnalytics-CoverPredictions

What to expect in this repo

This repo contains the code for my term paper in the module Cultural Analytics of the MSc Digital Humanities at Leipzig University. I explore training classifiers on album covers, or alternatively on BLIP-generated text descriptions of them, to classify music genres and subgenres. The analysis uses album covers crawled from MusicBrainz along with their metadata on artists, releases, genres and subgenres. The dataset is currently being crawled and will feature over 1 million album covers for over 200 genres with more than 900 subgenres.

In the related repo (linked above) I pursue a similar project on the musical metadata of the album cover dataset. The results of both projects are meant to be comparable, giving insight into the same research question from two different perspectives.

Working title

Genre-Defining Features in Album Cover Art: Investigating Common Visual Motifs Across Musical Subgenres Using BLIP-2 Captions and Machine Learning Classifiers

Outline and research questions

In my research paper, I aim to explore the classification of musical subgenres through their album covers using machine learning algorithms. Music genres typically encompass various subgenres, each with unique features of its own but also subtle features that tie it to its overarching genre. These connecting features are often nuanced and hard to pinpoint. My study will investigate whether machine learning algorithms can detect statistical patterns in album cover designs, both within individual subgenres and across their broader genre categories.

A key method of analysis will be examining the confusion matrix of the classification results. I will argue that a high number of true positives for a subgenre may indicate statistically detectable features shared within that subgenre. More importantly, the rate of false positives, especially between subgenres of the same genre, could reveal genre-spanning features. For example, I anticipate a higher rate of false positives among subgenres of Metal than between a Metal subgenre and a Hip Hop subgenre. This pattern, if observed, could suggest the presence of distinct, genre-specific characteristics in album cover designs.
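The comparison described above can be sketched in a few lines. This is only an illustration under assumptions: the subgenre names, the parent mapping and the counts are invented toy data, the function name `confusion_by_genre` is hypothetical, and rows are taken to be true classes and columns predicted classes.

```python
def confusion_by_genre(labels, parent, matrix):
    """Split each subgenre's misclassifications into same-genre and cross-genre.

    labels: subgenre names, in the row/column order of `matrix`
    parent: dict mapping each subgenre to its parent genre
    matrix: matrix[i][j] = number of covers of true class i predicted as j
    """
    rates = {}
    for i, true_label in enumerate(labels):
        total = sum(matrix[i])
        if total == 0:
            continue  # no test samples for this subgenre
        same = cross = 0
        for j, predicted_label in enumerate(labels):
            if i == j:
                continue  # the diagonal holds the correct predictions
            if parent[predicted_label] == parent[true_label]:
                same += matrix[i][j]
            else:
                cross += matrix[i][j]
        rates[true_label] = {
            "correct": matrix[i][i] / total,
            "same_genre_error": same / total,
            "cross_genre_error": cross / total,
        }
    return rates


# Toy data, invented purely for illustration (not results from this project)
labels = ["death metal", "black metal", "trap", "boom bap"]
parent = {
    "death metal": "metal",
    "black metal": "metal",
    "trap": "hip hop",
    "boom bap": "hip hop",
}
matrix = [
    [60, 30, 5, 5],
    [25, 65, 6, 4],
    [4, 6, 70, 20],
    [5, 5, 30, 60],
]

rates = confusion_by_genre(labels, parent, matrix)
for name, r in rates.items():
    print(name, r)
```

With real data the two error rates should probably be normalised by the number of candidate classes on each side, since a genre with many subgenres offers more opportunities for same-genre confusion than one with few.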

Possible challenges

- Album covers are extremely diverse and artistic, so a lot of noise in the data is to be expected.
- The data is not evenly distributed across genres and subgenres, so careful sampling is needed. Maybe sacrifice diversity in favor of consistency and only use the 10 most common genres with their 10 most common subgenres each?
- The rate of false positives is not necessarily an indicator of features connecting subgenres to a genre. Other factors such as release date or geographical origin may be distributed unevenly between genres, a bias that can't be prevented even through careful sampling.
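One way the "most common genres with their most common subgenres" idea could look in code is sketched below. It is an assumption-laden sketch, not the sampling script of this repo: the record layout `(cover_id, genre, subgenre)`, the function name `balanced_sample` and the choice to drop subgenres with fewer than `per_class` covers are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def balanced_sample(records, n_genres=10, n_subgenres=10, per_class=100, seed=0):
    """Keep the most common genres and subgenres, then draw equally sized classes.

    records: iterable of (cover_id, genre, subgenre) tuples (assumed layout)
    """
    records = list(records)
    genre_counts = Counter(genre for _, genre, _ in records)
    top_genres = [g for g, _ in genre_counts.most_common(n_genres)]

    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    sample = []
    for genre in top_genres:
        by_subgenre = defaultdict(list)
        for cover_id, g, subgenre in records:
            if g == genre:
                by_subgenre[subgenre].append(cover_id)
        # Rank the subgenres of this genre by how many covers they have
        ranked = sorted(by_subgenre, key=lambda s: len(by_subgenre[s]), reverse=True)
        for subgenre in ranked[:n_subgenres]:
            ids = by_subgenre[subgenre]
            if len(ids) < per_class:
                continue  # too small to yield a full class; dropped in this sketch
            for cover_id in rng.sample(ids, per_class):
                sample.append((cover_id, genre, subgenre))
    return sample


# Toy records, invented for illustration
records = (
    [(f"m{i}", "metal", "death metal") for i in range(5)]
    + [(f"b{i}", "metal", "black metal") for i in range(3)]
    + [(f"t{i}", "hip hop", "trap") for i in range(4)]
    + [(f"j{i}", "jazz", "bebop") for i in range(1)]
)
sample = balanced_sample(records, n_genres=2, n_subgenres=2, per_class=3)
print(Counter(subgenre for _, _, subgenre in sample))
```

Equal class sizes keep the confusion matrix easy to read, at the cost of discarding data from large classes and excluding small subgenres entirely. This does not address the release date or geography bias mentioned in the last point.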

TODO:

Data collection and preparation

Processing and analysis

- study best practices for preprocessing visual data
- train classifiers
- evaluate
- repeat
- ???
- profit
- perform a clustering analysis such as k-means (maybe some subgenres fit better into a different parent genre)
- visualise distances between subgenres or the clustering of classes
- possible approaches:
  - multidimensional scaling
  - t-distributed stochastic neighbor embedding
  - network graphs using Three.js (or some other fancy interactive 3D visualization framework)
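The k-means idea from the list above could be tried along these lines. This is a minimal sketch under assumptions: each subgenre is represented by a single feature vector (for example a mean embedding of its covers), the two-dimensional vectors are invented toy values, and the helper name `kmeans` is hypothetical. It implements plain Lloyd's algorithm from the standard library only.

```python
import math
import random
from collections import Counter


def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd's algorithm; returns one cluster index per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    assignment = [None] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # converged
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignment


# Toy vectors, invented for illustration: subgenre -> (parent genre, features)
subgenres = {
    "death metal": ("metal", (0.9, 0.1)),
    "black metal": ("metal", (0.8, 0.2)),
    "nu metal": ("metal", (0.2, 0.8)),  # deliberately placed near hip hop
    "trap": ("hip hop", (0.1, 0.9)),
    "boom bap": ("hip hop", (0.2, 0.9)),
}
names = list(subgenres)
points = [subgenres[n][1] for n in names]
clusters = kmeans(points, k=2)

# Label each cluster with the parent genre most of its members belong to
majority = {}
for c in set(clusters):
    genres = Counter(subgenres[n][0] for n, a in zip(names, clusters) if a == c)
    majority[c] = genres.most_common(1)[0][0]

# Subgenres whose cluster disagrees with their assigned parent genre
outliers = [n for n, a in zip(names, clusters) if majority[a] != subgenres[n][0]]
print(outliers)
```

Subgenres listed in `outliers` would be candidates for "fits better into a different parent genre". On real data the number of clusters and the feature representation are open choices, and k-means results depend on the initialisation.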


nicobenz/CulturalAnalytics-CoverPredictions
