Welcome to this tutorial on vector databases and music recommendation systems using Python and Qdrant. Here, we will learn how to get started with audio data, embeddings, and vector databases.
By the end of this tutorial, you will have a good understanding of how to use vector databases and Python to create your own music recommendation engine.
The dataset we will be using is called the Ludwig Music Dataset (Moods and Subgenres), and it can be found on Kaggle. It was collected from Discogs and AcousticBrainZ for the purpose of music information retrieval (MIR), and it contains over 10,000 songs of different genres and subgenres. Bear in mind that the full dataset is 12GB in size, so we recommend that you download only your favorite genre from the mp3 directory, along with the labels.json file. That will be more than enough to follow along for the rest of the tutorial.
Once you download the full dataset, you should see the following directories and files.
../data/ludwig_music_data
├── labels.json
├── mfccs
│ ├── blues
│ ├── ...
│ └── rock
├── mp3
│ ├── blues
│ ├── ...
│ └── rock
├── spectogram
│ └── spectogram
└── subgeneres.json
The labels.json file contains all the metadata (e.g., artist, subgenre, album) associated with each song.
The spectogram directory contains spectrograms, which are visual representations of the frequencies present in an audio signal over time. A spectrogram is a 2D graph where the x-axis represents time and the y-axis represents frequency. The intensity of the color or brightness of the graph indicates the strength or amplitude of the frequencies at a particular time. Here is an example of a spectrogram.
If you've ever wondered what audio data looks like visually, this is one way to visualize it.
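If you'd like to generate a spectrogram like this yourself, here is a minimal sketch using librosa and matplotlib (matplotlib is not in the package list below, so install it separately if you want to try this; the file path is just a placeholder for any mp3 in the dataset).

```python
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np

# load any song from the dataset (hypothetical path, swap in one of your files)
y, sr = librosa.load("../data/ludwig_music_data/mp3/blues/some_song.mp3", sr=44100, mono=True)

# short-time Fourier transform -> magnitude converted to decibels
stft_db = librosa.amplitude_to_db(np.abs(librosa.stft(y)), ref=np.max)

# time on the x-axis, frequency on the y-axis, color = amplitude
fig, ax = plt.subplots(figsize=(10, 4))
img = librosa.display.specshow(stft_db, sr=sr, x_axis="time", y_axis="hz", ax=ax)
fig.colorbar(img, ax=ax, format="%+2.0f dB")
ax.set_title("Spectrogram")
plt.show()
```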
Let's get our environment set up before we prepare the data.
Before you run any line of code, please make sure you have
- downloaded the data
- created a virtual environment (if not in Google Colab)
- installed the packages below
- started a container with Qdrant
# with conda or mamba if you have it installed
mamba create -n my_env python=3.10
mamba activate my_env
# or with virtualenv
python -m venv venv
source venv/bin/activate
# install packages
pip install qdrant-client transformers datasets pandas numpy torch librosa tensorflow openl3 panns-inference pedalboard streamlit
The open source version of Qdrant is available as a Docker image, and it can be pulled and run from any machine with Docker installed. If you don't have Docker installed on your machine, you can follow the instructions in the official documentation here. After that, open your terminal and start by downloading the image with the following command.
docker pull qdrant/qdrant
Next, initialize Qdrant with the following command and you should be good to go.
docker run -p 6333:6333 \
-v $(pwd)/qdrant_storage:/qdrant/storage \
qdrant/qdrant
Verify that you are ready to go by importing the following libraries and connecting to Qdrant via its Python client.
from transformers import AutoFeatureExtractor, AutoModel
from IPython.display import Audio as player
from datasets import load_dataset, Audio
from panns_inference import AudioTagging
from qdrant_client import QdrantClient
from qdrant_client.http import models
from os.path import join
from glob import glob
import pandas as pd
import numpy as np
import librosa
import openl3
import torch
client = QdrantClient(host="localhost", port=6333)
We will also go ahead and create the collection we will be working with in this tutorial. The vectors will be of size 2048 (the dimensionality of the panns_inference embeddings we'll generate later), and we'll set the distance metric to cosine similarity.
my_collection = "music_collection"
client.recreate_collection(
collection_name=my_collection,
vectors_config=models.VectorParams(size=2048, distance=models.Distance.COSINE)
)
True
We will be using Hugging Face's datasets library to read in our data and massage it a bit.
data_path = join("..", "data", "ludwig_music_data")
data_path
'../data/ludwig_music_data'
Feel free to change the genre to the one you like the best.
music_data = load_dataset(
"audiofolder", data_dir=join(data_path, "mp3", "latin"), split="train", drop_labels=True
)
music_data
Dataset({
features: ['audio'],
num_rows: 979
})
music_data[115]
{'audio': {'path': '/home/ramonperez/Tresors/qdrant_org/content/examples/data/ludwig_music_data/mp3/latin/0rXvhxGisD2djBmNkrv5Gt.mp3',
'array': array([ 0.00000000e+00, 1.24776700e-09, -4.54397187e-10, ...,
-7.98814446e-02, -8.84955898e-02, -1.05223551e-01]),
'sampling_rate': 44100}}
As you can see, we got back JSON objects with an array representing each song, the path to where each file is located on our machine, and the sampling rate for each. Let's play the song at index 115 and see what it sounds like.
player(music_data[115]['audio']['array'], rate=44100)
We'll need to extract the name of each mp3 file as this is the unique identifier we'll use in order to get the corresponding metadata for each song. While we are at it, we will also create a range of numbers and add it as the index to the dataset.
ids = [
(
music_data[i] # for every sample
['audio'] # in this directory
['path'] # extract the path
.split("/") # split it by /
[-1] # take only the last piece "id.mp3"
.replace(".mp3", '') # and replace the .mp3 with nothing
)
for i in range(len(music_data))
]
index = [num for num in range(len(music_data))]
ids[:4]
['0010BnyFuw94XFautS2uJp',
'00RhgYVH6DrHl0SuZWDp8W',
'01k69xxIQGL94F8IfIkI5l',
'02GUIyXZ9RNusgUocEQIzN']
music_data = music_data.add_column("index", index)
music_data = music_data.add_column("ids", ids)
music_data[-1]
{'audio': {'path': '/home/ramonperez/Tresors/qdrant_org/content/examples/data/ludwig_music_data/mp3/latin/7yX4WgUfoPpMKZHgqpaZ0x.mp3',
'array': array([ 0.00000000e+00, -1.40022882e-09, -4.44221415e-09, ...,
-9.52053051e-02, -8.90597273e-02, -8.10846481e-02]),
'sampling_rate': 44100},
'index': 978,
'ids': '7yX4WgUfoPpMKZHgqpaZ0x'}
The metadata we will use for our payload lives in the labels.json file, so let's extract it.
label_path = join(data_path, "labels.json")
labels = pd.read_json(label_path)
labels.head()
| | tracks |
|---|---|
| 000QWvZpHrBIVrW4dGbaVI | {'otherSubgenres': {'L': [{'S': 'electronic---... |
| 0010BnyFuw94XFautS2uJp | {'otherSubgenres': {'L': [{'S': ' world'}, {'S... |
| 0055LRFB7zfdCXDGodyIz3 | {'otherSubgenres': {'L': []}, 'artist': {'S': ... |
| 005Dlt8Xaz3DkaXiRJgdiS | {'otherSubgenres': {'L': [{'S': 'rock'}, {'S':... |
| 006RpKEKItNO4q8TkAUpOv | {'otherSubgenres': {'L': [{'S': 'classical---c... |
As you can see, the dictionaries above contain a lot of useful information. Let's create a function to extract the data we want to retrieve for our recommendation system.
def get_metadata(x):
cols = ['artist', 'genre', 'name', 'subgenres']
list_of_cols = []
for col in cols:
try:
mdata = list(x[col].values())[0]
except:
mdata = "Unknown"
list_of_cols.append(mdata)
return pd.Series(list_of_cols, index=cols)
clean_labels = labels['tracks'].apply(get_metadata).reset_index()
clean_labels.head()
| | index | artist | genre | name | subgenres |
|---|---|---|---|---|---|
| 0 | 000QWvZpHrBIVrW4dGbaVI | 047 | electronic | General Error | [{'S': 'electronic---synth-pop'}] |
| 1 | 0010BnyFuw94XFautS2uJp | Jimmy Buffett | latin | La Vie Dansante | [{'S': 'latin---cubano'}] |
| 2 | 0055LRFB7zfdCXDGodyIz3 | New Order | rock | Doubts Even Here | [{'S': 'rock---new wave'}] |
| 3 | 005Dlt8Xaz3DkaXiRJgdiS | Ricardo Arjona | rock | Historia de Taxi | [{'S': 'rock---pop rock'}] |
| 4 | 006RpKEKItNO4q8TkAUpOv | Worrytrain | electronic | They Will Make My Passage Easy | [{'S': 'electronic---ambient'}] |
The last piece of the puzzle is to clean the subgenres a bit, and to extract the path to each of the files since we will need them to load the recommendations in our app later on.
def get_vals(genres):
genre_list = []
for dicts in genres:
if type(dicts) != str:
for _, val in dicts.items():
genre_list.append(val)
return genre_list
clean_labels['subgenres'] = clean_labels.subgenres.apply(get_vals)
clean_labels['subgenres'].head()
0 [electronic---synth-pop]
1 [latin---cubano]
2 [rock---new wave]
3 [rock---pop rock]
4 [electronic---ambient]
Name: subgenres, dtype: object
file_path = join(data_path, "mp3", "latin", "*.mp3")
files = glob(file_path)
ids = [i.split('/')[-1].replace(".mp3", '') for i in files]
music_paths = pd.DataFrame(zip(ids, files), columns=["ids", 'urls'])
music_paths.head()
| | ids | urls |
|---|---|---|
| 0 | 2PaETSKl3w3IdtLIbDnQXJ | ../data/ludwig_music_data/mp3/latin/2PaETSKl3w... |
| 1 | 3Cu37dl54yhg2ZPrEnTx0O | ../data/ludwig_music_data/mp3/latin/3Cu37dl54y... |
| 2 | 4RTRzqkcvvkvuMK5IpFLmS | ../data/ludwig_music_data/mp3/latin/4RTRzqkcvv... |
| 3 | 5A32KQZznC2HSqr9qzTl2N | ../data/ludwig_music_data/mp3/latin/5A32KQZznC... |
| 4 | 2uPQvR5WBOI22Wj2gwwiT5 | ../data/ludwig_music_data/mp3/latin/2uPQvR5WBO... |
We'll combine all files with metadata into one dataframe and then format it as a list of JSON objects for our payload.
metadata = (music_data.select_columns(['index', 'ids'])
.to_pandas()
.merge(right=clean_labels, how="left", left_on='ids', right_on='index')
.merge(right=music_paths, how="left", left_on='ids', right_on='ids')
.drop("index_y", axis=1)
.rename({"index_x": "index"}, axis=1)
)
metadata.head()
| | index | ids | artist | genre | name | subgenres | urls |
|---|---|---|---|---|---|---|---|
| 0 | 0 | 0010BnyFuw94XFautS2uJp | Jimmy Buffett | latin | La Vie Dansante | [latin---cubano] | ../data/ludwig_music_data/mp3/latin/0010BnyFuw... |
| 1 | 1 | 00RhgYVH6DrHl0SuZWDp8W | Jimmy Buffett | latin | Brown Eyed Girl | [latin---cubano] | ../data/ludwig_music_data/mp3/latin/00RhgYVH6D... |
| 2 | 2 | 01k69xxIQGL94F8IfIkI5l | Los Delinqüentes | latin | Fumata Del Ladrillo | [latin---flamenco, rock---punk] | ../data/ludwig_music_data/mp3/latin/01k69xxIQG... |
| 3 | 3 | 02GUIyXZ9RNusgUocEQIzN | La Bottine Souriante | latin | Ma Paillasse | [latin---salsa] | ../data/ludwig_music_data/mp3/latin/02GUIyXZ9R... |
| 4 | 4 | 02IFfsWwxek6h9qLEH4sRA | Gipsy Kings | latin | Estrellas | [latin---flamenco] | ../data/ludwig_music_data/mp3/latin/02IFfsWwxe... |
payload = metadata.drop(['index', 'ids'], axis=1).to_dict(orient="records")
payload[:3]
[{'artist': 'Jimmy Buffett',
'genre': 'latin',
'name': 'La Vie Dansante',
'subgenres': ['latin---cubano'],
'urls': '../data/ludwig_music_data/mp3/latin/0010BnyFuw94XFautS2uJp.mp3'},
{'artist': 'Jimmy Buffett',
'genre': 'latin',
'name': 'Brown Eyed Girl',
'subgenres': ['latin---cubano'],
'urls': '../data/ludwig_music_data/mp3/latin/00RhgYVH6DrHl0SuZWDp8W.mp3'},
{'artist': 'Los Delinqüentes',
'genre': 'latin',
'name': 'Fumata Del Ladrillo',
'subgenres': ['latin---flamenco', 'rock---punk'],
'urls': '../data/ludwig_music_data/mp3/latin/01k69xxIQGL94F8IfIkI5l.mp3'}]
Audio embeddings are low dimensional vector representations of audio signals and they capture important features such as the pitch, timbre, and spatial characteristics of sound. These embeddings can be used as compact and meaningful representations of audio signals for various downstream audio processing tasks such as speech recognition, speaker recognition, music genre classification, and event detection. These embeddings are generally obtained using deep neural networks that take in an audio signal as input, and output a learned low-dimensional feature representation for that audio. In addition, these embeddings can also be used as input to further machine learning models.
There are different ways in which we can get started creating embeddings for our songs:
- by training a deep neural network from scratch on our dataset and extracting the embedding layer,
- by using a pre-trained model and the transformers Python library, or
- by using purpose-built libraries like openl3 and panns_inference.
There are other ways, of course, but here we'll use options 2 and 3: a pre-trained model via the transformers library, and the purpose-built openl3 and panns_inference libraries.
Important INFO: While there are three approaches showcased here, you only need to pick one to continue with the tutorial. We will follow along using the output from panns_inference.
Let's get started.
OpenL3 is an open-source Python library for computing deep audio and image embeddings. It was created to provide an easy-to-use framework for extracting embeddings from audio and image data using pre-trained deep neural network models. The library includes pre-trained audio models like VGGish, YAMNet, and SoundNet, as well as pre-trained image models like ResNet and Inception. These models can be used for a variety of audio and image processing tasks, such as speech recognition, music genre classification, and object detection. Overall, OpenL3 is designed to make it easier for researchers and developers to incorporate deep learning models into their audio and image processing workflows.
Let's read in an audio file and extract the embedding layer with openl3.
one_song = join(data_path, "mp3", "latin", "0rXvhxGisD2djBmNkrv5Gt.mp3")
audio, sr = librosa.core.load(one_song, sr=44100, mono=True)
audio.shape
(1322496,)
player(audio, rate=sr)
open_emb, ts = openl3.get_audio_embedding(audio, sr, input_repr="mel128", frontend='librosa')
The model returns an embedding vector for each timestamp along with a vector of timestamps. This means that to get a single one-dimensional embedding for the whole song, we need to take the mean of these vectors along the time axis.
open_emb.shape, open_emb.mean(axis=0).shape, open_emb.mean(axis=0)[:20]
You can generate the embeddings for the whole dataset with the following function. Note that we load the model (which uses Kapre under the hood) first; it will run on a GPU without any further configuration.
model_kapre = openl3.models.load_audio_embedding_model(
input_repr='mel128', content_type='music', embedding_size=512
)
def get_open_embs(batch):
audio_arrays = [song['array'] for song in batch['audio']]
sr_arrays = [song['sampling_rate'] for song in batch['audio']]
embs_list, _ = openl3.get_audio_embedding(audio_arrays, sr_arrays, model=model_kapre)
batch["open_embeddings"] = np.array([embedding.mean(axis=0) for embedding in embs_list])
return batch
music_data = music_data.map(get_open_embs, batched=True, batch_size=20)
music_data
The nice thing about openl3 is that it comes with the best model for our task. The downside is that it is the slowest of the three methods showcased here.
The panns_inference library is a Python package built on top of PyTorch and torchaudio that provides an interface for audio tagging and sound event detection tasks. It implements CNN-based models trained on large-scale audio datasets such as AudioSet and UrbanSound8K. The package was created to make it easy for researchers and practitioners to use these pre-trained models for inference on their own audio datasets, without needing to train their own models from scratch. The panns_inference library provides a high-level, user-friendly API for loading pre-trained models, generating embeddings, and performing audio classification tasks in just a few lines of code.
The panns_inference package requires the data to be either a numpy array or a torch tensor, both of shape [batch, vector], so let's reshape our song.
audio2 = audio[None, :]
audio2.shape
(1, 1322496)
Bear in mind that this next step, downloading the model, can take quite a bit of time depending on your internet speed. Afterwards, inference is quite fast, and the model will return two outputs: the clip-wise tag predictions and the embeddings.
at = AudioTagging(checkpoint_path=None, device='cuda')
Checkpoint path: /home/ramonperez/panns_data/Cnn14_mAP=0.431.pth
GPU number: 1
clipwise_output, embedding = at.inference(audio2)
clipwise_output.shape, embedding.shape
((1, 527), (1, 2048))
embedding[0, 470:500]
array([0. , 0. , 0. , 0. , 0. , 0. ,
3.1233616, 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. , 0. ,
0. , 1.6375436, 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. , 0. ],
dtype=float32)
To get an embedding layer for all of the songs using the panns_inference package, you can use the following function. This is the output we will be using for the remainder of the tutorial.
def get_panns_embs(batch):
arrays = [torch.tensor(val['array'], dtype=torch.float64) for val in batch['audio']]
inputs = torch.nn.utils.rnn.pad_sequence(arrays, batch_first=True, padding_value=0).type(torch.cuda.FloatTensor)
_, embedding = at.inference(inputs)
batch['panns_embeddings'] = embedding
return batch
music_data = music_data.map(get_panns_embs, batched=True, batch_size=8)
music_data
Dataset({
features: ['audio', 'index', 'ids', 'panns_embeddings'],
num_rows: 979
})
Transformers are a type of neural network used for natural language processing, but the architecture can also be used for processing audio data by breaking the sound waves into smaller parts and learning how those parts fit together to form meaning.
We can load a pre-trained model from the Hugging Face hub and extract the embeddings from it. Note that this approach will give us the worst result of the three, since Wav2Vec2 was trained to recognize speech rather than to classify music genres. Hence, even fine-tuning Wav2Vec2 on our data might not improve the quality of the embeddings by much.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = AutoModel.from_pretrained('facebook/wav2vec2-base').to(device)
feature_extractor = AutoFeatureExtractor.from_pretrained('facebook/wav2vec2-base')
A key step before extracting the features from each song and passing them through the model is to resample the songs to 16kHz.
resampled_audio = librosa.resample(y=audio2, orig_sr=sr, target_sr=16_000)
display(player(resampled_audio, rate=16_000))
resampled_audio.shape
inputs = feature_extractor(
resampled_audio[0], sampling_rate=feature_extractor.sampling_rate, return_tensors="pt",
padding=True, return_attention_mask=True, truncation=True, max_length=16_000
).to(device)
inputs['input_values'].shape
torch.Size([1, 16000])
with torch.no_grad():
embeddings = model(**inputs).last_hidden_state.mean(dim=1)
embeddings.shape
torch.Size([1, 768])
To generate the embedding layer for the whole dataset, we can use the following function.
def get_trans_embs(batch):
audio_arrays = [x["array"] for x in batch["audio"]]
inputs = feature_extractor(
audio_arrays, sampling_rate=16_000, return_tensors="pt", padding=True,
return_attention_mask=True, max_length=16_000, truncation=True
).to(device)
with torch.no_grad():
pooled_embeds = model(**inputs).last_hidden_state.mean(dim=1)
return {"transform_embeddings": pooled_embeds.cpu().numpy()}
music_data = music_data.cast_column("audio", Audio(sampling_rate=16_000))
music_data = music_data.map(get_trans_embs, batched=True, batch_size=20)
music_data
Recommendation systems are algorithms and techniques used to suggest items or content to users based on their preferences, historical data, or behavior. These systems aim to provide personalized recommendations to users, helping them discover new items of interest and enhancing their overall user experience. Recommendation systems are widely used in various domains such as e-commerce, streaming platforms, social media, and more.
Let's start by populating the collection we created earlier. If you picked the transformers approach or openl3 to follow along, you will need to recreate your collection with the appropriate dimension size.
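For reference, here is a minimal sketch of how you might recreate the collection if you went with the other embeddings; the sizes come from the shapes we saw above (512 for the openl3 model we loaded, 768 for wav2vec2-base).

```python
# only needed if you did NOT use the 2048-dimensional panns_inference embeddings
client.recreate_collection(
    collection_name=my_collection,
    vectors_config=models.VectorParams(
        size=512,  # 512 for openl3, 768 for wav2vec2-base
        distance=models.Distance.COSINE
    )
)
```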
client.upsert(
collection_name=my_collection,
points=models.Batch(
ids=music_data['index'],
vectors=music_data['panns_embeddings'],
payloads=payload
)
)
UpdateResult(operation_id=0, status=<UpdateStatus.COMPLETED: 'completed'>)
We can retrieve any song by its id using client.retrieve() and then extract the information in the payload with the .payload attribute.
result = client.retrieve(
collection_name=my_collection,
ids=[100],
with_vectors=True # we can turn this on and off depending on our needs
)
result[0].payload
{'artist': 'La Bottine Souriante',
'genre': 'latin',
'name': 'Chant de la luette',
'subgenres': ['latin---salsa'],
'urls': '../data/ludwig_music_data/mp3/latin/0lyeChzw7IWf9ytZ7S0jDK.mp3'}
r = librosa.core.load(result[0].payload['urls'], sr=44100, mono=True)
player(r[0], rate=r[1])
You can search for similar songs with the client.search() method. Let's find an artist and a song we like and use that song's id to grab its embedding and search for similar songs.
PS. Here is Celia Cruz. 😎
metadata.query("artist == 'Celia Cruz'")
| | index | ids | artist | genre | name | subgenres | urls |
|---|---|---|---|---|---|---|---|
| 122 | 122 | 0v1oaOqkXpubdykx58BQwY | Celia Cruz | latin | Juancito Trucupey | [latin---salsa] | ../data/ludwig_music_data/mp3/latin/0v1oaOqkXp... |
| 150 | 150 | 19zWrDlXew0Fzouu7a4qhx | Celia Cruz | latin | Cuando Sali De Cuba | [latin---salsa] | ../data/ludwig_music_data/mp3/latin/19zWrDlXew... |
| 178 | 178 | 1MYds6o9aN2Wxa4TDxcJPB | Celia Cruz | latin | Mi vida es cantar | [latin---salsa] | ../data/ludwig_music_data/mp3/latin/1MYds6o9aN... |
| 459 | 459 | 3WphzI2fb2NTUsfja51U7P | Celia Cruz | latin | Dile que por mi no tema | [latin---salsa] | ../data/ludwig_music_data/mp3/latin/3WphzI2fb2... |
client.search(
collection_name=my_collection,
query_vector=music_data[150]['panns_embeddings'],
limit=10
)
[ScoredPoint(id=150, version=0, score=0.99999994, payload={'artist': 'Celia Cruz', 'genre': 'latin', 'name': 'Cuando Sali De Cuba', 'subgenres': ['latin---salsa'], 'urls': '../data/ludwig_music_data/mp3/latin/19zWrDlXew0Fzouu7a4qhx.mp3'}, vector=None),
ScoredPoint(id=730, version=0, score=0.9206133, payload={'artist': 'Cartola', 'genre': 'latin', 'name': 'Fita meus olhos', 'subgenres': ['latin---samba'], 'urls': '../data/ludwig_music_data/mp3/latin/5iyRJ796USPTXEO4JXO0gC.mp3'}, vector=None),
ScoredPoint(id=251, version=0, score=0.9087784, payload={'artist': "Oscar D'León", 'genre': 'latin', 'name': 'Volver a Verte', 'subgenres': ['latin---salsa'], 'urls': '../data/ludwig_music_data/mp3/latin/1kD5EOoZ45kjq50NLfhRGc.mp3'}, vector=None),
ScoredPoint(id=739, version=0, score=0.90295744, payload={'artist': 'Cartola', 'genre': 'latin', 'name': 'Verde que te quero rosa', 'subgenres': ['latin---samba'], 'urls': '../data/ludwig_music_data/mp3/latin/5plwAx4oAWnuhSwivS5Yeg.mp3'}, vector=None),
ScoredPoint(id=268, version=0, score=0.8995003, payload={'artist': 'Chicha Libre', 'genre': 'latin', 'name': 'La cumbia del zapatero', 'subgenres': ['latin---salsa'], 'urls': '../data/ludwig_music_data/mp3/latin/1ufmU58QldvKrHuATBb3kU.mp3'}, vector=None),
ScoredPoint(id=766, version=0, score=0.88916755, payload={'artist': 'Ska Cubano', 'genre': 'latin', 'name': 'Tequila', 'subgenres': ['latin---cubano', 'reggae'], 'urls': '../data/ludwig_music_data/mp3/latin/618iBzv4oH2wb0WElQV9ru.mp3'}, vector=None),
ScoredPoint(id=7, version=0, score=0.8882055, payload={'artist': 'Ibrahim Ferrer', 'genre': 'latin', 'name': 'Nuestra Ruca', 'subgenres': ['latin---cubano'], 'urls': '../data/ludwig_music_data/mp3/latin/02vPUwCweGxigItnNf2Jfr.mp3'}, vector=None),
ScoredPoint(id=467, version=0, score=0.88348734, payload={'artist': 'La-33', 'genre': 'latin', 'name': 'Soledad', 'subgenres': ['latin---salsa'], 'urls': '../data/ludwig_music_data/mp3/latin/3bpqoOSDwdaK003DPMvDJQ.mp3'}, vector=None),
ScoredPoint(id=388, version=0, score=0.882995, payload={'artist': 'David Byrne', 'genre': 'latin', 'name': 'Loco De Amor', 'subgenres': ['latin---salsa', 'latin---samba', 'rock---pop rock'], 'urls': '../data/ludwig_music_data/mp3/latin/2uJsn2yi8HVZ8qwICHcNSW.mp3'}, vector=None),
ScoredPoint(id=139, version=0, score=0.8820398, payload={'artist': 'Ibrahim Ferrer', 'genre': 'latin', 'name': 'Qué bueno baila usted', 'subgenres': ['latin---cubano'], 'urls': '../data/ludwig_music_data/mp3/latin/16FEEqvnZKcgfA5esxe5kL.mp3'}, vector=None)]
You can evaluate the search results by looking at the score or by listening to the songs and judging how similar they really are. I, the author, can vouch for the quality of the ones we got for Celia Cruz. 😎
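If you want to listen to the results yourself, here is a small sketch that assumes you capture the output of client.search() in a variable (the call above only displays it), then loads each recommended file from its payload and renders a player.

```python
# assumption: we re-run the same search and keep the results this time
results = client.search(
    collection_name=my_collection,
    query_vector=music_data[150]['panns_embeddings'],
    limit=10
)

# load each recommended song from the path stored in its payload and play it
for result in results:
    song, sr = librosa.core.load(result.payload['urls'], sr=44100, mono=True)
    display(player(song, rate=sr))
```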
The recommendation API works a bit differently: instead of a query vector, we pass the ids of positive (required) examples and, optionally, negative ones, and Qdrant does the heavy lifting for us.
client.recommend(
collection_name=my_collection,
positive=[178, 122],
limit=5
)
[ScoredPoint(id=384, version=0, score=0.96683824, payload={'artist': 'Gilberto Santa Rosa', 'genre': 'latin', 'name': 'Perdoname', 'subgenres': ['latin---salsa'], 'urls': '../data/ludwig_music_data/mp3/latin/2qqrgPaRZow7lrLttDL6Im.mp3'}, vector=None),
ScoredPoint(id=424, version=0, score=0.9633477, payload={'artist': 'Gilberto Santa Rosa', 'genre': 'latin', 'name': 'Amanecer Borincano', 'subgenres': ['latin---salsa'], 'urls': '../data/ludwig_music_data/mp3/latin/39FQfusOwKnPCjOgQHcx6S.mp3'}, vector=None),
ScoredPoint(id=190, version=0, score=0.9624174, payload={'artist': 'Luigi Texidor', 'genre': 'latin', 'name': 'Mi Testamento', 'subgenres': ['latin---salsa'], 'urls': '../data/ludwig_music_data/mp3/latin/1RIdI5c7RjjagAcMA5ixpv.mp3'}, vector=None),
ScoredPoint(id=92, version=0, score=0.95979774, payload={'artist': 'Tito Puente', 'genre': 'latin', 'name': 'Mambo Gozón', 'subgenres': ['latin---samba'], 'urls': '../data/ludwig_music_data/mp3/latin/0hk1gSyn3wKgdxqF6qaKUZ.mp3'}, vector=None),
ScoredPoint(id=886, version=0, score=0.95851713, payload={'artist': 'Tony Vega', 'genre': 'latin', 'name': 'Ella es', 'subgenres': ['latin---salsa'], 'urls': '../data/ludwig_music_data/mp3/latin/718X6sjlHdmOzdTfJv4tUc.mp3'}, vector=None)]
Say we don't like Chayanne because his songs are too mushy. We can use the id of one of his mushiest songs so that Qdrant gets us results as far away as possible from such a song.
metadata.query("artist == 'Chayanne'")
| | index | ids | artist | genre | name | subgenres | urls |
|---|---|---|---|---|---|---|---|
| 162 | 162 | 1EyREvPFfh2TgXFMCPoydD | Chayanne | latin | Caprichosa | [latin---salsa, pop---ballad] | ../data/ludwig_music_data/mp3/latin/1EyREvPFfh... |
| 208 | 208 | 1XMw83NJw29iwarOqVibos | Chayanne | latin | Querida | [latin---samba, pop---ballad] | ../data/ludwig_music_data/mp3/latin/1XMw83NJw2... |
| 385 | 385 | 2sKo5u6IppUEudIz265wYa | Chayanne | latin | Yo Te Amo | [latin---salsa, pop---ballad] | ../data/ludwig_music_data/mp3/latin/2sKo5u6Ipp... |
| 412 | 412 | 34hM4PLlhyBysgL50IWdHf | Chayanne | latin | Y tú te vas | [latin---salsa, pop---ballad] | ../data/ludwig_music_data/mp3/latin/34hM4PLlhy... |
| 645 | 645 | 4zkOTmiamebLJ39Sqbp7sb | Chayanne | latin | Boom Boom | [latin---salsa, pop---ballad] | ../data/ludwig_music_data/mp3/latin/4zkOTmiame... |
client.recommend(
collection_name=my_collection,
positive=[178, 122],
negative=[385],
limit=5
)
[ScoredPoint(id=546, version=0, score=0.87100524, payload={'artist': '¡Cubanismo!', 'genre': 'latin', 'name': 'El Preguntón', 'subgenres': ['latin---salsa'], 'urls': '../data/ludwig_music_data/mp3/latin/4EH5vM8p1Ibvlz5cgZLHvY.mp3'}, vector=None),
ScoredPoint(id=85, version=0, score=0.86223793, payload={'artist': '¡Cubanismo!', 'genre': 'latin', 'name': 'Malembe', 'subgenres': ['latin---salsa'], 'urls': '../data/ludwig_music_data/mp3/latin/0efiEWiAFtHrQHTWfeDikg.mp3'}, vector=None),
ScoredPoint(id=910, version=0, score=0.8605486, payload={'artist': '¡Cubanismo!', 'genre': 'latin', 'name': 'Cubanismo Llegó', 'subgenres': ['latin---salsa'], 'urls': '../data/ludwig_music_data/mp3/latin/7FSSdHxCoyEMfHUP6NdOb2.mp3'}, vector=None),
ScoredPoint(id=540, version=0, score=0.85953826, payload={'artist': 'Tito Puente', 'genre': 'latin', 'name': 'Cual Es La Idea', 'subgenres': ['latin---samba'], 'urls': '../data/ludwig_music_data/mp3/latin/4CNCGwxNp9rnVqo2fzmDYK.mp3'}, vector=None),
ScoredPoint(id=812, version=0, score=0.85860175, payload={'artist': 'Tommy Olivencia', 'genre': 'latin', 'name': 'Trucutú', 'subgenres': ['latin---salsa'], 'urls': '../data/ludwig_music_data/mp3/latin/6I9OiSVppRGjuAweyBucE2.mp3'}, vector=None)]
Say we want to get recommendations based on a song we recently listened to and liked, while the system still remembers all of our previous preferences.
marc_anthony_valio_la_pena = music_data[301]
client.recommend(
collection_name=my_collection,
    positive=[marc_anthony_valio_la_pena['index'], 178, 122, 459],
negative=[385],
limit=5
)
[ScoredPoint(id=546, version=0, score=0.86705625, payload={'artist': '¡Cubanismo!', 'genre': 'latin', 'name': 'El Preguntón', 'subgenres': ['latin---salsa'], 'urls': '../data/ludwig_music_data/mp3/latin/4EH5vM8p1Ibvlz5cgZLHvY.mp3'}, vector=None),
ScoredPoint(id=85, version=0, score=0.8635909, payload={'artist': '¡Cubanismo!', 'genre': 'latin', 'name': 'Malembe', 'subgenres': ['latin---salsa'], 'urls': '../data/ludwig_music_data/mp3/latin/0efiEWiAFtHrQHTWfeDikg.mp3'}, vector=None),
ScoredPoint(id=540, version=0, score=0.8588973, payload={'artist': 'Tito Puente', 'genre': 'latin', 'name': 'Cual Es La Idea', 'subgenres': ['latin---samba'], 'urls': '../data/ludwig_music_data/mp3/latin/4CNCGwxNp9rnVqo2fzmDYK.mp3'}, vector=None),
ScoredPoint(id=812, version=0, score=0.85626286, payload={'artist': 'Tommy Olivencia', 'genre': 'latin', 'name': 'Trucutú', 'subgenres': ['latin---salsa'], 'urls': '../data/ludwig_music_data/mp3/latin/6I9OiSVppRGjuAweyBucE2.mp3'}, vector=None),
ScoredPoint(id=587, version=0, score=0.85231805, payload={'artist': 'Tito Puente & His Orchestra', 'genre': 'latin', 'name': 'Mambo Gozon', 'subgenres': ['latin---salsa'], 'urls': '../data/ludwig_music_data/mp3/latin/4Sewxyw6EtUldCIz2sD9S5.mp3'}, vector=None)]
Lastly, imagine we want a samba filter for the recommendations we get; the UI could offer tags for us to choose from, and Qdrant would do the rest.
samba_songs = models.Filter(
must=[models.FieldCondition(key="subgenres", match=models.MatchAny(any=['latin---samba']))]
)
results = client.recommend(
collection_name=my_collection,
query_filter=samba_songs,
    positive=[marc_anthony_valio_la_pena['index'], 178, 122, 459],
negative=[385],
limit=5
)
results
[ScoredPoint(id=540, version=0, score=0.8588973, payload={'artist': 'Tito Puente', 'genre': 'latin', 'name': 'Cual Es La Idea', 'subgenres': ['latin---samba'], 'urls': '../data/ludwig_music_data/mp3/latin/4CNCGwxNp9rnVqo2fzmDYK.mp3'}, vector=None),
ScoredPoint(id=493, version=0, score=0.8236424, payload={'artist': 'Tito Nieves', 'genre': 'latin', 'name': 'De mi enamórate', 'subgenres': ['latin---samba'], 'urls': '../data/ludwig_music_data/mp3/latin/3nnQUYKWBmHlfm5XpdWqNr.mp3'}, vector=None),
ScoredPoint(id=92, version=0, score=0.8120091, payload={'artist': 'Tito Puente', 'genre': 'latin', 'name': 'Mambo Gozón', 'subgenres': ['latin---samba'], 'urls': '../data/ludwig_music_data/mp3/latin/0hk1gSyn3wKgdxqF6qaKUZ.mp3'}, vector=None),
ScoredPoint(id=856, version=0, score=0.80171, payload={'artist': 'Tito Puente', 'genre': 'latin', 'name': 'Son de la Loma', 'subgenres': ['latin---samba'], 'urls': '../data/ludwig_music_data/mp3/latin/6c8qeNyZrTB8E3RKdPdNBh.mp3'}, vector=None),
ScoredPoint(id=892, version=0, score=0.7895387, payload={'artist': 'David Byrne', 'genre': 'latin', 'name': 'Make Believe Mambo', 'subgenres': ['latin---salsa', 'latin---samba', 'rock---pop rock'], 'urls': '../data/ludwig_music_data/mp3/latin/74V0PhSWlBtHvBQAMYMgsX.mp3'}, vector=None)]
for result in results:
song, sr = librosa.core.load(result.payload['urls'], sr=44100, mono=True)
display(player(song, rate=sr))
That's it! So, what's next? You should try using different genres (or all of them), creating embeddings for them, and building your own recommendation engine on top of Qdrant. Better yet, you could find your own dataset and build a personalized search engine for the things you like; just make sure you let us know via our Discord channel here. 😎
Now that we have covered everything we need, it is time to put it to the test with a UI, and for this, we'll use streamlit.
%%writefile recsys_app.py
from panns_inference import AudioTagging
from qdrant_client import QdrantClient
from pedalboard.io import AudioFile
import streamlit as st
import torch
st.title("Music Recommendation App")
st.markdown("Upload your favorite songs and get a list of recommendations from our database of music.")
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
at = AudioTagging(checkpoint_path=None, device=device)
client = QdrantClient("localhost", port=6333)
music_file = st.file_uploader(label="📀 Music file 🎸",)
if music_file:
st.audio(music_file)
with AudioFile(music_file) as f:
a_song = f.read(f.frames)[0][None, :]
clip, emb = at.inference(a_song)
st.markdown("## Semantic Search")
results = client.search(collection_name="music_collection", query_vector=emb[0], limit=4)
for result in results:
st.header(f"Song: {result.payload['name']}")
st.subheader(f"Artist: {result.payload['artist']}")
st.audio(result.payload["urls"])
!streamlit run recsys_app.py