The site may take some time to load. Please be patient :)
Recotrax is a content-based recommendation system made with Python.
It uses the Spotify dataset to screen through tens of thousands of songs and surface the ones you are likely to enjoy. Its recommendations are based on the songs and music artists you select.
- Python
- Flask
- Heroku
- Ajax
- Basics (NumPy, pandas, json, sys)
- pyarrow (to create a lightweight Feather file)
- sklearn
- TfidfVectorizer
- cosine_similarity
- MinMaxScaler
- flask
No installation or download is required. It is a web app that works on any commonly used web browser.
The screenshots below are for desktop users; mobile users can easily follow the same steps.
When the site loads, you'll see a page like this.
You will be served a random list of 16 very popular songs.
If you like any song, simply click on it.
Your selected song will appear on the upper bar, and you will be shown more songs similar to the one you selected.
The parameters used for finding similarity between songs are discussed in the Working of Project section.
The more songs you select, the better the final recommendations will be.
When you have selected enough songs, click the button on the bottom right corner.
The final result will give you the music artists you should listen to.
The resulting artists are divided into two categories:
- Artists you have listened to a few songs from; you should check out more of their work
- Artists you have never listened to, whom you should definitely check out
Song recommendations are right on your screen. Any recommended song that you didn't click carries a trait similar to one or more of the songs you've chosen.
After basic feature engineering, such as removing duplicates and one-hot encoding a few columns of the Spotify dataset, we build a complete feature set of all songs, with each song having encoded values for genre, popularity and year of release.
TF-IDF is used for encoding the genre (see References). Using it, a table of shape (170653, 3071) is created.
After a few manipulations with genre combinations and filtering out rows with very low popularity, the shape of the feature set is brought down to (688, 163).
Now we put the Spotify dataset and the feature set into Feather files (a CSV file has a high load time and memory consumption, so it is not used).
If you want to check the feature engineering part, see the `1Featured.ipynb` file in this repo.
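For a rough idea of what that notebook does, here is a minimal sketch of the genre encoding and scaling. The file name `data.csv` and column names like `genres`, `popularity`, `year` and `id` are assumptions for illustration; the real pipeline in `1Featured.ipynb` is more involved.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import MinMaxScaler

# Hypothetical file and column names; see 1Featured.ipynb for the real pipeline.
spotify_df = pd.read_csv('data.csv')

# TF-IDF encode the space-separated genre strings of each song
tfidf = TfidfVectorizer()
genre_matrix = tfidf.fit_transform(spotify_df['genres'].fillna(''))
genre_df = pd.DataFrame(genre_matrix.toarray(),
                        columns=['genre|' + g for g in tfidf.get_feature_names_out()])

# Scale popularity and year into [0, 1] so no single feature dominates the cosine similarity
scaled = MinMaxScaler().fit_transform(spotify_df[['popularity', 'year']])
scaled_df = pd.DataFrame(scaled, columns=['popularity_scaled', 'year_scaled'])

# One row per song, keyed by Spotify track id
complete_feature_set = pd.concat([genre_df, scaled_df], axis=1)
complete_feature_set['id'] = spotify_df['id'].values

# Feather loads much faster than CSV and uses less memory (requires pyarrow)
complete_feature_set.to_feather('cfs_final.feather')
```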
Now we are good to go with the recommendation part.
We use three functions to carry out the entire recommendation process.
```python
import pandas as pd

def createPlaylist(id_list, spotify_df):
    # Build a small playlist dataframe from the selected track ids
    idDF = pd.DataFrame({'id': [], 'name': [], 'artists': [], 'url': [], 'date_added': []})
    for id in id_list:
        artistName = spotify_df[spotify_df['id'] == id]['artists_upd_v1'].iloc[0]
        songName = spotify_df[spotify_df['id'] == id]['name'].values[0]
        imageId = spotify_df[spotify_df['id'] == id]['url'].values[0]
        newRow = {'id': id, 'name': songName, 'artists': artistName, 'url': imageId,
                  'date_added': pd.to_datetime('2021-04-27 08:09:52+00:00')}
        # note: DataFrame.append was removed in pandas 2.0; use pd.concat on newer versions
        idDF = idDF.append(newRow, ignore_index=True)
    return idDF
```
Next, we create a cosine similarity vector with the help of the playlist dataset `playlist_df`. The weight factor is kept at 1.09.
```python
def generate_playlist_feature(complete_feature_set, playlist_df, weight_factor):
    # Split the feature set into the songs the user selected and everything else
    complete_feature_set_playlist = complete_feature_set[complete_feature_set['id'].isin(playlist_df['id'].values)]
    complete_feature_set_playlist = complete_feature_set_playlist.merge(playlist_df[['id', 'date_added']], on='id', how='inner')
    complete_feature_set_nonplaylist = complete_feature_set[~complete_feature_set['id'].isin(playlist_df['id'].values)]

    # Weight each selected song by how recently it was added
    playlist_feature_set = complete_feature_set_playlist.sort_values('date_added', ascending=False)
    most_recent_date = playlist_feature_set.iloc[0, -1]
    for ix, row in playlist_feature_set.iterrows():
        playlist_feature_set.loc[ix, 'months_from_recent'] = int((most_recent_date.to_pydatetime() - row.iloc[-1].to_pydatetime()).days / 30)
    playlist_feature_set['weight'] = playlist_feature_set['months_from_recent'].apply(lambda x: weight_factor ** (-x))

    # Multiply the feature columns by the weights and collapse them into a single vector
    playlist_feature_set_weighted = playlist_feature_set.copy()
    playlist_feature_set_weighted.update(playlist_feature_set_weighted.iloc[:, :-4].mul(playlist_feature_set_weighted.weight, 0))
    playlist_feature_set_weighted_final = playlist_feature_set_weighted.iloc[:, :-4]
    return playlist_feature_set_weighted_final.sum(axis=0), complete_feature_set_nonplaylist
```
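To get a feel for the recency weighting above: a song added x months before the most recent selection is scaled by `weight_factor ** (-x)`. With the weight factor of 1.09, the decay looks like this (purely an illustration of the formula; note that `createPlaylist()` assigns the same `date_added` to every selection, so in this app all weights effectively come out to 1):

```python
# Illustration only: how weight_factor ** (-months_from_recent) decays for 1.09
weight_factor = 1.09
for months in [0, 3, 6, 12]:
    print(months, round(weight_factor ** (-months), 3))
# 0 1.0
# 3 0.772
# 6 0.596
# 12 0.356
```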
Now, the real deal is this function, which uses the cosine similarity vector to find recommendations and returns the top 10. I also wrote a small function, `remove_same_tracks()`, which removes tracks that have already been recommended in the past.
```python
def remove_same_tracks(recos, chosen):
    # Drop recommendations whose ids have already been shown or selected
    l = []
    for i in range(50):
        if recos.iloc[i]['id'] in chosen:
            l.append(recos.iloc[i].name)  # .name here is the dataframe index label
    recos.drop(l, inplace=True)
    return recos
```
```python
import ast
from sklearn.metrics.pairwise import cosine_similarity

def generate_playlist_recos(spotify_df, features, feature_set, chosen):
    # Score every non-selected song against the playlist vector, keep the best 10
    recos = spotify_df[spotify_df['id'].isin(feature_set['id'].values)]
    recos['sim'] = cosine_similarity(feature_set.drop('id', axis=1).values, features.values.reshape(1, -1))[:, 0]
    recos_top = recos.sort_values('sim', ascending=False).head(50)
    recos_top = remove_same_tracks(recos_top, chosen)
    recos_top = recos_top.drop_duplicates("artists").head(10)
    recos_top['artists'] = recos_top['artists'].apply(ast.literal_eval)
    recos_top['url'] = recos_top['id'].apply(lambda x: spotify_df[spotify_df['id'] == x]['url'].values[0])
    recos_top = recos_top[['id', 'name', 'artists', 'url']]
    return recos_top
```
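Putting the three functions together, a single recommendation round might look roughly like this. This is only a sketch: the track id is a made-up placeholder, and the real flow lives in the Flask route below.

```python
# Hypothetical usage of the three functions above; spotify_df and
# complete_feature_set are the dataframes loaded in the Flask app below.
selected = createPlaylist(['some-spotify-track-id'], spotify_df)
feature_vector, feature_set = generate_playlist_feature(complete_feature_set, selected, 1.09)
top10 = generate_playlist_recos(spotify_df, feature_vector, feature_set, chosen=[])
print(top10[['name', 'artists']])
```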
I won't be showing the boilerplate portion, which is the same for building any Flask app.
If you want to check the full Flask app code, see the `test.py` file in my GitHub repo.
First, the feature-engineered dataframes (the Spotify dataset and the complete feature set) are loaded:
```python
from flask import Flask, request, jsonify
import pandas as pd

app = Flask(__name__)

spotify_df = pd.read_feather('spotify_df_low.feather')
complete_feature_set = pd.read_feather('cfs_final.feather')
default_date = pd.to_datetime('2021-04-27 08:09:52+00:00')
recos = pd.DataFrame()

@app.route('/show', methods=['POST'])
def show():
    global recos
    id_list = []
    id_list.append(request.form['id'])
    chosen = request.form['chosen'].split(",")
    selected = createPlaylist(id_list, spotify_df)
    feature_vector, feature_set = generate_playlist_feature(complete_feature_set, selected, 1.09)
    new_recos = generate_playlist_recos(spotify_df, feature_vector, feature_set, chosen)
    # top song added to new_recos
    recos = new_recos
    id_list = list(recos['id'])
    names = list(recos['name'])
    artists = list(recos['artists'])
    url_list = list(recos['url'])
    # Join names and artists with the '$@' delimiter so the client can split them back apart
    name_list = ""
    for i in names:
        name_list += i + "$@"
    artist_list = ""
    for i in artists:
        artist_list += ",".join(i) + "$@"
    return jsonify({'id': id_list, 'name': name_list, 'artist': artist_list, 'url': url_list})
```
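For reference, the response returned by `/show` has roughly this shape; the values below are made up purely to illustrate how the `$@` and comma delimiters are used:

```python
# Hypothetical example of the JSON payload produced by the /show route above
example_response = {
    'id': ['id-1', 'id-2'],                     # list of Spotify track ids
    'name': 'Song One$@Song Two$@',             # song names joined with '$@'
    'artist': 'Artist A$@Artist B,Artist C$@',  # artists joined with ',', songs with '$@'
    'url': ['https://example.com/a.jpg', 'https://example.com/b.jpg'],  # album art urls
}
```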
When a song is clicked, we don't want the page to reload. Not only would that hurt the user experience, it would also take far more time if we had to re-render every image and all of the page data every time the user clicks a song card.
$("#main").on('submit', function(event){
$.ajax({
url: "/show",
type: "post",
data: {
// These are params sent to show route
id: $("#id").val(),
chosen: String(chosen_ids)
}
})
.done(function(data){
id_list = String(data.id).split(',')
name_list = String(data.name).split('$@')
artist_list = String(data.artist).split('$@')
url_list = String(data.url).split(',')
n = id_list.length
for(i=0;i<n;i++){
cardString = `
<div class="card">
<span style="display:none;">`+id_list[i]+`</span>
<img src=`+url_list[i]+`>
<h3>`+name_list[i]+`</h3>
<h4>
`+artist_list[i].split(',').join('<br>')+`
</h4>
</div>
`
$("#cards").append(cardString)
}
chosen_ids = (chosen_ids + "," + id_list).split(",")
})
event.preventDefault();
});
Here, `data` is the JSON object sent by Flask. All the individual data is first extracted from the object.
`cardString` builds an HTML string for the song card, into which all the received information is bound before the card is appended onto the page.
If you want to check the full JS code for the AJAX call, see the `static/script.js` file in my GitHub repo.
This project is inspired by the code of Madhav Thaker. His YouTube channel is also extremely educational.
Big thanks for his high-quality code. Please take some time to check out his repo or videos if you like.
Click here to learn about TF-IDF and why it is appropriate to use here
Click here to learn about cosine similarity
Click here to learn how to deploy a Flask app to Heroku
I am a third-year student pursuing a B.Tech degree.
I am a competitive coder and a data science enthusiast with a few application-based projects.
I recently wrote a blog on Why Data Science is Bad for Your Health. Give it a read if you like.
You can connect with me on LinkedIn.