This project aims to build a data model that makes possible a music streaming app understands the user's activities and behavior.
The data model must have fast responses, be easy to understand and trustworthy. In order to reach these aims, this project read and transforms the original data using python and implements a Star Schema approach using PostgreSQL.
The original dataset is composite for two different kinds of files:
- Song Dataset
- Log Dataset
The song dataset is in JSON format and contains metadata about a song and the artist of that song.
Below you can see an example of how song file looks like:
{"num_songs": 1, "artist_id": "ARJIE2Y1187B994AB7", "artist_latitude": null, "artist_longitude": null, "artist_location": "", "artist_name": "Line Renaud", "song_id": "SOUPIRU12A6D4FA1E1", "title": "Der Kleine Dompfaff", "duration": 152.92036, "year": 0}
The second dataset consists of log files in JSON format. The log file contains the activity logs from a music streaming app.
Before to insert data in tables, is important to make sure about data quality. This project uses the pandas library to check missing values and convert data types when necessary.
songplay_id, start_time, user_id, level, song_id, artist_id, session_id, location, user_agent
user_id, first_name, last_name, gender, level
song_id, title, artist_id, year, duration
artist_id, name, location, lattitude, longitude
start_time, hour, day, week, month, year, weekday
user_agent, browser, browser_version, os, os_version, mobile
$ python create_tables.py
$ python etl.py
$ jupyter notebook
The Code is written in Python 3.6.3 . If you don't have Python installed you can find it here. If you are using a lower version of Python you can upgrade using the pip package, ensuring you have the latest version of pip.
To install pip run in the command Line:
$ python -m ensurepip -- default-pip
To upgrade pip:
$ python -m pip install -- upgrade pip setuptools wheel
To upgrade Python:
$ pip install python -- upgrade
Additional Packages that are required are: Jupyter, Psycopg2, Pandas, User_agents, and Bokeh.
You can donwload them using pip
Jupyter:
$ pip install jupyter
Psycopg2:
$ pip install psycopg2
Pandas:
$ pip install pandas
User_agents:
$ pip install install pyyaml ua-parser user-agents
Bokeh:
$ pip install bokeh