
Three functions combined into one pipeline that pulls data from its sources, applies the appropriate transformations, and places the result in a destination database. The data sources are the Wikipedia movie list, the Kaggle movies metadata, and ratings.csv. The data is transformed by cleaning and combining these sources and is finally stored in a Postgres database.


ramyasnl/ETLmodule8-


ETL

Background

We created an automated pipeline that takes in new data, performs the appropriate transformations, and loads the data into existing tables. We wrote a function that takes in three files (the Wikipedia data, the Kaggle metadata, and the MovieLens rating data), which is the Extraction step. It then performs the Transformation step by cleaning and merging the data as needed, and finally Loads the data into a PostgreSQL database.
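The extract, transform, and load steps described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code. It uses only the Python standard library, and it makes several assumptions: the Wikipedia file is a JSON array of records, both the Wikipedia and Kaggle data carry a shared key column here called `imdb_id`, the ratings file has the usual MovieLens columns (`userId`, `movieId`, `rating`, `timestamp`), and the two output tables are named `movies` and `ratings`. The function names are hypothetical. An in-memory SQLite connection stands in for PostgreSQL so the sketch runs without a database server.

```python
import csv
import json
import sqlite3


def extract(wiki_file, kaggle_file, ratings_file):
    """Extract: read the three source files into lists of dicts."""
    with open(wiki_file, encoding="utf-8") as f:
        wiki = json.load(f)  # assumption: a JSON array of movie records
    with open(kaggle_file, newline="", encoding="utf-8") as f:
        kaggle = list(csv.DictReader(f))
    with open(ratings_file, newline="", encoding="utf-8") as f:
        # Fine for a sample; the full ratings file is better streamed in chunks.
        ratings = list(csv.DictReader(f))
    return wiki, kaggle, ratings


def transform(wiki, kaggle):
    """Transform: drop records without a key, de-duplicate, and merge."""
    # "imdb_id" is an assumed shared key, not confirmed by the README.
    kaggle_by_id = {row["imdb_id"]: row for row in kaggle if row.get("imdb_id")}
    movies, seen = [], set()
    for record in wiki:
        key = record.get("imdb_id")
        # Inner join: keep only movies that appear in both sources, once each.
        if key in kaggle_by_id and key not in seen:
            seen.add(key)
            movies.append({
                "imdb_id": key,
                "title": kaggle_by_id[key].get("title"),
                "wiki_url": record.get("url"),
            })
    return movies


def load(conn, movies, ratings):
    """Load: write the merged movies and the raw ratings into two tables."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS movies (imdb_id TEXT, title TEXT, wiki_url TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ratings (user_id INTEGER, movie_id INTEGER, rating REAL)"
    )
    conn.executemany(
        "INSERT INTO movies VALUES (:imdb_id, :title, :wiki_url)", movies
    )
    conn.executemany(
        "INSERT INTO ratings VALUES (:userId, :movieId, :rating)", ratings
    )
    conn.commit()


def etl(wiki_file, kaggle_file, ratings_file, conn):
    """Run the three steps in order and report how many rows were loaded."""
    wiki, kaggle, ratings = extract(wiki_file, kaggle_file, ratings_file)
    movies = transform(wiki, kaggle)
    load(conn, movies, ratings)
    return len(movies), len(ratings)
```

Calling `etl("wiki.json", "kaggle.csv", "ratings.csv", sqlite3.connect(":memory:"))` with small sample files (the file names here are placeholders) returns the number of movie and rating rows written. Against PostgreSQL, the same structure would apply with a PostgreSQL driver's connection, though the parameter placeholder syntax and connection handling would differ from SQLite's.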

What We Are Creating


This project consists of four technical analysis deliverables. We will submit the following:

Deliverable 1: Write an ETL Function to Read Three Data Files
Click the link to view the code of Deliverable 1
Deliverable 2: Extract and Transform the Wikipedia Data
Click the link to view the code of Deliverable 2
Deliverable 3: Extract and Transform the Kaggle data
Click the link to view the code of Deliverable 3
Deliverable 4: Create the Movie Database
Click the link to view the Movie Database

Note for the reader:
In this project we extract, transform, and load the data using Jupyter Notebook and PostgreSQL.
Data extracted from wikimovies and Kaggle is used as input, and the output data is stored in PostgreSQL as two tables.
The input file ratings.csv has about 26 x 10^6 data entries. If you open it in Excel you will see only about 1.04 x 10^6 of them, since an Excel worksheet can hold at most 1,048,576 rows.
Make sure to check the size of the file after downloading and storing it, which can prevent mistakes caused by an incomplete or truncated file.
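One way to check the file without relying on Excel is to read its size from disk and count its rows by streaming through it. The sketch below is only an illustration: the helper name `describe_csv` is hypothetical, and it assumes the file has a single header row.

```python
import csv
import os

# Maximum number of rows in a single Excel worksheet (.xlsx format).
EXCEL_MAX_ROWS = 1_048_576


def describe_csv(path):
    """Return the file size in bytes and the number of data rows in a CSV."""
    size_bytes = os.path.getsize(path)
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row (assumed to be present)
        # Stream row by row so the whole file never has to fit in memory.
        row_count = sum(1 for _ in reader)
    return {
        "size_bytes": size_bytes,
        "rows": row_count,
        # The header takes one worksheet row as well.
        "fits_in_excel": row_count + 1 <= EXCEL_MAX_ROWS,
    }
```

For the full ratings.csv, `describe_csv("ratings.csv")["rows"]` should come out at roughly 26 million, and `fits_in_excel` should be `False`. A much smaller count suggests the download or copy was cut short.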
