ETL

Background

We built an automated pipeline that takes in new data, performs the appropriate transformations, and loads the data into existing tables. We wrote a function that takes in three files (Wikipedia data, Kaggle metadata, and MovieLens rating data) as the Extract step, cleans and merges the data as needed in the Transform step, and loads the result into a PostgreSQL database in the Load step.
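The three steps described above can be sketched roughly as follows. This is a minimal illustration, not the code from the notebooks: the function names, the `imdb_id` join key, the `title` and `url` fields, the `userId`/`movieId`/`rating` column names, and the table layouts are all assumptions made for the example. It uses only the Python standard library, with `sqlite3` standing in for PostgreSQL so that it runs without a database server.

```python
import csv
import json
import sqlite3  # stand-in for PostgreSQL so the sketch runs without a server


def extract_movies(wiki_path, kaggle_path):
    """Extract: read the Wikipedia JSON and the Kaggle metadata CSV."""
    with open(wiki_path, encoding="utf-8") as f:
        wiki = json.load(f)  # assumed: a JSON array of movie objects
    with open(kaggle_path, newline="", encoding="utf-8") as f:
        kaggle = list(csv.DictReader(f))
    return wiki, kaggle


def transform_movies(wiki, kaggle):
    """Transform: keep movies found in both sources, joined on imdb_id."""
    # 'imdb_id', 'title' and 'url' are assumed field names.
    kaggle_by_id = {row["imdb_id"]: row for row in kaggle if row.get("imdb_id")}
    merged = []
    for movie in wiki:
        match = kaggle_by_id.get(movie.get("imdb_id"))
        if match is not None:
            merged.append((movie["imdb_id"], match["title"], movie.get("url")))
    return merged


def run_etl(wiki_path, kaggle_path, ratings_path, conn):
    """Run Extract, Transform and Load against an open DB connection."""
    wiki, kaggle = extract_movies(wiki_path, kaggle_path)
    movies = transform_movies(wiki, kaggle)

    # Load: two tables, one for movies and one for ratings (hypothetical schemas).
    conn.execute(
        "CREATE TABLE IF NOT EXISTS movies (imdb_id TEXT, title TEXT, wiki_url TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ratings (user_id INTEGER, movie_id INTEGER, rating REAL)"
    )
    conn.executemany("INSERT INTO movies VALUES (?, ?, ?)", movies)

    # Stream the ratings file row by row instead of holding it all in memory.
    with open(ratings_path, newline="", encoding="utf-8") as f:
        rows = (
            (int(r["userId"]), int(r["movieId"]), float(r["rating"]))
            for r in csv.DictReader(f)
        )
        conn.executemany("INSERT INTO ratings VALUES (?, ?, ?)", rows)
    conn.commit()
```

A call would look like `run_etl("wikipedia-movies.json", "movies_metadata.csv", "ratings.csv", sqlite3.connect(":memory:"))`. With a PostgreSQL driver such as psycopg2, the placeholders would be `%s` instead of `?` and the statements would go through a cursor.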

What We Are Creating

This project consists of four technical analysis deliverables. We will submit the following:

Deliverable 1: Write an ETL Function to Read Three Data Files
Click the link to view the code of Deliverable 1
Deliverable 2: Extract and Transform the Wikipedia Data
Click the link to view the code of Deliverable 2
Deliverable 3: Extract and Transform the Kaggle data
Click the link to view the code of Deliverable 3
Deliverable 4: Create the Movie Database

Note for the reader:
In this project we extract, transform, and load the data using Jupyter Notebook and PostgreSQL.
Data extracted from the Wikipedia movies file and from Kaggle is used as input; the output data is stored in PostgreSQL as two tables.
The input file ratings.csv has about 26 x 10^6 data entries. If you open it in Excel you will see only part of them, because an Excel worksheet can hold at most 1,048,576 rows.
Make sure to check the size of the file after downloading and storing it, which can prevent mistakes.
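One way to check the file without opening it in Excel is to read its size and count its rows directly. The sketch below is an illustration only: the helper names `describe_csv` and `fits_in_excel` are hypothetical, and it assumes the file has a single header line. The constant is the worksheet row limit of current Excel versions (1,048,576 rows).

```python
import csv
import os

EXCEL_MAX_ROWS = 1_048_576  # worksheet row limit in current Excel versions


def describe_csv(path):
    """Return (size_in_bytes, data_row_count) for a CSV with one header line."""
    size_bytes = os.path.getsize(path)
    with open(path, newline="", encoding="utf-8") as f:
        # csv.reader counts records, so quoted line breaks are handled.
        total_rows = sum(1 for _ in csv.reader(f))
    data_rows = max(total_rows - 1, 0)  # subtract the header line
    return size_bytes, data_rows


def fits_in_excel(data_rows):
    """True if the header plus all data rows fit in one worksheet."""
    return data_rows + 1 <= EXCEL_MAX_ROWS
```

Calling `describe_csv("ratings.csv")` after the download gives a size and row count to compare against what the data source reports; `fits_in_excel` then tells whether Excel could show the whole file.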

