
This project developed a data pipeline using Airflow and Python. The pipeline ingested data on trending movies and their distributors from IMDb and Box Office, then cleansed, formatted, combined, and indexed it in Elasticsearch. A dashboard was also built from the data using Kibana. The tools and libraries used in this p…


tobiasodion/trending-movies-data-pipeline


Trending Movies Data Pipeline

Overview


This project built a data pipeline for trending movies using Airflow and Python. The work covered three stages:

  • The pipeline ingested data on trending movies and their distributors from IMDb and Box Office.
  • The ingested data was cleansed, formatted, processed, and indexed in Elasticsearch.
  • Finally, a dashboard was created from the enriched data using Kibana.
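The combine and index stages above can be sketched in plain Python. This is a minimal illustration, not the repository's code: the field names (`title`, `rating`, `distributor`, `revenue`), the sample records, and the index name `trending-movies` are all assumptions, since the README does not list the actual schema. The payload follows the newline-delimited format expected by the Elasticsearch bulk API.

```python
import json

# Hypothetical, already-cleansed records; the real scraped fields are not listed in this README.
movies = [
    {"title": "Movie A", "rating": 7.8},
    {"title": "Movie B", "rating": 6.1},
    {"title": "Movie C", "rating": 5.4},
]
box_office = [
    {"title": "Movie A", "distributor": "Studio X", "revenue": 1200000},
    {"title": "Movie B", "distributor": "Studio Y", "revenue": 850000},
]


def combine(movie_rows, box_office_rows):
    """Join the two sources on title, keeping only movies present in both."""
    by_title = {row["title"]: row for row in box_office_rows}
    return [
        {**movie, **by_title[movie["title"]]}
        for movie in movie_rows
        if movie["title"] in by_title
    ]


def to_bulk_payload(docs, index_name):
    """Build a newline-delimited request body in the Elasticsearch bulk format."""
    lines = []
    for doc in docs:
        # Each document is preceded by an action line naming the target index.
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(doc))
    # The bulk format requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"


combined = combine(movies, box_office)
payload = to_bulk_payload(combined, "trending-movies")  # index name is an assumption
```

The resulting `payload` string could be sent to a cluster's `_bulk` endpoint. In an Airflow setup, each function would typically run as its own task so that a failed stage can be retried independently.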

Watch Demo here

Tools & Libraries

The tools and libraries used in this project included:

  • Airflow for automating the pipeline
  • Selenium for data ingestion through web scraping
  • Pandas for data cleansing and formatting
  • PySpark for data processing
  • Elasticsearch and Kibana for indexing and dashboards
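As an illustration of the cleansing and formatting role that Pandas plays in the list above, the following sketch normalizes scraped text into typed columns. The column names and sample rows are hypothetical, and the specific rules (trimming titles, coercing unparseable ratings, stripping currency symbols, dropping duplicates) are assumptions about what such a step might do, not a description of the project's actual logic.

```python
import pandas as pd

# Hypothetical raw rows shaped like scraped text; the real columns are not given in this README.
raw = pd.DataFrame(
    {
        "title": [" Movie A ", "Movie B", "Movie B", "Movie C"],
        "rating": ["7.8", "6.1", "6.1", "n/a"],
        "revenue": ["$1,200,000", "$850,000", "$850,000", "$40,000"],
    }
)

clean = raw.copy()
# Remove stray whitespace left over from scraping.
clean["title"] = clean["title"].str.strip()
# Ratings that cannot be parsed become NaN instead of raising an error.
clean["rating"] = pd.to_numeric(clean["rating"], errors="coerce")
# Strip the currency symbol and thousands separators, then cast to integers.
clean["revenue"] = clean["revenue"].str.replace(r"[$,]", "", regex=True).astype("int64")
# Drop exact duplicate rows and rows without a usable rating.
clean = clean.drop_duplicates().dropna(subset=["rating"]).reset_index(drop=True)
```

Whether rows with missing ratings should be dropped or kept depends on how the dashboard aggregates them; dropping is only one reasonable choice.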

Dashboards

Screenshot: total revenue of all the trending movies, top 5 trending movies by user rating, and distributors' revenue share

Screenshot: top 5 distributors by revenue
