
Adem-grp/AppliedDataProgramming

🎬 IMDB 5000 Movie Dataset: Data Cleaning & Exploratory Analysis

This project cleans, transforms, and analyzes the IMDB 5000 Movie Dataset to extract insights about the movie industry, including the relationships between budget, revenue, genres, and ratings.


📊 Project Overview

This notebook walks through a complete data wrangling and exploratory data analysis (EDA) workflow on a real-world dataset. It uses Pandas, Matplotlib, and Seaborn for data cleaning, feature understanding, and visualization.

Key Steps

• Data Cleaning
– Handling missing values
– Detecting and removing duplicates
– Converting data types
– Fixing inconsistent or invalid entries
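
The cleaning steps above could be sketched roughly as follows. This is a minimal illustration on toy rows, not the notebook's actual code: the column names (`movie_title`, `director_name`, `budget`, `gross`), the placeholder values, and the `clean_movies` helper are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Toy rows standing in for the real CSV. Column names and values are
# illustrative assumptions, not taken from the notebook.
raw = pd.DataFrame(
    {
        "movie_title": ["Film A ", "Film A ", "Film B", "Film C", "Film D"],
        "director_name": ["Director X", "Director X", "Director Y", None, "Director Z"],
        "budget": ["1000", "1000", "2500", "n/a", "-5"],
        "gross": [3000.0, 3000.0, 1800.0, np.nan, 120.0],
    }
)


def clean_movies(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper that applies the four cleaning steps in order."""
    out = df.copy()
    # Fix inconsistent entries: strip stray whitespace from titles.
    out["movie_title"] = out["movie_title"].str.strip()
    # Convert data types: non-numeric budget strings become NaN.
    out["budget"] = pd.to_numeric(out["budget"], errors="coerce")
    # Fix invalid entries: a budget must be positive.
    out.loc[out["budget"] <= 0, "budget"] = np.nan
    # Detect and remove exact duplicate rows.
    out = out.drop_duplicates()
    # Handle missing values: label unknown directors, then drop rows
    # that lack either of the two key numeric fields.
    out["director_name"] = out["director_name"].fillna("Unknown")
    out = out.dropna(subset=["budget", "gross"])
    return out.reset_index(drop=True)


cleaned = clean_movies(raw)
print(cleaned)
```

Whether to drop or impute rows with missing values is a judgment call; dropping is used here only because it keeps the sketch short.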

• Exploratory Analysis
– Descriptive statistics (mean, median, mode)
– Genre-wise and director-wise performance
– Correlation analysis between key numerical features
– Identifying high-grossing vs. low-grossing films
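
One possible shape for these analyses in Pandas is sketched below. The toy data, the column names (`director_name`, `genres`, `budget`, `gross`, `imdb_score`), the pipe-separated genre format, and the median split used to define "high" and "low" grossing are assumptions for illustration; the notebook may define them differently.

```python
import pandas as pd

# Toy data; column names and the "A|B" genre encoding are assumptions.
movies = pd.DataFrame(
    {
        "director_name": ["A", "A", "B", "C"],
        "genres": ["Action|Adventure", "Action", "Drama", "Drama|Romance"],
        "budget": [100.0, 50.0, 20.0, 10.0],
        "gross": [300.0, 80.0, 15.0, 40.0],
        "imdb_score": [7.9, 6.5, 7.2, 6.8],
    }
)
numeric_cols = ["budget", "gross", "imdb_score"]

# Descriptive statistics: mean and median per numeric column, plus the
# most frequent director as an example of a mode.
summary = movies[numeric_cols].agg(["mean", "median"])
most_common_director = movies["director_name"].mode().iloc[0]

# Genre-wise performance: split the genre string so each movie
# contributes one row per genre, then average gross per genre.
by_genre = (
    movies.assign(genre=movies["genres"].str.split("|"))
    .explode("genre")
    .groupby("genre")["gross"]
    .mean()
    .sort_values(ascending=False)
)

# Director-wise performance: average gross per director.
by_director = (
    movies.groupby("director_name")["gross"].mean().sort_values(ascending=False)
)

# Correlation analysis between the key numerical features.
corr = movies[numeric_cols].corr()

# High- vs. low-grossing films: split at the median gross (an assumed rule).
median_gross = movies["gross"].median()
movies["gross_tier"] = (movies["gross"] > median_gross).map(
    {True: "high", False: "low"}
)

print(summary, by_genre, by_director, corr, sep="\n\n")
```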

• Visualization
– Revenue vs. Budget scatterplots
– IMDb rating distributions
– Top directors and genres by average revenue
– Correlation heatmaps for key metrics
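
As a rough sketch of how these plots might be produced, the example below draws four panels with Matplotlib alone. The `plot_overview` function, the toy data, and the column names are hypothetical; the notebook itself may well use Seaborn (for instance `seaborn.heatmap`) for some of these figures.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Toy data; column names are assumptions for illustration.
movies = pd.DataFrame(
    {
        "director_name": ["A", "A", "B", "C", "C"],
        "budget": [100.0, 50.0, 20.0, 10.0, 30.0],
        "gross": [300.0, 80.0, 15.0, 40.0, 90.0],
        "imdb_score": [7.9, 6.5, 7.2, 6.8, 7.0],
    }
)


def plot_overview(df, top_n=3):
    """Hypothetical helper: draw the four plot types on one figure."""
    fig, axes = plt.subplots(2, 2, figsize=(11, 8))

    # Revenue vs. budget scatterplot.
    axes[0, 0].scatter(df["budget"], df["gross"], alpha=0.6)
    axes[0, 0].set_xlabel("Budget")
    axes[0, 0].set_ylabel("Gross revenue")

    # IMDb rating distribution.
    axes[0, 1].hist(df["imdb_score"], bins=10)
    axes[0, 1].set_xlabel("IMDb score")

    # Top directors by average revenue.
    top = (
        df.groupby("director_name")["gross"]
        .mean()
        .sort_values(ascending=False)
        .head(top_n)
    )
    axes[1, 0].bar(top.index, top.values)
    axes[1, 0].set_ylabel("Average gross")

    # Correlation heatmap for the key metrics.
    cols = ["budget", "gross", "imdb_score"]
    corr = df[cols].corr()
    image = axes[1, 1].imshow(corr.to_numpy(), vmin=-1, vmax=1, cmap="coolwarm")
    axes[1, 1].set_xticks(range(len(cols)))
    axes[1, 1].set_xticklabels(cols, rotation=45, ha="right")
    axes[1, 1].set_yticks(range(len(cols)))
    axes[1, 1].set_yticklabels(cols)
    fig.colorbar(image, ax=axes[1, 1])

    fig.tight_layout()
    return fig


fig = plot_overview(movies)
```

In a notebook the figure renders when the cell returns it; in a script, `fig.savefig(...)` or `plt.show()` would be needed.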


🧠 Skills Demonstrated

• Python: Data structures, logic, functions, and lambda expressions
• Pandas: Cleaning, transforming, merging, and aggregating data
• Matplotlib & Seaborn: Plotting trends, distributions, and correlations
• Analytical Thinking: Asking data-driven questions and validating hypotheses


📦 Requirements

Install the required libraries (for example, with pip):
pandas, numpy, matplotlib, seaborn, jupyter
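
After installing (for example with `pip install pandas numpy matplotlib seaborn jupyter`), a quick way to confirm the environment is ready is to query the installed versions. The sketch below uses only the standard library; the `installed_versions` helper is a hypothetical name introduced for this example.

```python
from importlib import metadata

# Distribution names taken from the requirements list above.
REQUIRED = ["pandas", "numpy", "matplotlib", "seaborn", "jupyter"]


def installed_versions(packages=REQUIRED):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions().items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```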


📂 Dataset

Name: IMDB 5000 Movie Dataset
Source: Kaggle (IMDB 5000 Movie Dataset)
Format: CSV
Contains information on roughly 5,000 movies, including:
– Director names
– Actor details
– Budget and gross revenue
– IMDb score and genres
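
Loading the CSV and checking that the fields listed above are present might look like the sketch below. The column names in `EXPECTED_COLUMNS` and the `load_movies` helper are assumptions for illustration; check them against the header of the file you download. An in-memory CSV with placeholder values stands in for the real file so the example runs on its own.

```python
import io

import pandas as pd

# Assumed header names for the fields listed above; verify against the CSV.
EXPECTED_COLUMNS = [
    "director_name",
    "actor_1_name",
    "budget",
    "gross",
    "imdb_score",
    "genres",
]


def load_movies(source) -> pd.DataFrame:
    """Read the dataset from a path or file-like object and validate columns."""
    df = pd.read_csv(source)
    missing = [col for col in EXPECTED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"CSV is missing expected columns: {missing}")
    return df


# Placeholder rows standing in for the downloaded file.
sample_csv = io.StringIO(
    "director_name,actor_1_name,budget,gross,imdb_score,genres\n"
    "Director X,Actor P,1000,3000,7.5,Action|Adventure\n"
    "Director Y,Actor Q,2500,1800,6.1,Drama\n"
)
sample = load_movies(sample_csv)
print(sample.dtypes)
```

With the real dataset, pass the path of the downloaded CSV instead of `sample_csv`.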


🚀 How to Use

  1. Clone this repository
  2. Open the Jupyter Notebook file named:
    IMDB_5000_Movie_Dataset_Data_Cleaning_&_Exploratory_Analysis_Practice.ipynb
  3. Run all cells sequentially to reproduce the analysis

📊 Example Outputs

• Correlation heatmap showing relationships between budget, gross, and rating
• Bar charts of top directors by average revenue
• Genre-based performance visualizations

[Figure: Revenue vs. Budget plot]
[Figure: Top Directors chart]


🔮 Future Work

Potential extensions for this project:
• Perform feature engineering (extract release year, duration bins, etc.)
• Apply machine learning to predict movie revenue or IMDb rating
• Use deep learning (DL) models for text-based features such as plot keywords
• Build an interactive dashboard using Streamlit or Plotly Dash
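
To make the feature-engineering idea concrete, here is a small sketch that derives a release year, a decade, and duration bins. The column names (`title_year`, `duration`), the bin edges, the labels, and the `add_features` helper are illustrative assumptions, not part of the current notebook.

```python
import pandas as pd

# Toy data; column names are assumptions. One year is deliberately missing.
movies = pd.DataFrame(
    {
        "title_year": [1999.0, 2009.0, None, 2015.0],
        "duration": [85.0, 162.0, 110.0, 148.0],  # minutes
    }
)


def add_features(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper that adds release_year, decade, and duration_bin."""
    out = df.copy()
    # The nullable integer dtype keeps missing years as <NA>.
    out["release_year"] = out["title_year"].astype("Int64")
    out["decade"] = (out["release_year"] // 10) * 10
    # Right-closed bins: (0, 90], (90, 120], (120, 150], (150, inf).
    out["duration_bin"] = pd.cut(
        out["duration"],
        bins=[0, 90, 120, 150, float("inf")],
        labels=["short", "standard", "long", "epic"],
    )
    return out


featured = add_features(movies)
print(featured)
```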


๐Ÿท๏ธ License

This project is open-source and available under the MIT License.


โญ If you found this project helpful, please consider giving it a star on GitHub!
