IMDB TV Rating Predictor

METIS Data Science and Machine Learning Bootcamp 2022 by Krystian Krystkowiak

project/month(2/7) focus: REGRESSION

Code - scraping GitHub

Code - cleaning&preprocessing GitHub

Code - eda&regression GitHub

Presentation GitHub

ABSTRACT

Scraped data and used linear regression with Lasso and Ridge regularisation to predict IMDB TV show ratings.
The goal of this project was to use a linear regression model to predict IMDB TV show user ratings and identify the factors that influence them. These insights can be useful to TV professionals and investors making production planning decisions. The model may also interest fans waiting for new shows, as it can help them estimate which shows are most likely to be highly rated.

To achieve this goal, I scraped data from https://www.imdb.com and combined numerical and categorical features to build a linear regression model. The relationships between the input features and the target variable appear too weak to support reliable predictions (R-squared of 0.096/-0.021). Despite this, I was able to identify features that could potentially be more important. I used Matplotlib and Seaborn to visualize and communicate my results.

DESIGN

Scrape data from the IMDB website for TV shows released from 2000 to now (2022)
Clean and process the data for use in a linear regression model that predicts IMDB ratings and identifies the factors influencing them
Visualize and communicate the results using Matplotlib and Seaborn plots

2000-present has been hailed as a new Golden Age of Television. Many people prefer to immerse themselves in the magic of moving pictures from the comfort of their own homes rather than visiting intimidating movie theaters.

Understanding the factors that shape IMDB TV show user ratings could help viewers save time and potentially improve the quality of TV shows produced in the future.

DATA

I scraped the data for this project from IMDb (https://www.imdb.com). The dataset contains 1335 rows with 10 features for each.

Numerical: s01_episodes, avg_runtime, rel_date.
Categorical: genres, MPAA certification, origin, company, creators, stars.

The data covers the period 2019-2021 to focus on modern shows.
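
The categorical features need to be turned into numeric columns before a linear model can use them. The following is a minimal sketch of one way to do that with Pandas, not the exact code from the notebooks. The rows are made up for illustration, and the assumption that genres holds several comma-separated values per show is mine.

```python
import pandas as pd

# Hypothetical rows; the values are made up for illustration only.
shows = pd.DataFrame(
    {
        "s01_episodes": [10, 8, 22],
        "avg_runtime": [55, 30, 42],
        "genres": ["Drama, Crime", "Comedy", "Drama, Comedy"],
        "origin": ["US", "UK", "US"],
    }
)

# Multi-valued column (assumed format): split on ", " and
# create one 0/1 column per distinct genre.
genre_dummies = shows["genres"].str.get_dummies(sep=", ").add_prefix("genre_")

# Single-valued column: ordinary dummy encoding. drop_first removes one
# level so the dummies are not perfectly collinear with the intercept.
origin_dummies = pd.get_dummies(
    shows["origin"], prefix="origin", drop_first=True, dtype=int
)

# Numerical and encoded categorical features side by side.
features = pd.concat(
    [shows[["s01_episodes", "avg_runtime"]], genre_dummies, origin_dummies],
    axis=1,
)
print(features)
```

High-cardinality columns such as creators or stars would produce a very large number of dummy columns this way, so grouping rare values together before encoding may be worth considering.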

ALGORITHMS

Scraped data from https://www.imdb.com
Combined numerical and categorical features in analysis
Used feature engineering techniques such as converting categorical features to dummy variables and searching for interactions
Applied linear regression with Ridge and Lasso regularization
Split dataset into 75/25 train/test
Evaluated the model using the R-squared score (0.096/-0.021)
Visualized and communicated results using Matplotlib and Seaborn
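
The modeling steps above can be sketched with Scikit-learn roughly as follows. This is an illustrative outline under stated assumptions, not the project's actual code: the data is synthetic, the alpha values are placeholders rather than tuned settings, and the scaling step is my addition because Ridge and Lasso penalties are sensitive to feature scale.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the prepared features and ratings (not the real data).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 7.0 + 0.3 * X[:, 0] + rng.normal(scale=1.0, size=200)

# 75/25 train/test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# alpha values are placeholders, not settings taken from the project.
models = {
    "linear": make_pipeline(StandardScaler(), LinearRegression()),
    "ridge": make_pipeline(StandardScaler(), Ridge(alpha=1.0)),
    "lasso": make_pipeline(StandardScaler(), Lasso(alpha=0.1)),
}

# Fit each model on the training set and score it on both sets.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    train_r2 = r2_score(y_train, model.predict(X_train))
    test_r2 = r2_score(y_test, model.predict(X_test))
    scores[name] = (train_r2, test_r2)
    print(f"{name}: train R2 = {train_r2:.3f}, test R2 = {test_r2:.3f}")
```

When reading such scores, note that an R-squared below zero on held-out data means the model predicts those ratings worse than a constant equal to their mean would.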

TOOLS

BeautifulSoup for scraping data from IMDB
Numpy and Pandas for data manipulation and preparation
Scikit-learn and statsmodels for modeling and evaluating performance
Matplotlib and Seaborn for visualizing and communicating the results of the analysis
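
As a rough illustration of the BeautifulSoup step, the sketch below parses a small inline HTML fragment into rows. The markup, class names, and values are hypothetical. IMDb's real pages are structured differently and change over time, so the selectors used in the scraping notebook will not match these.

```python
from bs4 import BeautifulSoup

# Made-up HTML fragment for illustration; not IMDb's actual markup.
html = """
<div class="show">
  <h3 class="title">Example Show</h3>
  <span class="rating">7.4</span>
  <span class="runtime">45 min</span>
</div>
<div class="show">
  <h3 class="title">Another Show</h3>
  <span class="rating">6.1</span>
  <span class="runtime">30 min</span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Collect one dictionary per show card, converting text to numbers.
rows = []
for card in soup.find_all("div", class_="show"):
    runtime_text = card.find("span", class_="runtime").get_text(strip=True)
    rows.append(
        {
            "title": card.find("h3", class_="title").get_text(strip=True),
            "rating": float(card.find("span", class_="rating").get_text(strip=True)),
            "avg_runtime": int(runtime_text.split()[0]),  # "45 min" -> 45
        }
    )

print(rows)
```

A list of dictionaries like this can be passed directly to pandas.DataFrame for the cleaning stage.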

COMMUNICATION

5-minute slide presentation, designed to clearly convey the significant factors influencing IMDB ratings, along with recommendations.

