Advanced predictive model for box office revenue. With precision forecasting and confidence-building insights, our solution empowers production houses to optimize resources and maximize profitability.

uvaishnav/BoxOfficePrediction

🎬 BoxOfficePrediction

Develop an advanced predictive model to forecast a film's box office revenue with precision and confidence. Utilizing a myriad of parameters, including budget, cast, genre, and past performance, our task is to leverage the power of machine learning to unravel the intricacies of box office dynamics and provide actionable insights for studios and filmmakers.

🚀 Motivation

The TMDB_5000 dataset from Kaggle has been used to build numerous recommendation systems, yet much of its potential remains untapped. This project uses the same data to predict a film's expected revenue, combining a broad set of parameters with feature engineering so that stakeholders in the entertainment industry can make better-informed decisions.

📄 Documentation

The approach, experiment results, and inferences drawn from the project are described in detail in a blog post covering both the method and its execution. Please visit my blog:

Blog Image

🛠️ Technology Stack

  • Frontend: HTML5, CSS3, JavaScript
  • Backend: Flask
  • ML Library: Scikit-Learn
  • MLOps Tools: MLflow, DVC
  • Deployment: Docker, GitHub Actions, Heroku
  • Version Control: GitHub

📊 Implementation Overview

Data:

  • TMDB 5000 Movie Dataset => Kaggle
  • Average Ticket Prices => compiled by the author: Download

🔧 Preprocessing:

  • Flattened the dataset's nested structure into simple, trainable tabular data.
  • Assigned scores to high-cardinality categorical features such as crew, hero, and heroine, based on the cumulative popularity and weighted rating of their previous work, to express their impact on revenue/footfall numerically.
  • Used one-hot encoding for ordinary categorical features with fewer unique values.
  • Applied a log transformation to reduce skew and the influence of outliers.
  • Standardized the data with StandardScaler.
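The scoring step above can be sketched as follows. This is only an illustration under stated assumptions, not the project's actual code: the field names `lead` and `release_year`, the IMDb-style weighted-rating formula, and the choice to multiply cumulative popularity by the mean weighted rating are all hypothetical. The point it shows is that a person's score for a film is built only from films released earlier, so a film never contributes to its own feature.

```python
from collections import defaultdict


def weighted_rating(vote_average, vote_count, mean_vote, min_votes):
    """IMDb-style weighted rating (assumed formula): ratings backed by few
    votes are pulled toward the global mean vote."""
    total = vote_count + min_votes
    return (vote_count / total) * vote_average + (min_votes / total) * mean_vote


def score_people(movies, mean_vote, min_votes):
    """Return {title: score of the film's lead}, using earlier releases only.

    Each movie is a dict with the hypothetical keys: title, lead,
    release_year, popularity, vote_average, vote_count.
    """
    history = defaultdict(lambda: {"popularity": 0.0, "rating_sum": 0.0, "films": 0})
    scores = {}
    for movie in sorted(movies, key=lambda m: m["release_year"]):
        past = history[movie["lead"]]
        mean_rating = past["rating_sum"] / past["films"] if past["films"] else 0.0
        # Assumed combination: cumulative popularity times mean weighted rating.
        scores[movie["title"]] = past["popularity"] * mean_rating
        # Update the history only after scoring, to avoid target leakage.
        past["popularity"] += movie["popularity"]
        past["rating_sum"] += weighted_rating(
            movie["vote_average"], movie["vote_count"], mean_vote, min_votes
        )
        past["films"] += 1
    return scores
```

A newcomer with no earlier films gets a score of 0.0 under this sketch; a real pipeline might prefer a global average instead.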

🎯 Target Metric: Footfall Prediction

To predict expected revenue, we introduced a novel approach by considering footfall (number of tickets sold) as a target metric. While revenue is subject to various external factors such as ticket prices and distribution deals, footfall provides a more consistent and direct measure of a movie's popularity and audience engagement.

expected revenue = predicted footfall * current avg_ticket_price
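As a worked illustration of the formula above, the sketch below converts historical revenue into footfall and a predicted footfall back into revenue. Deriving footfall by dividing revenue by the average ticket price of the release year is an assumption about how the target was built, and the function names and numbers are made up for the example.

```python
def footfall_from_revenue(revenue, avg_ticket_price_release_year):
    """Approximate tickets sold from revenue (assumed derivation of the target)."""
    if avg_ticket_price_release_year <= 0:
        raise ValueError("average ticket price must be positive")
    return revenue / avg_ticket_price_release_year


def expected_revenue(predicted_footfall, current_avg_ticket_price):
    """expected revenue = predicted footfall * current avg_ticket_price"""
    return predicted_footfall * current_avg_ticket_price


# Made-up numbers: a film that earned 100M when tickets averaged 5.00
# sold roughly 20M tickets; at a current average price of 10.00 the
# same footfall corresponds to 200M in revenue.
tickets = footfall_from_revenue(100_000_000, 5.00)
revenue_today = expected_revenue(tickets, 10.00)
```

Separating the two steps keeps ticket-price inflation out of the learning target, which is the motivation given for predicting footfall.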

🤖 Model Selection

Models trained:

  • RandomForestRegressor
  • DecisionTreeRegressor
  • GradientBoostingRegressor
  • LinearRegression
  • XGBRegressor (best model)
  • CatBoostRegressor
  • AdaBoostRegressor

📈 Best Model Metrics

Metric Value
RMSE 0.012
neg_mean_squared_error -0.00024
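For readers unfamiliar with the second metric: scikit-learn's `neg_mean_squared_error` scorer reports the mean squared error with its sign flipped, so that a higher score is better. The sketch below shows how such a score relates to an RMSE. It does not reproduce the project's evaluation; the data split and target scale behind each reported figure are not stated here, so the two values in the table need not convert into one another exactly.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error computed from paired values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


def rmse_from_neg_mse(neg_mse):
    """Convert a neg_mean_squared_error score (<= 0) into an RMSE."""
    if neg_mse > 0:
        raise ValueError("neg_mean_squared_error scores are never positive")
    return math.sqrt(-neg_mse)


# A score of -0.00024 corresponds to an RMSE of about 0.0155.
converted = rmse_from_neg_mse(-0.00024)
```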

⚙️ Best Model Parameters

Parameter Value
colsample_bytree 0.3
learning_rate 0.11
max_depth 4
n_estimators 444

🔍 Hyperparameter Tuning

  • Method: RandomizedSearchCV
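A minimal sketch of a randomized search is shown below. It is not the project's tuning script: it runs on synthetic data and uses scikit-learn's GradientBoostingRegressor as a stand-in so the example needs only scikit-learn and SciPy, and the search ranges are assumptions. With XGBRegressor, `colsample_bytree` could be added to the distributions in the same way.

```python
from scipy.stats import randint, uniform
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import RandomizedSearchCV

# Synthetic regression data, used only to make the example runnable.
X, y = make_regression(n_samples=200, n_features=8, noise=0.1, random_state=0)

# Assumed search ranges; each candidate is sampled from these distributions.
param_distributions = {
    "learning_rate": uniform(0.01, 0.29),  # uniform on [0.01, 0.30]
    "max_depth": randint(2, 8),            # integers 2..7
    "n_estimators": randint(50, 200),      # integers 50..199
}

search = RandomizedSearchCV(
    estimator=GradientBoostingRegressor(random_state=0),
    param_distributions=param_distributions,
    n_iter=5,                               # number of sampled candidates
    scoring="neg_mean_squared_error",
    cv=3,
    random_state=0,
)
search.fit(X, y)

best_params = search.best_params_
best_score = search.best_score_  # negated MSE, so closer to 0 is better
```

Unlike an exhaustive grid, the cost here is fixed by `n_iter`, which is what makes wide ranges such as a few hundred values of `n_estimators` practical to explore.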

📑 MLflow Experiment Logs

All the experiment results and models are logged in MLflow for a clearer understanding and detailed inference: View here

📸 Screenshots

[Screenshots: home page, form page, result page]

🖥️ Run Locally

Clone the project

  git clone https://github.com/uvaishnav/BoxOfficePrediction.git

Create a conda environment after opening the repository

  conda create -n boxoffice python=3.9 -y
  conda activate boxoffice

Install requirements

  pip install -r requirements.txt

Start the server

  python app.py

Then open your localhost in a browser at the port the server reports.

🔧 For Usage/Modification

1. Clone the project

  git clone https://github.com/uvaishnav/BoxOfficePrediction.git

2. Create a conda environment after opening the repository

  conda create -n boxoffice python=3.9 -y
  conda activate boxoffice

3. Install requirements

  pip install -r requirements.txt

4. Create a Kaggle account, download your kaggle.json file, and store it in the .kaggle folder on your system (required for the data_ingestion pipeline)

5. Add Environment Variables

For model evaluation pipeline,

  • Connect the repository to DagsHub.
  • Get the MLflow URI and credentials.
  • Update the config.yaml file with your MLflow URI.
  • Then add these variables (credentials from DagsHub) to your environment:

  export MLFLOW_TRACKING_URI="your mlflow uri"
  export MLFLOW_TRACKING_USERNAME="your username"
  export MLFLOW_TRACKING_PASSWORD="your password"
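Before running the evaluation pipeline, it can help to confirm that the three variables are actually visible to Python. The helper below is a hypothetical convenience check, not part of the repository; it only inspects the environment and does not contact the tracking server.

```python
import os

# The three variables MLflow reads for a remote, authenticated tracking server.
REQUIRED_VARS = (
    "MLFLOW_TRACKING_URI",
    "MLFLOW_TRACKING_USERNAME",
    "MLFLOW_TRACKING_PASSWORD",
)


def missing_mlflow_vars(env=None):
    """Return the names of required variables that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_mlflow_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("MLflow environment variables are set.")
```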

6. Run all the pipelines using DVC

  dvc init
  dvc repro

🎥 Demo

My.Movie.2-720p30.mov

🚀 Deployment

To deploy this project on Heroku:

1. Dockerize the Project

Install Docker Desktop first, then update the Dockerfile as needed and build the Docker image.

  docker build -t boxoffice .

2. Update Secret Variables in GitHub to Deploy Using GitHub Actions

  1. Create a Heroku account and create an app.
  2. In your GitHub repository, navigate to Settings -> Secrets and Variables -> Actions, and add the secret keys expected by the main.yaml workflow file:
  • HEROKU_API_KEY
  • HEROKU_APP_NAME
  • HEROKU_EMAIL

A build then runs and a new version of the project is deployed every time you push changes to GitHub.

📈 Scope of Improvement

Our current model predicts expected revenue based on factors like budget, cast, release month, and genres.

Optimizing Cast Selection and Release Timing

We can enhance its utility by optimizing cast selection and release timing. By analyzing historical data, we can identify optimal combinations of actors and crew members that synergize well, thereby maximizing revenue potential. Additionally, refining our model to recommend the best release windows can help avoid high competition periods and leverage seasonal trends, further boosting a film’s success.
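One way the release-timing idea could be prototyped is to score the same film once per candidate month and keep the month with the highest predicted footfall. The sketch below assumes a hypothetical `predict_footfall` callable that accepts a feature dict with a `release_month` key; the toy predictor and its seasonal multipliers are invented purely for illustration.

```python
def best_release_month(predict_footfall, film_features, months=range(1, 13)):
    """Evaluate the predictor for each month and return (best month, all results)."""
    by_month = {
        month: predict_footfall({**film_features, "release_month": month})
        for month in months
    }
    best = max(by_month, key=by_month.get)
    return best, by_month


def toy_predictor(features):
    """Stand-in for a trained model, with made-up seasonal multipliers."""
    seasonal = {5: 1.2, 7: 1.5, 12: 1.3}
    return features["base_footfall"] * seasonal.get(features["release_month"], 1.0)


film = {"base_footfall": 1_000_000}
best_month, by_month = best_release_month(toy_predictor, film)
```

This treats months independently and ignores competing releases, so it is a starting point rather than a model of the competition effects mentioned above.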

🙏 Acknowledgements

  • TMDB_5000 dataset from Kaggle
  • 247wallst.com, used in preparing the ticket prices dataset

📜 License

This project is licensed under the GPL-3.0 License - see the LICENSE file for details.
