Movie-genre-prediction

This model predicts a movie's genres from its plot, using machine learning and natural language processing.

Tools used:

Python
scikit-learn
NumPy
Natural Language Processing
Machine Learning
Flask API

Screenshot:

How to run:

Run model.py. This creates the pickle files for the models.
Host the model on localhost using PyCharm: open the project in PyCharm and run the code.
A localhost link such as http://127.0.0.1:5000/ is generated. Open this link and the model is ready to use.
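
As a rough illustration of the pickle step, the sketch below saves and reloads the two objects kept per genre. The helper names `save_artifacts` and `load_artifacts` are hypothetical and are not taken from model.py; the file names follow the `<Genre>-model.picket` and `<Genre>-tfidf.pkl` pattern of the pickle files in this repository.

```python
import pickle
from pathlib import Path


def save_artifacts(genre, model, vectorizer, directory="."):
    """Write one genre's classifier and vectorizer to two pickle files."""
    directory = Path(directory)
    # Hypothetical helper: file names mirror the repository's naming pattern.
    with open(directory / f"{genre}-model.picket", "wb") as fh:
        pickle.dump(model, fh)
    with open(directory / f"{genre}-tfidf.pkl", "wb") as fh:
        pickle.dump(vectorizer, fh)


def load_artifacts(genre, directory="."):
    """Read back the (model, vectorizer) pair written by save_artifacts."""
    directory = Path(directory)
    with open(directory / f"{genre}-model.picket", "rb") as fh:
        model = pickle.load(fh)
    with open(directory / f"{genre}-tfidf.pkl", "rb") as fh:
        vectorizer = pickle.load(fh)
    return model, vectorizer
```

Pickle files should only be loaded from a trusted source, because unpickling can execute arbitrary code.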

Method:

Import the movie dataset into the Python notebook.
Import all necessary Python libraries.
Separate the movies of each genre into separate NumPy arrays.
Make sure each genre's NumPy array holds a balanced set: 50% movies that belong to the genre and 50% movies that do not.
Vectorize each NumPy array using TfidfVectorizer.
Train a model for each genre using BernoulliNB, then predict all possible genres for a given plot.
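
The balancing step can be sketched as follows. This is a minimal illustration of the 50/50 split described above, not the code from model.py; the function name `balanced_genre_arrays`, its parameters, and the idea of downsampling the larger group at random are all assumptions.

```python
import numpy as np


def balanced_genre_arrays(plots, genres_per_movie, genre, seed=0):
    """Return (plots, labels) with equal numbers of in-genre and out-of-genre movies.

    plots: sequence of plot strings.
    genres_per_movie: sequence of genre lists, one list per movie.
    """
    rng = np.random.default_rng(seed)
    plots = np.asarray(plots, dtype=object)
    is_genre = np.array([genre in g for g in genres_per_movie], dtype=bool)

    pos_idx = np.flatnonzero(is_genre)
    neg_idx = np.flatnonzero(~is_genre)

    # Assumption: downsample the larger group so both groups have the same size.
    n = min(len(pos_idx), len(neg_idx))
    pos_idx = rng.choice(pos_idx, size=n, replace=False)
    neg_idx = rng.choice(neg_idx, size=n, replace=False)

    # Shuffle so positives and negatives are interleaved.
    idx = np.concatenate([pos_idx, neg_idx])
    rng.shuffle(idx)
    return plots[idx], is_genre[idx].astype(int)
```

The returned labels are 1 for movies of the genre and 0 for the rest, which is the form a binary classifier such as BernoulliNB expects.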

Note: The code may look long, but the implementation is short and simple. It looks long only because the dataset and model for each genre are prepared and trained separately.
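
Because every genre has its own binary model, predicting the genres of a plot amounts to asking each model in turn. The sketch below shows one way to write that as a loop; `predict_genres` and the `artifacts` dictionary are hypothetical names, and the objects are only assumed to follow the scikit-learn `transform`/`predict` interface.

```python
def predict_genres(plot, artifacts):
    """Return every genre whose binary classifier accepts the plot.

    artifacts maps a genre name to a (vectorizer, model) pair. Both objects
    are assumed to behave like a fitted scikit-learn vectorizer and classifier.
    """
    predicted = []
    for genre, (vectorizer, model) in artifacts.items():
        # Each genre uses its own vectorizer, so features are built per genre.
        features = vectorizer.transform([plot])
        if model.predict(features)[0] == 1:
            predicted.append(genre)
    return predicted
```

With the fitted pairs collected in a dictionary such as `{"Drama": (drama_tfidf, drama_model), ...}`, a single call returns all matching genres, and an empty list means no model accepted the plot.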

For each genre we have done the following steps:

Let's assume our genre is Drama.
Prepare the movie plots in Drama_data and the corresponding labels in Drama_out. The ratio of movies in the genre to movies outside it is 1:2.3. This is the most crucial step of the model and takes much more time than any other step. 1:1.9 was chosen by trial and error to obtain the best accuracy possible.
Preprocess Drama_data by removing special characters.
Convert both Drama_data and Drama_out to NumPy arrays.
Vectorize Drama_data with TF-IDF and store the result in X_Drama; store Drama_out in Y_Drama.
Split the data into 80% training and 20% test sets.
Train the Bernoulli Naive Bayes model.
Calculate the model's accuracy on the test set using metrics.accuracy_score.
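
The steps above can be sketched for a single genre as follows. This is a minimal illustration assuming scikit-learn defaults, not the code from model.py: the function names, the regular expression used for "special characters", and the `seed` parameter are assumptions.

```python
import re

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB


def clean_plot(text):
    """Replace runs of characters other than letters, digits, and spaces."""
    # Assumption: "special characters" means anything outside [A-Za-z0-9 ].
    return re.sub(r"[^A-Za-z0-9 ]+", " ", text)


def train_genre_model(plots, labels, seed=0):
    """Vectorize, split 80/20, train BernoulliNB, and return the test accuracy."""
    data = np.array([clean_plot(p) for p in plots])
    out = np.array(labels)

    # The vectorizer is fitted before the split, following the order of the steps.
    vectorizer = TfidfVectorizer()
    X = vectorizer.fit_transform(data)
    Y = out

    X_train, X_test, y_train, y_test = train_test_split(
        X, Y, test_size=0.2, random_state=seed
    )

    # BernoulliNB binarizes its input at 0.0 by default, so the TF-IDF weights
    # act as word presence/absence features.
    model = BernoulliNB()
    model.fit(X_train, y_train)

    accuracy = accuracy_score(y_test, model.predict(X_test))
    return vectorizer, model, accuracy
```

Calling `train_genre_model(Drama_data, Drama_out)` would return the fitted vectorizer, the fitted model, and the accuracy on the held-out 20%.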

The last step gives the accuracy of each genre's model on that genre's own dataset, built in the data-preparation step. We have not computed an overall accuracy because of the unbalanced dataset problem.

Accuracies obtained for each genre (%):

Drama: 71.42
Comedy: 74.29
Adventure: 77.02
History: 80.15
War: 85.94
Thriller: 77.32
Crime: 78.35
Fantasy: 76.75
Horror: 84.69
Family: 80.46
Documentary: 85.88
Mystery: 75.33
Romance: 74.48
Science Fiction: 84.51
Action: 80.11

Average accuracy of all genres: 79.11
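
The average can be checked with a few lines of Python. The figure is a plain unweighted mean over the 15 genres listed above, so every genre counts equally regardless of how many movies it has; the variable names are only illustrative.

```python
# Per-genre accuracies (%) as listed above.
accuracies = {
    "Drama": 71.42, "Comedy": 74.29, "Adventure": 77.02, "History": 80.15,
    "War": 85.94, "Thriller": 77.32, "Crime": 78.35, "Fantasy": 76.75,
    "Horror": 84.69, "Family": 80.46, "Documentary": 85.88, "Mystery": 75.33,
    "Romance": 74.48, "Science Fiction": 84.51, "Action": 80.11,
}

# Unweighted mean over all genres.
average = sum(accuracies.values()) / len(accuracies)
print(f"{average:.2f}")  # prints 79.11
```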
