Identification of Spoilers in IMDB Movie Reviews

Abstract

This paper presents a Natural Language Processing (NLP) approach to detecting spoilers in IMDB reviews. Reviews often reveal information about the plot of a movie. An automated approach to filtering out such spoilers is desirable because manual labeling is infeasible given the large volume of content. To identify these reviews, we propose supervised machine learning models: we explore Bi-LSTM, XGBoost, Random Forest, and Naive Bayes classifiers to improve text classification accuracy. In addition, we use pretrained word embeddings (word2vec and GloVe), cosine similarity, and Term Frequency-Inverse Document Frequency (TF-IDF) to build the text vectors. The results from our models are satisfactory: quantitative and qualitative results show that the proposed method substantially outperforms the baseline model.
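As a rough illustration of how TF-IDF weighting and cosine similarity can score the overlap between a review and a plot synopsis, here is a minimal standard-library sketch. It is not the code from the notebooks: the whitespace tokenizer, the smoothed IDF formula, and the example texts are all assumptions chosen for brevity.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace splitting; real preprocessing would be richer.
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, using a smoothed IDF."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical texts, for illustration only.
synopsis = "the hero dies at the end of the film"
review_a = "the hero dies at the end"
review_b = "great acting and a lovely soundtrack"

vectors = tfidf_vectors([synopsis, review_a, review_b])
sim_a = cosine_similarity(vectors[0], vectors[1])  # shares plot terms with the synopsis
sim_b = cosine_similarity(vectors[0], vectors[2])  # shares no terms with the synopsis
```

A review whose similarity to the synopsis is high is a candidate spoiler; in practice such a score would be one feature among several rather than a classifier on its own.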

Project

'IMDB-NB & XGBoost .ipynb': implements the feature engineering and modelling for the Naive Bayes, XGBoost, and semantic similarity methods.
'IMDB-word2vec-Bi-lstm.ipynb': implements the pretrained word2vec embeddings with a Bi-LSTM model.
'IMDB-GloVe-Random Forest.ipynb': uses pretrained GloVe embeddings to convert sentences to vectors, which are then classified with a Random Forest.
'IMDB-GloVe-Bi-LSTM.ipynb': implements the pretrained GloVe embeddings with a Bi-LSTM model.
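One common way to turn a sentence into a fixed-length vector from pretrained word embeddings is to average the vectors of its known words. The sketch below assumes that averaging strategy, which may differ from what the GloVe notebook actually does, and uses a tiny hypothetical two-dimensional embedding table in place of real GloVe vectors.

```python
def sentence_vector(text, embeddings, dim):
    """Average the embeddings of the known words in `text`.

    `embeddings` maps a lowercase word to a list of `dim` floats.
    Words missing from the table are skipped; if no word is known,
    a zero vector is returned.
    """
    tokens = text.lower().split()
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


# Hypothetical toy embeddings; real GloVe vectors have 50 to 300 dimensions.
toy_embeddings = {
    "hero": [1.0, 0.0],
    "dies": [0.0, 1.0],
}

vec = sentence_vector("The hero dies", toy_embeddings, dim=2)
empty = sentence_vector("unknown words only", toy_embeddings, dim=2)
```

The resulting vectors have the same length for every review, so they can be passed directly to a tabular classifier such as a Random Forest.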

Datasets

imdb-spoiler-dataset: dataset obtained from Kaggle, published by Rishabh Misra.
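A minimal sketch of reading review texts and spoiler labels is shown below. It assumes the reviews are stored as one JSON object per line with `review_text` and `is_spoiler` fields; these names are assumptions here and should be checked against the downloaded files. The sample records are made up for illustration.

```python
import json


def parse_reviews(lines):
    """Parse JSON-lines records into parallel lists of texts and 0/1 labels.

    Assumed field names: `review_text` (str) and `is_spoiler` (bool).
    """
    texts, labels = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        texts.append(record["review_text"])
        labels.append(1 if record["is_spoiler"] else 0)
    return texts, labels


# Hypothetical records standing in for the real dataset file.
sample_lines = [
    '{"review_text": "The hero dies at the end.", "is_spoiler": true}',
    "",
    '{"review_text": "Great soundtrack.", "is_spoiler": false}',
]

texts, labels = parse_reviews(sample_lines)
```

Because the function accepts any iterable of lines, an open file handle for the downloaded reviews file can be passed in place of `sample_lines`.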

mowas455/Text_Mining_Project
