mowas455/Text_Mining_Project

Identification of Spoilers in IMDb Movie Reviews

Abstract


This paper presents a Natural Language Processing (NLP) approach to detecting spoilers in IMDb movie reviews. Such reviews often reveal information about the plot of a movie. An automated approach that filters out spoilers is desirable, because manual labeling is impractical given the large volume of content. To identify these reviews, we propose supervised machine learning models: we explored Bi-LSTM, XGBoost, Random Forest, and Naive Bayes classifiers to improve text classification accuracy. In addition, we used pretrained word embeddings (word2vec and GloVe), cosine similarity, and Term Frequency-Inverse Document Frequency (TF-IDF) to build the text vectors. Our models produce satisfactory results, and quantitative and qualitative evaluation shows that the proposed method substantially outperforms the baseline model.
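As a rough illustration of how TF-IDF vectors and cosine similarity can be combined, the sketch below scores how closely a review overlaps with a plot synopsis. This is not the project's actual pipeline. Comparing reviews against a synopsis is an assumption made for the example, and the sample texts and the helper names `tokenize`, `tfidf_vectors`, and `cosine_similarity` are hypothetical. It uses only the Python standard library and a smoothed IDF formula.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF, so terms that occur in every document keep a nonzero weight.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        total = len(toks) or 1
        vectors.append({t: (count / total) * idf[t] for t, count in tf.items()})
    return vectors


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical sample texts, invented for illustration only.
synopsis = "The detective discovers that his partner is the killer and dies in the final scene"
spoiler_review = "I could not believe the partner is the killer and the detective dies"
plain_review = "Great acting and beautiful music, a movie worth watching twice"

vecs = tfidf_vectors([synopsis, spoiler_review, plain_review])
sim_spoiler = cosine_similarity(vecs[0], vecs[1])
sim_plain = cosine_similarity(vecs[0], vecs[2])
print(f"spoiler review vs synopsis: {sim_spoiler:.3f}")
print(f"plain review vs synopsis:   {sim_plain:.3f}")
```

In a setup like this, the similarity score could serve as one input feature for a classifier such as Naive Bayes or XGBoost, alongside the TF-IDF or embedding features themselves.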

Project


Datasets


Report

