Aim of the Project: Social media and online conversations have become part of everyday life, and with them the problem of online toxicity and harassment has grown. It is difficult to take part in online conversations without witnessing toxic behavior such as harassment or disrespect.
The aim of the project is to classify toxic comments by type of toxicity. The dataset uses six labels: toxic, severe toxic, obscene, threat, insult and identity hate. This is a multi-label classification problem: a single comment may belong to several categories at the same time, or to none of them.
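To make the multi-label setup concrete, here is a minimal sketch that encodes each comment's labels as a six-element 0/1 vector and then splits those vectors into one binary target per label. The label order follows the column names used in the Kaggle data; the example comments and the helper names `to_label_vector` and `binary_relevance_targets` are hypothetical, for illustration only.

```python
# The six label columns of the Jigsaw toxic comment dataset, in a fixed order.
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]


def to_label_vector(active_labels):
    """Encode a set of label names as a 0/1 list in LABELS order (hypothetical helper)."""
    unknown = set(active_labels) - set(LABELS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1 if label in active_labels else 0 for label in LABELS]


def binary_relevance_targets(label_vectors):
    """Split N multi-label vectors into one 0/1 target list per label.

    Each label becomes its own independent binary classification problem.
    """
    return {label: [vec[i] for vec in label_vectors] for i, label in enumerate(LABELS)}


# Hypothetical example comments, not taken from the dataset.
examples = {
    "thanks for the edit": set(),
    "you are an idiot": {"toxic", "insult"},
    "i will hurt you, idiot": {"toxic", "threat", "insult"},
}

vectors = [to_label_vector(labels) for labels in examples.values()]
for text, vec in zip(examples, vectors):
    print(f"{vec}  {text!r}")

targets = binary_relevance_targets(vectors)
print("toxic targets:", targets["toxic"])
```

A comment with no active labels is encoded as all zeros, while a comment such as the third one switches on several labels at once, which is exactly what separates multi-label from multi-class classification.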
Language and libraries used:
Python 3.7
NumPy
Pandas
Matplotlib
NLTK
Seaborn
The dataset can be downloaded from https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge
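A minimal pandas sketch for loading the training data and getting a first look at the label distribution. It assumes the layout of the competition's `train.csv` (an `id` column, a `comment_text` column and one 0/1 column per label); the helper names `load_comments` and `label_summary` are hypothetical, and the in-memory rows below are made up so the snippet runs without the download.

```python
import io

import pandas as pd

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]


def load_comments(path_or_buffer):
    """Load the training CSV and check that the expected columns exist (hypothetical helper)."""
    df = pd.read_csv(path_or_buffer)
    missing = [c for c in ["comment_text", *LABELS] if c not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    # Defensive: treat any missing comment text as an empty string.
    df["comment_text"] = df["comment_text"].fillna("")
    return df


def label_summary(df):
    """Per-label positive counts, plus how many comments carry 0, 1, 2, ... labels."""
    per_label = df[LABELS].sum().sort_values(ascending=False)
    labels_per_comment = df[LABELS].sum(axis=1).value_counts().sort_index()
    return per_label, labels_per_comment


# Tiny in-memory stand-in for train.csv (hypothetical rows, not real data).
sample = io.StringIO(
    "id,comment_text,toxic,severe_toxic,obscene,threat,insult,identity_hate\n"
    "a1,thanks for the edit,0,0,0,0,0,0\n"
    "a2,you are an idiot,1,0,0,0,1,0\n"
    "a3,i will hurt you idiot,1,0,0,1,1,0\n"
)
df = load_comments(sample)
per_label, labels_per_comment = label_summary(df)
print(per_label)
print(labels_per_comment)
```

With the real file, pass its path instead of the buffer (for example `load_comments("train.csv")`); both Series can then be plotted with Matplotlib or Seaborn for the exploration step.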
Steps involved:
Getting the dataset.
Exploring the dataset with visualisation tools (Matplotlib, Seaborn).
Preprocessing the comment text with NLTK.
Applying multi-label classification algorithms.
Comparing the results and choosing the best-performing model.
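The preprocessing, classification and comparison steps above can be sketched end to end as follows. This is a minimal, dependency-light illustration under stated assumptions, not the project's actual pipeline: NLTK is swapped for a regex tokenizer and a tiny hand-written stopword list, and the classifier is binary relevance (one independent logistic regression per label, trained with plain gradient descent in NumPy). The README does not name the algorithms it compared; binary relevance is simply one common multi-label baseline. All helper names and the toy data are hypothetical.

```python
import re

import numpy as np

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]
# Tiny illustrative stopword list; an NLTK stopword corpus would replace it.
STOPWORDS = {"a", "an", "the", "is", "are", "you", "i", "for", "to", "and"}


def preprocess(text):
    """Lowercase, keep alphabetic tokens, drop stopwords (stand-in for NLTK)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def build_vocab(token_lists):
    """Map each distinct token to a column index."""
    vocab = sorted({t for tokens in token_lists for t in tokens})
    return {t: i for i, t in enumerate(vocab)}


def vectorize(token_lists, vocab):
    """Binary bag-of-words matrix of shape (n_comments, len(vocab))."""
    X = np.zeros((len(token_lists), len(vocab)))
    for row, tokens in enumerate(token_lists):
        for t in tokens:
            if t in vocab:
                X[row, vocab[t]] = 1.0
    return X


def fit_logistic(X, y, lr=0.5, epochs=500, l2=1e-3):
    """Batch gradient descent on the L2-regularised logistic log loss."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        grad = p - y
        w -= lr * (X.T @ grad / len(y) + l2 * w)
        b -= lr * grad.mean()
    return w, b


class BinaryRelevance:
    """Binary relevance: one independent binary classifier per label."""

    def fit(self, X, Y):
        self.models = [fit_logistic(X, Y[:, j]) for j in range(Y.shape[1])]
        return self

    def predict(self, X, threshold=0.5):
        cols = [1.0 / (1.0 + np.exp(-(X @ w + b))) >= threshold for w, b in self.models]
        return np.column_stack(cols).astype(int)


def hamming_loss(Y_true, Y_pred):
    """Fraction of individual (comment, label) assignments that are wrong."""
    return float(np.mean(Y_true != Y_pred))


# Hypothetical toy data standing in for the preprocessed Kaggle comments.
texts = [
    "thanks for the helpful edit",
    "you are an idiot",
    "idiot, I will hurt you",
    "great article, thanks",
    "stupid idiot",
    "I will find you and hurt you",
]
Y = np.array([
    [0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 1, 0],
    [1, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 1, 0],
    [1, 0, 0, 1, 0, 0],
])

tokens = [preprocess(t) for t in texts]
vocab = build_vocab(tokens)
X = vectorize(tokens, vocab)
model = BinaryRelevance().fit(X, Y)
Y_pred = model.predict(X)
print("training Hamming loss:", hamming_loss(Y, Y_pred))
```

Binary relevance ignores correlations between labels (for example, that most insults are also toxic), which is its main weakness; approaches such as classifier chains model those dependencies. For the comparison step, each candidate model should be scored on a held-out split rather than on its own training data as in this toy run.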