Count the proportion of troll comments on the Guardian's website by topic

sanhitamj/topic_troll_ratio

Topic Troll Ratio

aka Measure how many comments are relevant to the topic

Count the proportion of troll comments on the Guardian's website by topic

This project began when a friend and I discussed an article from the Guardian, "The dark side of Guardian comments".

From the summary published in the article:

[T]he Guardian commissioned research into the 70m comments left on its site since 2006 and discovered that of the 10 most abused writers eight are women, and the two men are black. Hear from three of those writers, explore the data and help us host better conversations online

The article also says that pieces on topics like feminism, rape and the Israel-Palestine conflict attract a lot of abusive comments.

There is also a wider conversation about how we live in our own echo chambers and how, as a society, we are becoming more and more polarised. Counting the number of comments alone is not enough to see this. If only a handful of people are making extreme statements (in acceptable language), we should be tolerant enough to ignore such remarks.

The ideas behind this project are:

  • Sociological - Find the comments that resonate with the original article and those that take exactly the opposite position. Count the number of upvotes each type of comment gets. That should tell us more about how polarised we are.

  • Technology - Teach myself a number of technologies used in Data Science and Machine Learning. This project uses the following resources:

    • Python
    • Web-scraping - urllib, requests, beautifulsoup
    • Database - MongoDB, pymongo, DynamoDB on AWS
    • NLP - NLTK, scikit-learn (sklearn)
    • Miscellaneous - Boto, Jupyter Notebook, git
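
As a rough illustration of the "relevant to the topic" idea, here is a minimal sketch and not the code used in this repository. It assumes each comment is a dict with hypothetical `text` and `upvotes` keys, scores it against the article text with a plain bag-of-words cosine similarity (a stdlib-only stand-in for the TF-IDF features that scikit-learn would provide), and uses an arbitrary threshold of 0.2 to split comments into on-topic and off-topic groups. Note that this only measures shared vocabulary: a comment that strongly disagrees with the article can still score as on-topic, so separating agreement from opposition would need an additional step.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def relevance_summary(article, comments, threshold=0.2):
    """Group comments by similarity to the article and total their upvotes.

    `threshold` is an assumed cut-off, not a value taken from the project.
    """
    article_vec = Counter(tokenize(article))
    summary = {
        "on_topic": {"count": 0, "upvotes": 0},
        "off_topic": {"count": 0, "upvotes": 0},
    }
    for comment in comments:
        comment_vec = Counter(tokenize(comment["text"]))
        score = cosine_similarity(article_vec, comment_vec)
        group = "on_topic" if score >= threshold else "off_topic"
        summary[group]["count"] += 1
        summary[group]["upvotes"] += comment["upvotes"]
    total = len(comments)
    summary["on_topic_ratio"] = (
        summary["on_topic"]["count"] / total if total else 0.0
    )
    return summary


# Made-up sample data, for illustration only.
sample_article = "Online abuse of women writers in newspaper comments"
sample_comments = [
    {"text": "Abuse in the comments drives writers away", "upvotes": 12},
    {"text": "Buy cheap watches here", "upvotes": 1},
]
sample_summary = relevance_summary(sample_article, sample_comments)
```

Comparing the `upvotes` totals of the two groups, rather than only their counts, is what ties this back to the sociological idea above: a few extreme comments matter less if nobody upvotes them.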
