MODERAT

Justen, Lennart, Kilian Müller, et al. No Time Like the Present: Effects of Temporal Language Change on Comment Moderation. 2022.

Abstract

The spread of online hate has become a major problem for newspapers that host comment sections. There is growing interest in using machine learning and natural language processing for (semi-)automated abusive language detection, to avoid the cost of manual comment moderation or having to shut down comment sections completely. However, much of the past work on abusive language detection with ML uses random train-test splitting procedures that assume an unrealistically static language environment. In this paper, we show, using a new German newspaper comments dataset, that a time-stratified evaluation procedure provides a more realistic measure of a classifier's performance on future data. We also show that the performance of classifiers can degrade quickly as the training data grows more outdated and as language and news coverage evolve. Finally, we show that the performance of classifiers trained on data from before the Covid-19 pandemic drops sharply when they are evaluated on Covid-era comments.

Repository contents:
CHTC
Random-vs-stratified test
Temporal degredation test
README.md
TextPreprocessingTransformer.py
corpus_similarity.py
functions.py
requirements.txt
GitHub repository: lennijusten/MODERAT