A Dataset for the Detection of Dehumanizing Language

This repository contains both the unlabeled and the labeled data for the paper "A Dataset for the Detection of Dehumanizing Language" (Engelmann et al., LTEDI-WS 2024).

Labeled dataset

The labeled dataset contains the annotations from both annotators. Each row has a 1 in the column that matches the annotator's judgment of the sample: dehumanizing, not dehumanizing, or unsure. Columns that do not apply to a sample are left blank.
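The column layout described above can be collapsed into a single label per sample. The following is a minimal sketch, not the authors' code: the column names in `LABEL_COLUMNS` and the helper `row_label` are hypothetical and need to be replaced with the actual headers in the labeled file. Since the file holds annotations from both annotators, the helper would be called once per annotator with that annotator's columns.

```python
import csv
import io

# Hypothetical column names; replace with the headers in the labeled file.
LABEL_COLUMNS = ("dehumanizing", "not_dehumanizing", "unsure")


def row_label(row, columns=LABEL_COLUMNS):
    """Return the name of the single column marked "1", or None if all are blank."""
    marked = [c for c in columns if (row.get(c) or "").strip() == "1"]
    if len(marked) > 1:
        raise ValueError(f"more than one label column marked: {marked}")
    return marked[0] if marked else None


# Tiny made-up sample standing in for one annotator's columns.
sample = io.StringIO(
    "text,dehumanizing,not_dehumanizing,unsure\n"
    "example a,1,,\n"
    "example b,,,1\n"
)
labels = [row_label(r) for r in csv.DictReader(sample)]
```

Returning `None` for rows without any mark keeps blank rows distinguishable from rows marked as unsure.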

The data contains annotations made both before and after the discussion. The first 600 samples were annotated before the discussion, the remaining samples after it. Additional columns mark this distinction.

Unlabeled dataset

The unlabeled dataset was split into two halves (unlabeled_dataset_first.json and unlabeled_dataset_second.json) to stay within upload size restrictions. After cloning the repository, the two halves can be concatenated to recover the original data used in the paper.
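One way to concatenate the two halves is sketched below. It assumes that each file holds a top-level JSON array of records; the README does not state the internal format, so the check in the loop fails loudly if that assumption does not hold. The function name `load_unlabeled` is illustrative only.

```python
import json
from pathlib import Path

PARTS = ("unlabeled_dataset_first.json", "unlabeled_dataset_second.json")


def load_unlabeled(repo_dir="."):
    """Load both halves in order and return them as one list.

    Assumption: each file contains a top-level JSON array.
    """
    repo = Path(repo_dir)
    records = []
    for name in PARTS:
        with open(repo / name, encoding="utf-8") as f:
            half = json.load(f)
        if not isinstance(half, list):
            raise ValueError(
                f"{name}: expected a JSON array, got {type(half).__name__}"
            )
        records.extend(half)
    return records
```

Calling `load_unlabeled()` from the root of the cloned repository returns the combined records, with the first half preceding the second.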

