Note:

This repo has moved to: https://github.com/hausanlp/NaijaSenti.

The NaijaSenti dataset can also be found on Hugging Face: https://huggingface.co/datasets/HausaNLP/NaijaSenti-Twitter

NaijaSenti is an open-source collection of sentiment and emotion corpora for four major Nigerian languages. This project was supported by the Lacuna Fund initiative. Jump straight to one of the sections below, or just scroll down to find out more.

Update (05/09/2022): We are running a SemEval competition and are releasing more sentiment datasets for African languages, including the NaijaSenti dataset. Visit the AfriSenti SemEval page for more information: AfriSenti-SemEval Task 12

Update (05/09/2022): Email me (shamsuddeen2004@gmail.com) if you need the NaijaSenti dataset. We can send you an anonymized version.

Table of Contents

Paper and Datasheet for Dataset

Read the NaijaSenti paper: NaijaSenti: A Nigerian Twitter Sentiment Corpus for Multilingual Sentiment Analysis
Read the NaijaSenti Datasheet (coming soon...)

Abstract

Sentiment analysis is one of the most widely studied applications in NLP, but most work focuses on languages with large amounts of data. We introduce the first large-scale human-annotated Twitter sentiment dataset for the four most widely spoken languages in Nigeria—Hausa, Igbo, Nigerian-Pidgin, and Yorùbá—consisting of around 30,000 annotated tweets per language (except for Nigerian-Pidgin), including a significant fraction of code-mixed tweets. We propose text collection, filtering, processing, and labelling methods that enable us to create datasets for these low-resource languages. We evaluate a range of pre-trained models and transfer strategies on the dataset. We find that language-specific models and language-adaptive fine-tuning generally perform best. We make the datasets, trained models, sentiment lexicons, and code available to encourage sentiment analysis research in under-represented languages.

Download NaijaSenti Datasets

1. Manually Annotated Twitter Sentiment Dataset

2. Manually Annotated Sentiment Lexicon

3. Semi-automatically Translated Emotion Lexicon

4. Semi-automatically Translated Sentiment Lexicon

5. Large-Scale Unlabeled Twitter Sentiment Corpus

6. Stop-words for Hausa, Igbo, Pidgin and Yoruba
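As a rough starting point, the annotated Twitter dataset can probably be loaded from the Hugging Face Hub with the `datasets` library. The sketch below is an illustration, not part of the official release. It assumes the configuration names `hau`, `ibo`, `pcm`, and `yor` and a `label` column; check the dataset card to confirm both. `load_naijasenti` and `label_distribution` are hypothetical helper names.

```python
from collections import Counter

# Assumed configuration names (ISO 639-3 codes); verify them on the dataset card.
LANGUAGES = {
    "hau": "Hausa",
    "ibo": "Igbo",
    "pcm": "Nigerian-Pidgin",
    "yor": "Yorùbá",
}


def load_naijasenti(lang):
    """Download the data for one language from the Hugging Face Hub (needs network)."""
    if lang not in LANGUAGES:
        raise ValueError(f"unknown language code: {lang!r}")
    # Imported lazily so label_distribution works without the `datasets` package.
    from datasets import load_dataset

    return load_dataset("HausaNLP/NaijaSenti-Twitter", lang)


def label_distribution(rows, label_key="label"):
    """Return the fraction of rows carrying each sentiment label."""
    # `label_key` is an assumed column name.
    counts = Counter(row[label_key] for row in rows)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.items()}
```

For example, `label_distribution(load_naijasenti("hau")["train"])` would summarize the class balance of the Hausa training split, assuming a split named `train` exists. The labels may be stored as integer class ids rather than strings.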

Model

Our model is available on the Hugging Face Model Hub.

Citation

If you use this data in your work, please cite:

@misc{muhammad2022naijasenti,
      title={NaijaSenti: A Nigerian Twitter Sentiment Corpus for Multilingual Sentiment Analysis}, 
      author={Shamsuddeen Hassan Muhammad and David Ifeoluwa Adelani and Sebastian Ruder and Ibrahim Said Ahmad and Idris Abdulmumin and Bello Shehu Bello and Monojit Choudhury and Chris Chinenye Emezue and Saheed Salahudeen Abdullahi and Anuoluwapo Aremu and Alipio Jeorge and Pavel Brazdil},
      year={2022},
      eprint={2201.08277},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Papers from this project

Please let us know if you use NaijaSenti in your papers.

Contact us

If you want to report a problem or suggest an enhancement, please open an issue in this GitHub repository so that we can address it quickly. You can also contact us by email (shamsuddeen2004 AT gmail DOT com).

Changelog

2022-01-21: Released NaijaSenti v1.0.0

License

This work is licensed under a Creative Commons Attribution 4.0 International License.
