nexthope-nlp

Sentiment analysis of tweets about data breaches.

Goal

This project performs sentiment analysis on tweets related to the term "data breach", with the hope that the results will be useful for data breach detection.

Dataset

We scraped the tweets and cleaned the data (removing slang and punctuation, adding spaces, etc.) so that we could extract insights from it. Our datasets can be found in the datasets folder. We also used the kamus (Indonesian for "dictionary") resources for the sentiment analysis.
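
As a rough illustration of the cleaning and dictionary-based labeling described above, here is a minimal sketch using only the Python standard library. The SLANG and LEXICON entries, the function names, and the three-way positive/negative/neutral labels are all assumptions made for this example. The project's actual slang dictionary is colloquial-indonesian-lexicon.csv, and its real cleaning rules may differ (for instance, this sketch maps slang to a standard form and drops hashtags entirely).

```python
import re

# Hypothetical slang map for illustration only; the project's real
# dictionary is colloquial-indonesian-lexicon.csv.
SLANG = {"gk": "tidak", "yg": "yang", "bgt": "banget"}

# Hypothetical sentiment lexicon: word -> polarity score.
LEXICON = {"bocor": -1, "rugi": -1, "parah": -1, "aman": 1, "bagus": 1}


def clean_tweet(text, slang=SLANG):
    """Lowercase, strip URLs/mentions/hashtags/punctuation, normalize slang."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # drop URLs
    text = re.sub(r"[@#]\w+", " ", text)       # drop mentions and hashtags
    text = re.sub(r"[^a-z\s]", " ", text)      # punctuation and digits become spaces
    tokens = [slang.get(tok, tok) for tok in text.split()]
    return " ".join(tokens)


def label_sentiment(cleaned, lexicon=LEXICON):
    """Sum the polarity of known words and map the total to a label."""
    score = sum(lexicon.get(tok, 0) for tok in cleaned.split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


tweet = "Data gue bocor lagi, parah bgt!!! @kominfo https://t.co/abc"
cleaned = clean_tweet(tweet)
print(cleaned)                   # data gue bocor lagi parah banget
print(label_sentiment(cleaned))  # negative
```

Replacing punctuation with spaces rather than deleting it keeps words apart when a tweet omits the space after a comma or period.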

Approach

We tried two vectorization approaches: CountVectorizer and TF-IDF. We compared their performance with the chosen classification algorithms (Random Forest and SVM for CountVectorizer; Random Forest, SVM, and IndoBERT for TF-IDF). The best result came from IndoBERT.
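
To make the difference between the two vectorizers concrete, the sketch below computes raw term counts and TF-IDF weights by hand for three toy documents. It is not the code from the notebooks, which presumably use a library implementation. The function names and sample documents are made up for this example, tokenization is a plain whitespace split, and the TF-IDF formula is assumed to be the smoothed variant, idf = ln((1 + n) / (1 + df)) + 1, followed by L2 normalization of each row.

```python
import math
from collections import Counter


def count_vectors(docs):
    """Bag-of-words: one row per document, one column per vocabulary term."""
    vocab = sorted({tok for doc in docs for tok in doc.split()})
    rows = []
    for doc in docs:
        counts = Counter(doc.split())
        rows.append([counts[term] for term in vocab])
    return vocab, rows


def tfidf_vectors(docs):
    """Counts reweighted by smoothed inverse document frequency, then L2-normalized."""
    vocab, counts = count_vectors(docs)
    n_docs = len(docs)
    # Document frequency: in how many documents does each term appear?
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log((1 + n_docs) / (1 + d)) + 1 for d in df]
    rows = []
    for row in counts:
        weighted = [c * w for c, w in zip(row, idf)]
        norm = math.sqrt(sum(v * v for v in weighted)) or 1.0
        rows.append([v / norm for v in weighted])
    return vocab, rows


docs = ["data bocor lagi", "data aman", "data bocor parah"]
vocab, counts = count_vectors(docs)
_, tfidf = tfidf_vectors(docs)

print(vocab)      # ['aman', 'bocor', 'data', 'lagi', 'parah']
print(counts[0])  # [0, 1, 1, 1, 0]
print([round(v, 3) for v in tfidf[0]])
```

In the count vectors every word in the first document has the same weight. Under TF-IDF, "data" appears in all three documents and is therefore weighted below "lagi", which appears in only one.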

Vectorizer Method	Model	Accuracy	F1 Score
CountVectorizer	Random Forest	0.78	0.73
CountVectorizer	SVM	0.78	0.72
TF-IDF	Random Forest	0.74	0.67
TF-IDF	SVM	0.79	0.68
TF-IDF	IndoBERT	0.82	0.79
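
For readers unfamiliar with the two metrics in the table, the following sketch shows how accuracy and an F1 score can be computed from true and predicted labels. The README does not say which F1 averaging was used, so macro averaging here is an assumption, and the labels and predictions are invented toy data rather than project results.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (macro averaging, assumed)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels for illustration only.
y_true = ["neg", "neg", "pos", "pos", "neu"]
y_pred = ["neg", "pos", "pos", "pos", "neu"]

print(round(accuracy(y_true, y_pred), 2))  # 0.8
print(round(macro_f1(y_true, y_pred), 2))  # 0.82
```

Accuracy and F1 can diverge when classes are imbalanced, which is one possible reason the table shows TF-IDF with SVM scoring higher on accuracy but lower on F1 than CountVectorizer with Random Forest.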

Further information about files

File Name	Description
databocorv2_part2	raw data from scraping
Scraping_preprocessing_v2	output of notebook v1
colloquial-indonesian-lexicon.csv	Indonesian slang words dictionary

