
SentiGerman classifies German text into positive, negative, and neutral sentiments using fine-tuned BERT and RoBERTa models. Trained on the SB10K dataset, it delivers accurate sentiment analysis with advanced language understanding.


zahrasafdari/SentiGerman


German Text Sentiment Classification

Overview

This project classifies sentiment in German text using two transformer-based models, BERT and RoBERTa. Both models are trained and evaluated on the SB10K dataset, and each input text is assigned one of three sentiment categories: positive, negative, or neutral.

Model Architecture

The project implements the following models:

  • SimpleBERTModel: A BERT-based model for sentiment classification.
  • SimpleRoBERTaModel: A RoBERTa-based model for sentiment classification.

Both models are fine-tuned on the SB10K dataset using PyTorch and the transformers library.
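The repository's class definitions are not reproduced in this README, so the following is only a minimal sketch of what a BERT-based three-class classifier could look like in PyTorch with the transformers library. The class name `BertSentimentClassifier`, the dropout rate, and the choice to classify from the first-token hidden state are illustrative assumptions, not the project's confirmed design. A tiny randomly initialised encoder is used so that the example runs without downloading weights.

```python
import torch
from torch import nn
from transformers import BertConfig, BertModel


class BertSentimentClassifier(nn.Module):
    """Hypothetical wrapper: a BERT encoder plus a linear head for 3 classes."""

    def __init__(self, encoder, num_labels=3, dropout=0.1):
        super().__init__()
        self.encoder = encoder
        self.dropout = nn.Dropout(dropout)
        self.classifier = nn.Linear(encoder.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask):
        outputs = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        # Hidden state of the first token ([CLS]) of every sequence in the batch.
        cls_state = outputs.last_hidden_state[:, 0]
        # Raw, unnormalised logits with shape (batch_size, num_labels).
        return self.classifier(self.dropout(cls_state))


# Assumption: a tiny random encoder stands in for a pretrained German checkpoint.
# For actual fine-tuning, pretrained weights would be loaded with
# BertModel.from_pretrained(<checkpoint name>) instead.
tiny_config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=1,
    num_attention_heads=2,
    intermediate_size=64,
)
model = BertSentimentClassifier(BertModel(tiny_config))
model.eval()

input_ids = torch.randint(0, 100, (2, 8))      # batch of 2 sequences, 8 tokens each
attention_mask = torch.ones_like(input_ids)    # no padding in this toy batch
with torch.no_grad():
    logits = model(input_ids, attention_mask)
```

A RoBERTa variant could follow the same pattern by passing a `RobertaModel` as the encoder, since the wrapper only relies on `config.hidden_size` and `last_hidden_state`.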

Dataset

The dataset used for this project is the SB10K dataset. It can be downloaded from the following link: SB10K Dataset.

Project Structure

  • SentimentDataset: A custom dataset class for loading and processing the SB10K data.
  • SimpleBERTModel: A custom BERT model for sentiment classification.
  • SimpleRoBERTaModel: A custom RoBERTa model for sentiment classification.
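As a rough illustration of the dataset component listed above, here is a sketch of a PyTorch `Dataset` that turns (text, label) pairs into tensors. It is not the repository's `SentimentDataset`: the class name `SentimentDatasetSketch`, the label-to-id ordering, and the `max_length` default are assumptions, and reading the SB10K file itself (for example with pandas) is left out because its layout is not described here. The `toy_tokenizer` function is a stand-in that only mimics the call shape of a Hugging Face tokenizer, so the example runs offline.

```python
import torch
from torch.utils.data import Dataset

# Assumed mapping; the project may order its classes differently.
LABEL_TO_ID = {"negative": 0, "neutral": 1, "positive": 2}


class SentimentDatasetSketch(Dataset):
    """Hypothetical dataset: tokenizes one text per item and returns tensors."""

    def __init__(self, texts, labels, tokenizer, max_length=128):
        self.texts = list(texts)
        self.labels = [LABEL_TO_ID[label] for label in labels]
        self.tokenizer = tokenizer
        self.max_length = max_length

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, idx):
        encoded = self.tokenizer(
            self.texts[idx],
            padding="max_length",
            truncation=True,
            max_length=self.max_length,
        )
        return {
            "input_ids": torch.tensor(encoded["input_ids"], dtype=torch.long),
            "attention_mask": torch.tensor(encoded["attention_mask"], dtype=torch.long),
            "label": torch.tensor(self.labels[idx], dtype=torch.long),
        }


def toy_tokenizer(text, padding, truncation, max_length):
    """Stand-in tokenizer: one fake id (the word length) per whitespace token."""
    ids = [len(token) for token in text.split()][:max_length]
    pad = max_length - len(ids)
    return {
        "input_ids": ids + [0] * pad,
        "attention_mask": [1] * len(ids) + [0] * pad,
    }


dataset = SentimentDatasetSketch(
    texts=["Das ist toll", "Das ist schlecht"],
    labels=["positive", "negative"],
    tokenizer=toy_tokenizer,
    max_length=5,
)
item = dataset[0]
```

In practice, `toy_tokenizer` would be replaced by a tokenizer matching the chosen BERT or RoBERTa checkpoint, and the dataset would be wrapped in a `torch.utils.data.DataLoader` for batching.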

Requirements

  • torch
  • transformers
  • pandas
  • scikit-learn (imported as sklearn)

You can install the required packages using:

pip install torch transformers pandas scikit-learn
  1. Run the training: Execute the train.py script to start training. The script trains both the BERT and RoBERTa models and evaluates each of them on the test set.
python train.py
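
The README does not say which metrics train.py reports. As one plausible way to summarise test-set performance on a three-class task, the sketch below computes accuracy and macro-averaged F1 with scikit-learn. The function name `summarize_predictions` and the toy label arrays are illustrative assumptions, not output from the project.

```python
from sklearn.metrics import accuracy_score, f1_score


def summarize_predictions(y_true, y_pred):
    """Return accuracy and macro-F1 for integer class labels (hypothetical helper)."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        # Macro averaging weights the three classes equally, which can matter
        # when one sentiment class is much more frequent than the others.
        "macro_f1": f1_score(y_true, y_pred, average="macro"),
    }


# Toy labels only: 0 = negative, 1 = neutral, 2 = positive (assumed encoding).
y_true = [0, 1, 2, 2, 1, 0]
y_pred = [0, 1, 2, 1, 1, 0]
metrics = summarize_predictions(y_true, y_pred)
print(metrics)
```

Calling the same helper once per model would give a side-by-side comparison of the BERT and RoBERTa results.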

Model Diagram

[Figure: sentiment model diagram]
