This project classifies the sentiment of German text using two transformer-based models, BERT and RoBERTa. Both models are trained and evaluated on the SB10K dataset, and each text is assigned to one of three sentiment categories: positive, negative, or neutral.
The project implements the following models:
- SimpleBERTModel: A BERT-based model for sentiment classification.
- SimpleRoBERTaModel: A RoBERTa-based model for sentiment classification.
Both models are fine-tuned on the SB10K dataset using PyTorch and the Hugging Face transformers library.
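The README does not show how SimpleBERTModel and SimpleRoBERTaModel are built. A common pattern, assumed here rather than taken from the project, is to place a dropout layer and a linear head on top of the encoder's first-token representation. The class name `TransformerSentimentClassifier` is hypothetical. A single class can cover both architectures because `AutoModel` selects the matching encoder from the checkpoint.

```python
import torch
from torch import nn
from transformers import AutoModel


class TransformerSentimentClassifier(nn.Module):
    """Hypothetical sketch: pretrained encoder + dropout + linear head (3 classes)."""

    def __init__(self, encoder: nn.Module, num_labels: int = 3, dropout: float = 0.1):
        super().__init__()
        self.encoder = encoder
        self.dropout = nn.Dropout(dropout)
        # The head maps the encoder's hidden size to one logit per sentiment class.
        self.classifier = nn.Linear(encoder.config.hidden_size, num_labels)

    @classmethod
    def from_pretrained(cls, checkpoint: str, num_labels: int = 3):
        # AutoModel picks BertModel, RobertaModel, ... based on the checkpoint's config.
        return cls(AutoModel.from_pretrained(checkpoint), num_labels)

    def forward(self, input_ids: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
        outputs = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        # First token: [CLS] for BERT, <s> for RoBERTa.
        first_token = outputs.last_hidden_state[:, 0]
        return self.classifier(self.dropout(first_token))  # logits: (batch, num_labels)
```

For example, `TransformerSentimentClassifier.from_pretrained("bert-base-german-cased")` would load a German BERT encoder. The checkpoints the project actually uses are not stated, so treat any checkpoint name as a placeholder.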
The dataset used for this project is the SB10K dataset. It can be downloaded from the following link: SB10K Dataset.
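After downloading, the data can be loaded with pandas and split with scikit-learn. The sketch below assumes a tab-separated file with `text` and `label` columns that hold the strings positive, negative, and neutral. The real file layout may differ, and both `load_sb10k` and the label order in `LABEL2ID` are hypothetical.

```python
import csv

import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical label order; the project's actual mapping is not documented.
LABEL2ID = {"negative": 0, "neutral": 1, "positive": 2}


def load_sb10k(path, text_col="text", label_col="label", test_size=0.2, seed=42):
    """Load a tab-separated SB10K export and return stratified train/test DataFrames.

    The file layout and column names are assumptions; adapt them to the file you download.
    """
    # QUOTE_NONE: tweets often contain unbalanced quote characters.
    df = pd.read_csv(path, sep="\t", usecols=[text_col, label_col], quoting=csv.QUOTE_NONE)
    df = df.dropna()
    # Normalise label strings and map them to integer ids; unknown labels become NaN.
    df = df.assign(label_id=df[label_col].str.strip().str.lower().map(LABEL2ID))
    df = df[df["label_id"].notna()].astype({"label_id": int})
    # Stratifying keeps the class proportions the same in both splits.
    train_df, test_df = train_test_split(
        df, test_size=test_size, random_state=seed, stratify=df["label_id"]
    )
    return train_df.reset_index(drop=True), test_df.reset_index(drop=True)
```

A stratified split matters here because sentiment corpora are rarely balanced. A purely random split can leave the test set with noticeably different class proportions than the training set.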
The code is built around the following classes:
- SentimentDataset: A custom dataset class for loading and processing the SB10K data.
- SimpleBERTModel: A custom BERT model for sentiment classification.
- SimpleRoBERTaModel: A custom RoBERTa model for sentiment classification.
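Here is a minimal sketch of how a class like SentimentDataset might work. It assumes the class wraps a Hugging Face-style tokenizer and returns fixed-length tensors. The constructor arguments and the `max_length=128` default are assumptions, not the project's code.

```python
import torch
from torch.utils.data import Dataset


class SentimentDataset(Dataset):
    """Sketch of a dataset that tokenizes each text on access.

    `tokenizer` is assumed to behave like a Hugging Face tokenizer: called on a string,
    it returns a dict-like object with `input_ids` and `attention_mask` tensors.
    """

    def __init__(self, texts, labels, tokenizer, max_length=128):
        self.texts = list(texts)
        self.labels = list(labels)
        if len(self.texts) != len(self.labels):
            raise ValueError("texts and labels must have the same length")
        self.tokenizer = tokenizer
        self.max_length = max_length

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, idx):
        encoding = self.tokenizer(
            self.texts[idx],
            truncation=True,
            padding="max_length",  # fixed length so the default collate_fn can stack items
            max_length=self.max_length,
            return_tensors="pt",
        )
        # return_tensors="pt" adds a batch dimension of 1; drop it per item.
        return {
            "input_ids": encoding["input_ids"].squeeze(0),
            "attention_mask": encoding["attention_mask"].squeeze(0),
            "labels": torch.tensor(self.labels[idx], dtype=torch.long),
        }
```

With a real tokenizer, such as one from `transformers.AutoTokenizer.from_pretrained(...)`, the dataset can be passed directly to `torch.utils.data.DataLoader(dataset, batch_size=16, shuffle=True)`. Padding every item to `max_length` trades some wasted computation for simple batching.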
The project requires the following Python packages:
- torch
- transformers
- pandas
- scikit-learn (imported as sklearn)
You can install the required packages using:
pip install torch transformers pandas scikit-learn
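Because scikit-learn is installed under that name but imported as `sklearn`, a quick import check can catch a missing package before training starts. The helper below is an optional, hypothetical convenience and is not part of the project.

```python
import importlib.util

# pip package name -> importable module name
REQUIRED = {
    "torch": "torch",
    "transformers": "transformers",
    "pandas": "pandas",
    "scikit-learn": "sklearn",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of required packages whose modules cannot be found."""
    return [pip_name for pip_name, module in required.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    print("missing packages:", ", ".join(missing) if missing else "none")
```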
- Run the training: Execute the train.py script to start the training process. This script trains both the BERT and RoBERTa models and evaluates their performance on the test set.
python train.py
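The README does not describe what train.py does internally. The sketch below shows one conventional fine-tuning and evaluation loop. It assumes a model whose `forward(input_ids, attention_mask)` returns logits, and loaders that yield dicts with `input_ids`, `attention_mask`, and `labels`. The function names, the AdamW learning rate of 2e-5, and the choice of accuracy plus macro-F1 as metrics are all assumptions. The project's actual hyperparameters and reported metrics are not stated.

```python
import torch
from torch import nn
from sklearn.metrics import accuracy_score, f1_score


def train_one_epoch(model, loader, optimizer, device):
    """One pass over the training data with cross-entropy loss; returns mean batch loss."""
    model.train()
    loss_fn = nn.CrossEntropyLoss()
    total_loss = 0.0
    for batch in loader:
        optimizer.zero_grad()
        logits = model(batch["input_ids"].to(device), batch["attention_mask"].to(device))
        loss = loss_fn(logits, batch["labels"].to(device))
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
    return total_loss / max(len(loader), 1)


@torch.no_grad()
def evaluate(model, loader, device):
    """Return accuracy and macro-averaged F1 on a held-out loader."""
    model.eval()
    preds, golds = [], []
    for batch in loader:
        logits = model(batch["input_ids"].to(device), batch["attention_mask"].to(device))
        preds.extend(logits.argmax(dim=-1).cpu().tolist())
        golds.extend(batch["labels"].tolist())
    return {
        "accuracy": accuracy_score(golds, preds),
        "macro_f1": f1_score(golds, preds, average="macro"),
    }


def fit(model, train_loader, test_loader, epochs=3, lr=2e-5):
    """Fine-tune `model` for `epochs` epochs, then evaluate it on the test loader."""
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model.to(device)
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    for epoch in range(epochs):
        loss = train_one_epoch(model, train_loader, optimizer, device)
        print(f"epoch {epoch + 1}: train loss {loss:.4f}")
    return evaluate(model, test_loader, device)
```

Calling `fit(model, train_loader, test_loader)` once per architecture mirrors what the README says train.py does: it trains both models and evaluates each on the test set. Macro-F1 weights the three classes equally, which is a common choice when the classes are imbalanced.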