Sentiment Analysis with PyTorch

This repository walks you through building a complete sentiment analysis model that predicts the polarity of a given review (whether the expressed opinion is positive or negative). The model is trained on the popular IMDb movie reviews dataset.

Data preprocessing

The first notebook covers loading the raw dataset, feature extraction and analysis, text preprocessing, and preparation of the train/validation/test sets.
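
To illustrate the text preprocessing step, here is a minimal sketch that removes HTML tags (IMDb reviews contain `<br />` line breaks), lowercases the text, and splits it into tokens. The function name `clean_text` and the exact cleaning rules are assumptions made for this example and are not taken from the notebook.

```python
import re

def clean_text(text):
    """Lowercase a review, drop HTML tags, and split it into word tokens."""
    text = re.sub(r"<[^>]+>", " ", text)   # remove tags such as <br />
    text = text.lower()
    # Keep runs of letters, digits and apostrophes as tokens.
    return re.findall(r"[a-z0-9']+", text)

tokens = clean_text("Great movie!<br /><br />I'd watch it again.")
print(tokens)
```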

Vocabulary and batch iterator

The second notebook shows how to set up the vocabulary object, which is responsible for the following tasks:
- Creating the dataset's vocabulary.
- Filtering the dataset by rare-word occurrence and sentence length.
- Mapping words to their numerical representation (word2index) and back (index2word).
- Enabling the use of pre-trained word vectors.
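
The vocabulary tasks listed above can be sketched roughly as follows. The class name `SimpleVocab`, the special tokens `<pad>` and `<unk>`, and the `min_count` parameter are hypothetical and chosen for illustration; the repository's own vocabulary class may be organized differently.

```python
from collections import Counter

class SimpleVocab:
    """Toy vocabulary with word2index / index2word and rare-word filtering."""

    PAD, UNK = "<pad>", "<unk>"

    def __init__(self, tokenized_texts, min_count=2):
        counts = Counter(tok for text in tokenized_texts for tok in text)
        # Special tokens come first, so padding is always index 0.
        self.index2word = [self.PAD, self.UNK]
        # Words seen fewer than min_count times are left out of the vocabulary.
        self.index2word += sorted(w for w, c in counts.items() if c >= min_count)
        self.word2index = {w: i for i, w in enumerate(self.index2word)}

    def encode(self, tokens):
        """Map tokens to indices; unknown or filtered words map to <unk>."""
        unk = self.word2index[self.UNK]
        return [self.word2index.get(tok, unk) for tok in tokens]

    def decode(self, indices):
        """Map indices back to tokens."""
        return [self.index2word[i] for i in indices]

texts = [["good", "movie"], ["bad", "movie"], ["good", "plot"]]
vocab = SimpleVocab(texts, min_count=2)
encoded = vocab.encode(["good", "bad", "movie"])
decoded = vocab.decode(encoded)
```

Here "bad" occurs only once, so it is filtered out and decoded back as `<unk>`.
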
Furthermore, we will build the BatchIterator class, which is used for:
- Sorting dataset examples.
- Generating batches.
- Padding sequences.
- Iterating over all batches.
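
The batching steps above (sort, split into batches, pad) can be sketched with a plain generator. This is a simplified stand-in, not the repository's BatchIterator class; the function name `iterate_batches` and its parameters are assumptions for this example.

```python
def iterate_batches(sequences, batch_size, pad_index=0):
    """Yield (padded_batch, lengths) pairs, grouping sequences of similar length."""
    # Sorting by length keeps the amount of padding per batch small.
    ordered = sorted(sequences, key=len)
    for start in range(0, len(ordered), batch_size):
        batch = ordered[start:start + batch_size]
        lengths = [len(seq) for seq in batch]
        max_len = max(lengths)
        # Pad every sequence on the right up to the longest one in the batch.
        padded = [seq + [pad_index] * (max_len - len(seq)) for seq in batch]
        yield padded, lengths

data = [[5, 6, 7], [8], [9, 10], [11, 12, 13, 14]]
batches = list(iterate_batches(data, batch_size=2))
```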

BiGRU model

In the third notebook, we build the bidirectional Gated Recurrent Unit (BiGRU) model. The network uses the following architectures and techniques: bidirectional GRU, stacked (multi-layer) GRU, dropout and spatial dropout, max pooling, and average pooling. The notebook also presents the hyperparameter tuning process. After choosing a suitable set of hyperparameters, we train the model and estimate its generalization error.
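
A minimal sketch of how these pieces might fit together in PyTorch is shown below. The class name `BiGRUClassifier`, the layer sizes, and the dropout rate are illustrative assumptions and not the notebook's tuned values. For brevity, the pooling here also runs over padded positions.

```python
import torch
import torch.nn as nn

class BiGRUClassifier(nn.Module):
    def __init__(self, vocab_size, emb_dim=64, hidden_size=32, num_layers=2,
                 dropout=0.3, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        # Dropout2d on a (batch, emb, seq, 1) tensor zeroes whole embedding
        # channels, which is what "spatial dropout" refers to.
        self.spatial_dropout = nn.Dropout2d(dropout)
        self.gru = nn.GRU(emb_dim, hidden_size, num_layers=num_layers,
                          batch_first=True, bidirectional=True, dropout=dropout)
        # Max-pooled and average-pooled outputs are concatenated: 2 * (2 * hidden).
        self.fc = nn.Linear(4 * hidden_size, num_classes)

    def forward(self, token_ids):
        emb = self.embedding(token_ids)                    # (batch, seq, emb)
        emb = emb.permute(0, 2, 1).unsqueeze(3)            # (batch, emb, seq, 1)
        emb = self.spatial_dropout(emb).squeeze(3)         # (batch, emb, seq)
        emb = emb.permute(0, 2, 1).contiguous()            # (batch, seq, emb)
        out, _ = self.gru(emb)                             # (batch, seq, 2 * hidden)
        max_pool = out.max(dim=1).values
        avg_pool = out.mean(dim=1)
        return self.fc(torch.cat([max_pool, avg_pool], dim=1))

model = BiGRUClassifier(vocab_size=100).eval()   # eval() disables dropout
batch = torch.randint(1, 100, (4, 12))           # 4 reviews, 12 token ids each
logits = model(batch)                            # one score per class and review
```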

BiGRU with additional features

In this notebook, we will implement the bidirectional Gated Recurrent Unit model that also uses the additional features extracted in the first notebook.

BiGRU with GloVe vectors

This notebook implements the bidirectional Gated Recurrent Unit model with pre-trained GloVe word embeddings and the additional features.
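
GloVe vectors are distributed as text files with one word per line followed by its vector components, separated by spaces. The sketch below builds an embedding matrix aligned with a word2index mapping from such lines. The function name `build_embedding_matrix`, the random initialization for missing words, and the toy data are assumptions for illustration; it also assumes the indices in word2index run from 0 to n-1 without gaps.

```python
import random

def build_embedding_matrix(glove_lines, word2index, dim, seed=0):
    """Return one vector per vocabulary index; words missing from GloVe stay random."""
    rng = random.Random(seed)
    matrix = [[rng.uniform(-0.05, 0.05) for _ in range(dim)] for _ in word2index]
    for line in glove_lines:
        word, *values = line.rstrip().split(" ")
        # Only keep vectors for words that are in our vocabulary.
        if word in word2index and len(values) == dim:
            matrix[word2index[word]] = [float(v) for v in values]
    return matrix

# Toy stand-in for lines read from a GloVe text file.
glove_lines = ["good 0.1 0.2 0.3", "movie -0.4 0.5 0.6", "unused 0.0 0.0 0.0"]
word2index = {"<pad>": 0, "good": 1, "movie": 2, "plot": 3}
matrix = build_embedding_matrix(glove_lines, word2index, dim=3)
```

In PyTorch, such a matrix can be converted to a tensor and passed to `torch.nn.Embedding.from_pretrained`.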

TextCNN

In this notebook, we will build a convolutional neural network (TextCNN) model for text classification.
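
A common way to apply convolutions to text is to run several 1D convolutions with different kernel widths over the embedded sequence and max-pool each result over time. The sketch below follows that pattern; the class layout, filter counts, and kernel sizes are assumptions for this example and may differ from the notebook.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextCNN(nn.Module):
    def __init__(self, vocab_size, emb_dim=64, num_filters=16,
                 kernel_sizes=(3, 4, 5), num_classes=2, dropout=0.5):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        # One convolution per kernel width, each sliding over the token axis.
        self.convs = nn.ModuleList(
            [nn.Conv1d(emb_dim, num_filters, k) for k in kernel_sizes]
        )
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(num_filters * len(kernel_sizes), num_classes)

    def forward(self, token_ids):
        # Conv1d expects (batch, channels, seq), so move the embedding axis.
        x = self.embedding(token_ids).permute(0, 2, 1)
        # Max over time keeps the strongest response of every filter.
        pooled = [F.relu(conv(x)).max(dim=2).values for conv in self.convs]
        return self.fc(self.dropout(torch.cat(pooled, dim=1)))

cnn = TextCNN(vocab_size=100).eval()
cnn_logits = cnn(torch.randint(1, 100, (4, 12)))   # sequences must be >= 5 tokens
```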

Transformer model for classification

This notebook implements a self-attention Transformer model for the classification task.
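
The core operation of self-attention is scaled dot-product attention. The sketch below shows it for a single head with an optional padding mask; the function signature is an assumption for this example, and a full Transformer adds multiple heads, positional encodings, and feed-forward layers on top.

```python
import math
import torch

def scaled_dot_product_attention(q, k, v, pad_mask=None):
    """q, k, v: (batch, seq, dim); pad_mask: (batch, seq), True at padded positions."""
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))   # (batch, seq, seq)
    if pad_mask is not None:
        # Padded keys get -inf so that softmax assigns them zero weight.
        scores = scores.masked_fill(pad_mask.unsqueeze(1), float("-inf"))
    weights = torch.softmax(scores, dim=-1)
    return weights @ v, weights

x = torch.randn(2, 5, 8)                        # 2 sequences, 5 tokens, 8 features
mask = torch.tensor([[False] * 5,
                     [False, False, False, True, True]])
attended, weights = scaled_dot_product_attention(x, x, x, mask)
```

For classification, the attended token vectors are typically pooled (for example, averaged) and passed to a linear layer.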

Dataset

The dataset is available at: http://ai.stanford.edu/~amaas/data/sentiment/

Unpack the downloaded tar.gz file using:

tar -xzf aclImdb_v1.tar.gz

Rearrange the data into the following structure:

dataset
  ├── test
  │     ├── positive
  │     └── negative
  └── train
        ├── positive
        └── negative
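
With the data arranged this way, one split can be read into memory with a few lines of Python. This is a minimal sketch; the function name `load_reviews` and the label encoding (1 for positive, 0 for negative) are assumptions for this example, and it assumes each review is stored as a separate .txt file.

```python
from pathlib import Path

def load_reviews(split_dir):
    """Read every .txt file under <split_dir>/positive and <split_dir>/negative."""
    texts, labels = [], []
    for label, folder in ((1, "positive"), (0, "negative")):
        for path in sorted(Path(split_dir, folder).glob("*.txt")):
            texts.append(path.read_text(encoding="utf-8"))
            labels.append(label)
    return texts, labels
```

For example, `load_reviews("dataset/train")` would return the training texts together with their labels.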

Requirements

Create a virtual environment (conda, virtualenv, etc.).

conda create -n <env_name> python=3.7

Activate your environment.

conda activate <env_name>

Create a new kernel.

pip install ipykernel

python -m ipykernel install --user --name <env_name>

Go to the directory ~/.local/share/jupyter/kernels/<env_name> and make sure that the kernel.json file contains the path to your environment's Python interpreter (you can check it with the which python command).

{
 "argv": [
  "/home/user/anaconda3/envs/<env_name>/bin/python",
  "-m",
  "ipykernel_launcher",
  "-f",
  "{connection_file}"
 ],
 "display_name": "<env_name>",
 "language": "python"
}
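
If you prefer to check the kernel spec programmatically, the interpreter path is the first entry of the argv list. The helper below is a small sketch; the function name `kernel_interpreter` is hypothetical.

```python
import json
from pathlib import Path

def kernel_interpreter(kernel_json):
    """Return the interpreter path (first argv entry) from a kernel.json file."""
    spec = json.loads(Path(kernel_json).read_text(encoding="utf-8"))
    return spec["argv"][0]
```

Run inside the activated environment, the returned path should match `sys.executable`.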

Install the requirements.

pip install -r requirements.txt

Restart your environment.

conda deactivate

conda activate <env_name>

Usage

Inside your virtual environment, launch Jupyter Notebook and open a notebook file (with the .ipynb extension), then change the kernel to the one created in the preceding step (<env_name>). Now you are ready to follow the tutorial.

Model Performance

| Model | Test accuracy | Validation accuracy | Training accuracy |
|---|---|---|---|
| BiGRU | 0.880 | 0.878 | 0.908 |
| BiGRU with extra features | 0.882 | 0.881 | 0.898 |
| BiGRU with GloVe vectors | 0.862 | 0.862 | 0.842 |
| TextCNN | 0.859 | 0.847 | 0.833 |
| Transformer | 0.883 | 0.880 | 0.912 |
