This project implements and trains LSTM (Long Short-Term Memory) and GRU (Gated Recurrent Unit) networks on the Penn Tree Bank (PTB) dataset using PyTorch. The study compares different model configurations to evaluate their effectiveness in language modeling tasks.
- Dataset
- Model Architectures
- Model Variants
- Training Process
- Results
- Conclusions
- How to Train and Test
- Reference
The Penn Tree Bank (PTB) dataset is utilized for training, validating, and testing the language models. It consists of roughly:
- Training Set: 929,589 words
- Validation Set: 73,760 words
- Test Set: 82,430 words
The dataset consists of sequences of word tokens, each represented by an index into a vocabulary of the 10,000 most frequent words.
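For concreteness, below is a minimal sketch of how the PTB token stream might be reshaped into batches for truncated backpropagation through time. The helper names (`batchify`, `get_batch`) and the assumption that a word-to-index mapping already exists are illustrative, not the repository's exact code:

```python
import torch

def batchify(token_ids, batch_size, device="cpu"):
    """Reshape a 1-D list/tensor of token indices into (num_steps, batch_size)."""
    data = torch.tensor(token_ids, dtype=torch.long)
    n_batches = data.size(0) // batch_size
    data = data[: n_batches * batch_size]          # drop the ragged remainder
    return data.view(batch_size, -1).t().contiguous().to(device)

def get_batch(source, i, seq_len=20):
    """Return one (inputs, targets) pair; targets are inputs shifted by one token."""
    length = min(seq_len, len(source) - 1 - i)
    inputs = source[i : i + length]
    targets = source[i + 1 : i + 1 + length].reshape(-1)
    return inputs, targets
```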
The LSTM model is designed to capture long-term dependencies in sequential data. Its architecture includes:
- Embedding Layer: Converts token indices into 200-dimensional vectors.
- LSTM Layers: Two stacked LSTM layers, each with 200 hidden units.
- Dropout Layer: Applied after the embedding layer and between the stacked LSTM layers, with a configurable dropout probability to prevent overfitting.
- Fully Connected Layer: Maps LSTM outputs to the vocabulary size (10,000 unique tokens), producing logits for each token prediction.
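A minimal PyTorch sketch of this architecture, assuming the hyperparameters listed later (200-dimensional embeddings, two stacked layers); the class and argument names are illustrative rather than the repository's exact implementation:

```python
import torch.nn as nn

class LSTMLanguageModel(nn.Module):
    def __init__(self, vocab_size=10_000, embed_size=200, hidden_size=200,
                 num_layers=2, dropout=0.5):
        super().__init__()
        self.drop = nn.Dropout(dropout)
        self.embedding = nn.Embedding(vocab_size, embed_size)
        # `dropout` here is applied between the two stacked LSTM layers.
        self.lstm = nn.LSTM(embed_size, hidden_size, num_layers, dropout=dropout)
        self.fc = nn.Linear(hidden_size, vocab_size)

    def forward(self, x, hidden=None):
        emb = self.drop(self.embedding(x))        # (seq_len, batch, embed_size)
        output, hidden = self.lstm(emb, hidden)   # (seq_len, batch, hidden_size)
        logits = self.fc(self.drop(output))       # (seq_len, batch, vocab_size)
        return logits.view(-1, logits.size(-1)), hidden
```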
The GRU model offers a streamlined alternative to LSTM with fewer parameters:
- Embedding Layer: Similar to LSTM, it transforms token indices into 200-dimensional vectors.
- GRU Layers: Two stacked GRU layers with 200 hidden units each.
- Dropout Layer: Applied post-embedding and in GRU hidden layers with configurable dropout probability.
- Fully Connected Layer: Maps GRU outputs to the vocabulary size (10,000 unique tokens), producing logits for each token prediction.
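The GRU variant differs only in the recurrent cell: `nn.GRU` keeps a single hidden-state tensor instead of the LSTM's `(h, c)` pair. A corresponding sketch, again illustrative rather than the repository's exact class:

```python
class GRULanguageModel(nn.Module):
    def __init__(self, vocab_size=10_000, embed_size=200, hidden_size=200,
                 num_layers=2, dropout=0.3):
        super().__init__()
        self.drop = nn.Dropout(dropout)
        self.embedding = nn.Embedding(vocab_size, embed_size)
        self.gru = nn.GRU(embed_size, hidden_size, num_layers, dropout=dropout)
        self.fc = nn.Linear(hidden_size, vocab_size)

    def forward(self, x, hidden=None):
        emb = self.drop(self.embedding(x))
        output, hidden = self.gru(emb, hidden)    # hidden is a single tensor
        logits = self.fc(self.drop(output))
        return logits.view(-1, logits.size(-1)), hidden
```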
Four experimental configurations were evaluated to assess the impact of regularization techniques:
- LSTM without Dropout
  - Architecture: Standard LSTM with no dropout.
  - Purpose: Baseline for model performance.
- LSTM with Dropout
  - Architecture: LSTM with a dropout probability of 0.5 applied after the embedding layer and between the hidden layers.
  - Purpose: Evaluate dropout's role in preventing overfitting and improving generalization.
- GRU without Dropout
  - Architecture: Standard GRU with no dropout.
  - Purpose: Baseline for model performance.
- GRU with Dropout
  - Architecture: GRU with a dropout probability of 0.3 applied after the embedding layer.
  - Purpose: Evaluate dropout's role in preventing overfitting and improving generalization.
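These four configurations could be collected into a small list of settings like the sketch below; pairing each learning rate with a configuration in the order the variants and table values are listed is an assumption, not something stated explicitly in this document:

```python
# Hypothetical configuration list for the four experiments.
experiments = [
    {"name": "LSTM (no dropout)",  "model": "lstm", "dropout": 0.0, "lr": 1.6},
    {"name": "LSTM (dropout 0.5)", "model": "lstm", "dropout": 0.5, "lr": 3.4},
    {"name": "GRU (no dropout)",   "model": "gru",  "dropout": 0.0, "lr": 1.5},
    {"name": "GRU (dropout 0.3)",  "model": "gru",  "dropout": 0.3, "lr": 1.8},
]
```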
| Hyperparameter | Value |
|---|---|
| Batch Size | 20 |
| Sequence Length | 20 |
| Vocabulary Size | 10,000 |
| Embedding Size | 200 |
| Hidden Size | 200 |
| Number of Layers | 2 |
| Dropout Probability | 0.0 / 0.5 / 0.3 |
| Learning Rate | 1.6 / 3.4 / 1.5 / 1.8 |
| Optimizer | SGD |
| Learning Rate Scheduler | LambdaLR |
| Number of Epochs | 13 / 20 |
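A sketch of the optimizer and scheduler setup named in the table, using the LSTM sketch above. The exact decay function passed to LambdaLR is not specified here, so the lambda below is an assumed example:

```python
import torch

model = LSTMLanguageModel(dropout=0.5)
optimizer = torch.optim.SGD(model.parameters(), lr=3.4)
# Assumed schedule: hold the base learning rate for a few epochs, then decay geometrically.
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda epoch: 0.8 ** max(0, epoch - 4)
)
```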
For each epoch:
- Training Phase:
  - Forward pass through the model.
  - Compute the loss and perform backpropagation.
  - Update model parameters using SGD.
- Validation Phase:
  - Evaluate the model on the validation set.
  - Calculate perplexity to assess language modeling capability.
- Testing Phase:
  - After training completes, evaluate the best model on the test set.
- Checkpointing:
  - Save the model state whenever validation perplexity improves.
- Learning Rate Adjustment:
  - Update the learning rate according to the scheduler.
- Logging and Visualization:
  - Record perplexity scores.
  - Generate and save perplexity plots for analysis.
  - Generate and save a table of the best perplexities.
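A condensed sketch of one such pass (training or evaluation), reusing `get_batch` from the data sketch above. Perplexity is computed as the exponential of the mean cross-entropy loss; the gradient-clipping value is an assumption not stated in this document:

```python
import math
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()

def run_epoch(model, data, seq_len=20, optimizer=None):
    """One pass over `data`; trains when an optimizer is given, else evaluates.
    Returns perplexity = exp(mean cross-entropy loss)."""
    if optimizer is not None:
        model.train()
    else:
        model.eval()
    total_loss, total_tokens = 0.0, 0
    hidden = None
    with torch.set_grad_enabled(optimizer is not None):
        for i in range(0, data.size(0) - 1, seq_len):
            inputs, targets = get_batch(data, i, seq_len)
            logits, hidden = model(inputs, hidden)
            # Detach so gradients do not flow across batch boundaries
            # (tuple for LSTM state, single tensor for GRU state).
            hidden = (tuple(h.detach() for h in hidden)
                      if isinstance(hidden, tuple) else hidden.detach())
            loss = criterion(logits, targets)
            if optimizer is not None:
                optimizer.zero_grad()
                loss.backward()
                torch.nn.utils.clip_grad_norm_(model.parameters(), 0.25)  # assumed value
                optimizer.step()
            total_loss += loss.item() * targets.numel()
            total_tokens += targets.numel()
    return math.exp(total_loss / total_tokens)
```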
| Model | Dropout Probability | Train Perplexity | Validation Perplexity | Test Perplexity |
|---|---|---|---|---|
| LSTM | 0.0 | 69.39 | 122.04 | 118.74 |
| LSTM | 0.5 | 105.38 | 109.08 | 105.22 |
| GRU | 0.0 | 51.11 | 121.48 | 117.32 |
| GRU | 0.3 | 72.83 | 104.29 | 100.62 |
- Base Models (LSTM and GRU without Dropout): Achieved lower training perplexities but higher validation and test perplexities, indicating overfitting.
- Regularized Models (with Dropout): Showed higher training perplexities but significantly better validation and test perplexities, demonstrating improved generalization.
- GRU vs. LSTM: GRU models trained faster, likely due to their simpler architecture, and achieved better validation and test perplexities than their LSTM counterparts.
- Dropout Effectiveness: Dropout effectively reduced overfitting, as evidenced by better performance on the validation and test sets for both architectures.
Training and Evaluating the Models: To train and evaluate the models, run the final block of code in the provided notebook. This block initializes the four model configurations and runs a training and testing loop for each one over a predefined number of epochs. During each epoch, the code evaluates the model on the training, validation, and test sets to monitor performance and generalization. The best checkpoint for each configuration is saved based on validation perplexity, and the results are used to generate convergence plots and a final comparison table of best perplexities.
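A hedged sketch of what that final block roughly does, building on the earlier sketches (`batchify`/`get_batch`, the model classes, `run_epoch`, and the `experiments` list); the checkpoint file names, epoch count, and LR schedule below are illustrative, and `train_data`, `valid_data`, and `test_data` are assumed to come from the `batchify` sketch:

```python
import torch

for cfg in experiments:
    model_cls = LSTMLanguageModel if cfg["model"] == "lstm" else GRULanguageModel
    model = model_cls(dropout=cfg["dropout"])
    optimizer = torch.optim.SGD(model.parameters(), lr=cfg["lr"])
    scheduler = torch.optim.lr_scheduler.LambdaLR(
        optimizer, lr_lambda=lambda epoch: 0.8 ** max(0, epoch - 4))
    best_val = float("inf")
    for epoch in range(13):                       # the table lists 13 or 20 epochs
        train_ppl = run_epoch(model, train_data, optimizer=optimizer)
        val_ppl = run_epoch(model, valid_data)
        if val_ppl < best_val:                    # checkpoint on validation improvement
            best_val = val_ppl
            torch.save(model.state_dict(), f"best_{cfg['model']}_{cfg['dropout']}.pt")
        scheduler.step()
    test_ppl = run_epoch(model, test_data)
    print(f"{cfg['name']}: val {best_val:.2f}, test {test_ppl:.2f}")
```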