This project provides a modular implementation of the GPT-2 language model, allowing for easy training and inference. It includes a configurable model architecture, data loading utilities, and scripts for both training and text generation. A few modern optimizations that were not in the original GPT-2 paper but are commonly used in practice are also included:
- The use of `F.scaled_dot_product_attention` with `is_causal=True` for efficient attention computation.
- Some initialization tweaks, like the `NANOGPT_SCALE_INIT` attribute.
These optimizations don't change the fundamental architecture but can improve training efficiency.
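The snippet below is a minimal sketch of both ideas; the exact handling (including how `NANOGPT_SCALE_INIT` is attached to modules) lives in `src/model.py`, and the specific numbers here are assumptions for illustration:

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

# Fused, causal attention: is_causal=True builds the lower-triangular mask
# internally, so no explicit mask buffer is needed.
q = k = v = torch.randn(1, 12, 8, 64)  # (batch, heads, seq_len, head_dim), toy values
y = F.scaled_dot_product_attention(q, k, v, is_causal=True)

# NANOGPT_SCALE_INIT-style idea: scale the init std of residual-path projections
# by 1/sqrt(2 * n_layer) so activations do not grow with depth.
# (Illustrative only; n_layer=12 and std=0.02 are assumed GPT-2 small defaults.)
n_layer = 12
c_proj = nn.Linear(768, 768)
nn.init.normal_(c_proj.weight, mean=0.0, std=0.02 / math.sqrt(2 * n_layer))
```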
The GPT-2 model implemented in this project follows the architecture described in the original paper "Language Models are Unsupervised Multitask Learners" by Radford et al. The key components of the model are:
- Token and Positional Embeddings: Convert input tokens into embeddings and add positional information (see the sketch after this list).
- Transformer Blocks: A series of blocks, each containing:
  - Multi-Head Attention: Allows the model to attend to different parts of the input sequence.
  - Layer Normalization: Applied before the attention and feed-forward sub-layers (GPT-2 uses pre-norm).
  - Feed-Forward Neural Network: Processes the attention output.
- Final Layer Normalization: Applied after the last transformer block.
- Language Model Head: A linear layer that projects the final hidden states to vocabulary-sized logits.
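The embedding step, for example, amounts to two lookup tables whose outputs are summed. A minimal sketch, with GPT-2 small dimensions taken from the default config below (the names `wte`/`wpe` follow GPT-2 convention and may differ from `src/model.py`):

```python
import torch
import torch.nn as nn

vocab_size, block_size, n_embd = 50257, 1024, 768  # GPT-2 small sizes (from the config below)

wte = nn.Embedding(vocab_size, n_embd)  # token embedding table
wpe = nn.Embedding(block_size, n_embd)  # positional embedding table

idx = torch.randint(0, vocab_size, (1, 8))  # a batch with 8 token ids
pos = torch.arange(idx.size(1))             # positions 0..7
x = wte(idx) + wpe(pos)                     # (1, 8, n_embd): input to the transformer blocks
```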
The model uses the following key classes (a condensed sketch of how they fit together follows the list):

- `GPT2`: The main model class that combines all components.
- `Block`: Represents a single transformer block.
- `CausalSelfAttention`: Implements the multi-head self-attention mechanism with causal masking.
- `MLP`: The feed-forward neural network used in each block.
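The sketch below shows how `Block`, `CausalSelfAttention`, and `MLP` typically compose in a pre-norm GPT-2 block; it mirrors the class names above, but the real definitions live in `src/model.py` and may differ in detail:

```python
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttention(nn.Module):
    def __init__(self, n_embd, n_head):
        super().__init__()
        assert n_embd % n_head == 0
        self.n_head = n_head
        self.c_attn = nn.Linear(n_embd, 3 * n_embd)  # fused query/key/value projection
        self.c_proj = nn.Linear(n_embd, n_embd)      # output projection

    def forward(self, x):
        B, T, C = x.size()
        q, k, v = self.c_attn(x).split(C, dim=2)
        # reshape to (B, n_head, T, head_dim) for multi-head attention
        q = q.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        k = k.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        v = v.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        y = y.transpose(1, 2).contiguous().view(B, T, C)
        return self.c_proj(y)

class MLP(nn.Module):
    def __init__(self, n_embd):
        super().__init__()
        self.c_fc = nn.Linear(n_embd, 4 * n_embd)
        self.gelu = nn.GELU(approximate='tanh')
        self.c_proj = nn.Linear(4 * n_embd, n_embd)

    def forward(self, x):
        return self.c_proj(self.gelu(self.c_fc(x)))

class Block(nn.Module):
    def __init__(self, n_embd, n_head):
        super().__init__()
        self.ln_1 = nn.LayerNorm(n_embd)
        self.attn = CausalSelfAttention(n_embd, n_head)
        self.ln_2 = nn.LayerNorm(n_embd)
        self.mlp = MLP(n_embd)

    def forward(self, x):
        # pre-norm residual connections around attention and the MLP
        x = x + self.attn(self.ln_1(x))
        x = x + self.mlp(self.ln_2(x))
        return x
```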
project/
├── config/
│ └── default_config.yaml
├── src/
│ ├── __init__.py
│ ├── model.py
│ ├── data_loader.py
│ ├── train.py
│ ├── inference.py
│ └── utils.py
├── main.py
├── requirements.txt
└── README.md
- Clone the repository:

  ```bash
  git clone https://github.com/yourusername/gpt2-implementation.git
  cd gpt2-implementation
  ```

- Create a virtual environment (optional but recommended):

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows, use venv\Scripts\activate
  ```

- Install the required packages:

  ```bash
  pip install -r requirements.txt
  ```
To train the model:
```bash
python main.py --config config/default_config.yaml --mode train
```
This will start the training process using the settings specified in the config file. The script will log training progress and save model checkpoints periodically.
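Conceptually, the loop behind `--mode train` looks something like the sketch below. This is not the code in `src/train.py`; it assumes the model returns `(logits, loss)` when targets are passed and that the data loader yields `(x, y)` batches, and the config keys mirror the example configuration further down:

```python
import os
import torch

def train_loop(model, data_loader, config):
    # Sketch of a minimal training loop with periodic logging and checkpointing.
    optimizer = torch.optim.AdamW(model.parameters(),
                                  lr=float(config['training']['learning_rate']))
    model.train()
    for step, (x, y) in enumerate(data_loader):
        optimizer.zero_grad()
        logits, loss = model(x, y)  # assumed model API: returns loss when targets are given
        loss.backward()
        optimizer.step()
        if step % config['logging']['log_interval'] == 0:
            print(f"step {step}: loss {loss.item():.4f}")
        if step > 0 and step % config['logging']['save_interval'] == 0:
            ckpt = os.path.join(config['logging']['model_save_path'], f"step_{step}.pt")
            torch.save(model.state_dict(), ckpt)
```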
To generate text using a trained model:
```bash
python main.py --config config/default_config.yaml --mode inference --prompt "Your prompt here"
```
Replace "Your prompt here" with the text you want to use as a starting point for generation.
The `config/default_config.yaml` file contains all the configurable parameters for the model and training process. You can modify this file to change:
- Model architecture (e.g., number of layers, embedding size)
- Training settings (e.g., batch size, learning rate)
- Data source
- Logging and checkpoint saving frequency
Here's an example of the configuration structure:
```yaml
model:
  block_size: 1024
  vocab_size: 50257
  n_layer: 12
  n_head: 12
  n_embd: 768

training:
  num_epochs: 50
  batch_size: 4
  sequence_length: 32
  learning_rate: 3e-4
  device: 'cuda'

data:
  input_file: 'input.txt'

logging:
  log_interval: 10
  save_interval: 1000
  model_save_path: 'checkpoints/'
```
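For reference, a sketch of how such a file can be read into a plain dictionary (assuming PyYAML, which is presumably listed in `requirements.txt`):

```python
import yaml

with open('config/default_config.yaml') as f:
    config = yaml.safe_load(f)

print(config['model']['n_layer'])        # -> 12
print(config['training']['batch_size'])  # -> 4
```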
Yalala Mohit