
Sparse Transformer Implementation

This repository contains a PyTorch implementation of a Transformer model with sparse attention patterns. The goal is to explore and implement various sparse attention mechanisms to improve the efficiency of transformer models while maintaining performance.

Current Features

  • Local sparse attention mechanism (window-based)
  • Configurable model architecture (layers, heads, dimensions)
  • Basic positional encoding (see the sketch after this list)
  • Simple training loop for sequence prediction
  • CPU support
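
For reference, the "basic positional encoding" above is assumed here to be the standard sinusoidal scheme from "Attention Is All You Need" (Vaswani et al., 2017); the repository may instead use a learned embedding. A minimal sketch (assumes an even model dimension):

import math
import torch

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> torch.Tensor:
    # pe[pos, 2i]   = sin(pos / 10000^(2i / d_model))
    # pe[pos, 2i+1] = cos(pos / 10000^(2i / d_model))
    position = torch.arange(seq_len).unsqueeze(1)                        # (seq_len, 1)
    div_term = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)
    pe[:, 1::2] = torch.cos(position * div_term)
    return pe  # added to the token embeddings before the first transformer block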

Installation

# Clone the repository
git clone https://github.com/Sunsvea/cl-sparse-transformer.git
cd cl-sparse-transformer

# Install dependencies
pip install torch

Quick Start

# Run the example training script
python sparse_transformer.py

Output

When running the training script, you should see output similar to this:

2025-01-31 10:15:23,456 - INFO - Starting training...
2025-01-31 10:15:23,789 - INFO - Generated 1000 sample sequences...
2025-01-31 10:15:23,901 - INFO - Split data into 800 train and 200 validation sequences

2025-01-31 10:15:24,123 - INFO - Epoch 1/5
2025-01-31 10:15:24,456 - INFO - Batch 0, Loss: 4.6573
2025-01-31 10:15:24,789 - INFO - Batch 10, Loss: 4.3291
2025-01-31 10:15:25,012 - INFO - Training Loss: 4.2845
2025-01-31 10:15:25,234 - INFO - Validation Loss: 4.1932
2025-01-31 10:15:25,345 - INFO - Saved new best model checkpoint
2025-01-31 10:15:25,456 - INFO - Epoch completed in 1.33s

[...]

2025-01-31 10:15:35,678 - INFO - Training completed in 12.22s
2025-01-31 10:15:35,789 - INFO - Best validation loss: 3.2456

The model saves checkpoints to ./checkpoints/ whenever the validation loss improves.
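
The checkpointing logic amounts to tracking the best validation loss seen so far and writing the model state whenever it improves. A minimal sketch of that pattern follows; the filename, helper functions, and variable names are illustrative, not taken from the repository:

import os
import torch

os.makedirs("./checkpoints", exist_ok=True)
best_val_loss = float("inf")

for epoch in range(num_epochs):
    # train_one_epoch / evaluate are hypothetical helpers standing in for the script's loop
    train_loss = train_one_epoch(model, train_loader)
    val_loss = evaluate(model, val_loader)
    if val_loss < best_val_loss:
        best_val_loss = val_loss
        # save weights only; restore later with model.load_state_dict(torch.load(path))
        torch.save(model.state_dict(), "./checkpoints/best_model.pt")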

Architecture

The current implementation includes:

  • LocalSparseAttention: Implements window-based sparse attention where each token attends only to tokens within a fixed window around it (see the masking sketch after this list)
  • SparseTransformerBlock: A single transformer block with sparse attention
  • SparseTransformer: The full model with embedding layer and multiple transformer blocks
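
To illustrate the window-based pattern, the sketch below builds a band mask in which position i may only attend to positions j with |i - j| <= window_size, and applies it to the attention scores before the softmax. This is a simplified single-head version for illustration, not the repository's exact LocalSparseAttention module:

import torch
import torch.nn.functional as F

def local_window_attention(q, k, v, window_size: int):
    # q, k, v: (seq_len, d_head)
    seq_len = q.size(0)
    scores = q @ k.transpose(-2, -1) / q.size(-1) ** 0.5        # (seq_len, seq_len)
    # band mask: True where |i - j| <= window_size
    idx = torch.arange(seq_len)
    mask = (idx.unsqueeze(0) - idx.unsqueeze(1)).abs() <= window_size
    scores = scores.masked_fill(~mask, float("-inf"))           # block out-of-window pairs
    return F.softmax(scores, dim=-1) @ v

Because each row of the mask keeps only O(window_size) entries, a dedicated kernel or block-sparse layout can avoid materialising the full seq_len x seq_len score matrix; the dense version above is just the easiest way to see the pattern.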

Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

License

MIT

Citation

If you use this code in your research, please cite:

@software{sparse_transformer2025,
  author = {Dean Coulstock},
  title = {Sparse Transformer Implementation},
  year = {2025},
  publisher = {GitHub},
  url = {https://github.com/Sunsvea/cl-sparse-transformer}
}

Contact

Acknowledgments

This implementation draws inspiration from:

  • "Generating Long Sequences with Sparse Transformers" (Child et al., 2019)
  • "Longformer: The Long-Document Transformer" (Beltagy et al., 2020)
