This repository contains a PyTorch implementation of a Transformer model with sparse attention patterns. The goal is to explore sparse attention mechanisms that improve the efficiency of Transformer models while maintaining performance.
- Local sparse attention mechanism (window-based); see the sketch below the feature list
- Configurable model architecture (layers, heads, dimensions)
- Basic positional encoding
- Simple training loop for sequence prediction
- CPU support
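To make the window-based attention concrete, here is a minimal, self-contained PyTorch sketch of the idea. The function name `local_attention` and the `window_size` argument are illustrative assumptions, not the repository's actual API.

```python
# Illustrative sketch only: local_attention and window_size are assumed names,
# not the repository's API.
import math
import torch

def local_attention(q, k, v, window_size=4):
    """Each query attends only to keys within +/- window_size positions."""
    seq_len, d_head = q.size(-2), q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_head)

    # Band mask: True where |i - j| <= window_size, False elsewhere.
    idx = torch.arange(seq_len)
    inside_window = (idx.unsqueeze(0) - idx.unsqueeze(1)).abs() <= window_size
    scores = scores.masked_fill(~inside_window, float("-inf"))

    return torch.softmax(scores, dim=-1) @ v

# Example: batch of 2 sequences, 16 tokens, 32-dimensional heads.
q = k = v = torch.randn(2, 16, 32)
print(local_attention(q, k, v).shape)  # torch.Size([2, 16, 32])
```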
```bash
# Clone the repository
git clone https://github.com/Sunsvea/cl-sparse-transformer.git
cd cl-sparse-transformer

# Install dependencies
pip install torch

# Run the example training script
python sparse_transformer.py
```
When you run the training script, you should see output similar to the following:
```
2025-01-31 10:15:23,456 - INFO - Starting training...
2025-01-31 10:15:23,789 - INFO - Generated 1000 sample sequences...
2025-01-31 10:15:23,901 - INFO - Split data into 800 train and 200 validation sequences
2025-01-31 10:15:24,123 - INFO - Epoch 1/5
2025-01-31 10:15:24,456 - INFO - Batch 0, Loss: 4.6573
2025-01-31 10:15:24,789 - INFO - Batch 10, Loss: 4.3291
2025-01-31 10:15:25,012 - INFO - Training Loss: 4.2845
2025-01-31 10:15:25,234 - INFO - Validation Loss: 4.1932
2025-01-31 10:15:25,345 - INFO - Saved new best model checkpoint
2025-01-31 10:15:25,456 - INFO - Epoch completed in 1.33s
[...]
2025-01-31 10:15:35,678 - INFO - Training completed in 12.22s
2025-01-31 10:15:35,789 - INFO - Best validation loss: 3.2456
```
The model saves checkpoints to ./checkpoints/ whenever the validation loss improves.
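To reuse the best model afterwards, something along these lines should work. The checkpoint file name and its contents are assumptions about how the training script saves state, so check sparse_transformer.py for the exact format.

```python
# Assumed checkpoint path and layout; adjust to match what the script writes.
import torch

checkpoint = torch.load("./checkpoints/best_model.pt", map_location="cpu")

# If a plain state_dict was saved:
#   model.load_state_dict(checkpoint)
# If a dict with extra metadata (optimizer state, epoch, ...) was saved:
#   model.load_state_dict(checkpoint["model_state_dict"])
```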
The current implementation includes:
- LocalSparseAttention: implements window-based sparse attention where each token attends only to its neighbors
- SparseTransformerBlock: a single transformer block with sparse attention
- SparseTransformer: the full model with an embedding layer and multiple transformer blocks
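As a rough illustration of how such pieces typically fit together, here is a hedged sketch of a block that wraps windowed attention and a feed-forward network in residual connections. The class name `SparseBlockSketch` and its constructor arguments are assumptions for illustration, not the repository's real classes or signatures.

```python
# Illustrative sketch: SparseBlockSketch is a hypothetical name, and the
# constructor arguments are assumptions, not the repository's real signature.
import torch
import torch.nn as nn

class SparseBlockSketch(nn.Module):
    def __init__(self, d_model=128, n_heads=4, window_size=4, d_ff=512):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model)
        )
        self.window_size = window_size

    def forward(self, x):
        seq_len = x.size(1)
        idx = torch.arange(seq_len, device=x.device)
        # True marks pairs that must NOT attend (outside the local window).
        outside_window = (idx.unsqueeze(0) - idx.unsqueeze(1)).abs() > self.window_size

        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=outside_window)
        x = x + attn_out                   # residual around attention
        return x + self.ff(self.norm2(x))  # residual around feed-forward

# Example: 2 sequences of 16 tokens with d_model = 128.
block = SparseBlockSketch()
print(block(torch.randn(2, 16, 128)).shape)  # torch.Size([2, 16, 128])
```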
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
MIT
If you use this code in your research, please cite:
```bibtex
@software{sparse_transformer2025,
  author    = {Dean Coulstock},
  title     = {Sparse Transformer Implementation},
  year      = {2025},
  publisher = {GitHub},
  url       = {https://github.com/Sunsvea/sparse-transformer}
}
```
- Dean Coulstock
- [email protected]
- LinkedIn: https://www.linkedin.com/in/dean-coulstock/
This implementation draws inspiration from:
- "Generating Long Sequences with Sparse Transformers" (Child et al., 2019)
- "Longformer: The Long-Document Transformer" (Beltagy et al., 2020)