# sparse-transformer

Sparse Transformers in PyTorch: limited attention span and projection onto a smaller space

Linformer paper: https://arxiv.org/abs/2006.04768
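Linformer replaces full self-attention with attention over a low-rank projection of the keys and values: learned matrices map the sequence length n down to a fixed k, reducing the cost from O(n²) to O(n·k). A minimal single-head sketch of that idea is below; the class and parameter names (`LinformerAttention`, `proj_k`, `proj_v`) are illustrative and not taken from this repo.

```python
import torch
import torch.nn as nn


class LinformerAttention(nn.Module):
    """Single-head attention with a Linformer-style low-rank projection.

    Illustrative sketch (not this repo's implementation): keys and values
    are projected along the sequence axis from length n down to k, so the
    attention matrix is (n x k) instead of (n x n).
    """

    def __init__(self, dim, seq_len, k):
        super().__init__()
        self.scale = dim ** -0.5
        self.to_qkv = nn.Linear(dim, dim * 3, bias=False)
        # Projection matrices E, F from the paper: (seq_len -> k),
        # applied along the length dimension of K and V.
        self.proj_k = nn.Parameter(torch.randn(seq_len, k) / seq_len ** 0.5)
        self.proj_v = nn.Parameter(torch.randn(seq_len, k) / seq_len ** 0.5)

    def forward(self, x):  # x: (batch, n, dim)
        q, k, v = self.to_qkv(x).chunk(3, dim=-1)
        k = torch.einsum('bnd,nk->bkd', k, self.proj_k)  # (batch, k, dim)
        v = torch.einsum('bnd,nk->bkd', v, self.proj_v)  # (batch, k, dim)
        attn = (q @ k.transpose(-2, -1) * self.scale).softmax(dim=-1)  # (batch, n, k)
        return attn @ v  # (batch, n, dim)
```

Note that the projection is over positions, not features, so the output keeps the full sequence length.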

Limited attention span transformer: caps the maximum attention distance, implemented with sparse tensors. Note: sparse tensor support is still a work in progress in PyTorch, so this may not work with all versions.
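The limited-span idea can be sketched without sparse tensors by masking the score matrix to a band around each position. This is a hedged, portable equivalent (dense boolean mask rather than the repo's sparse-tensor path, since `torch.sparse` softmax support varies across versions); the function name `local_attention` and the `max_dist` parameter are illustrative.

```python
import torch


def local_attention(q, k, v, max_dist):
    """Attention restricted to positions within `max_dist` of the query.

    Illustrative dense-mask sketch of limited attention span: entries
    outside the band |i - j| <= max_dist are set to -inf before softmax,
    so they receive zero attention weight.
    """
    n = q.size(-2)
    scores = q @ k.transpose(-2, -1) / q.size(-1) ** 0.5  # (..., n, n)
    idx = torch.arange(n)
    band = (idx[:, None] - idx[None, :]).abs() <= max_dist  # True inside span
    scores = scores.masked_fill(~band, float('-inf'))
    return scores.softmax(dim=-1) @ v
```

A sparse-tensor version would store only the in-band scores, which pays off when `max_dist` is much smaller than the sequence length.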