# sparse-transformer

Sparse Transformers in PyTorch: limited attention span and projection onto a smaller space

Linformer paper: https://arxiv.org/abs/2006.04768
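Linformer replaces full self-attention with attention over a low-rank projection of the keys and values: learned matrices map the sequence length n down to a fixed k, reducing the cost from O(n²) to O(n·k). A minimal single-head sketch of that idea is below; the class and parameter names (`LinformerAttention`, `proj_k`, `proj_v`) are illustrative and not taken from this repo.

```python
import torch
import torch.nn as nn


class LinformerAttention(nn.Module):
    """Single-head attention with a Linformer-style low-rank projection.

    Illustrative sketch (not this repo's implementation): keys and values
    are projected along the sequence axis from length n down to k, so the
    attention matrix is (n x k) instead of (n x n).
    """

    def __init__(self, dim, seq_len, k):
        super().__init__()
        self.scale = dim ** -0.5
        self.to_qkv = nn.Linear(dim, dim * 3, bias=False)
        # Projection matrices E, F from the paper: (seq_len -> k),
        # applied along the length dimension of K and V.
        self.proj_k = nn.Parameter(torch.randn(seq_len, k) / seq_len ** 0.5)
        self.proj_v = nn.Parameter(torch.randn(seq_len, k) / seq_len ** 0.5)

    def forward(self, x):  # x: (batch, n, dim)
        q, k, v = self.to_qkv(x).chunk(3, dim=-1)
        k = torch.einsum('bnd,nk->bkd', k, self.proj_k)  # (batch, k, dim)
        v = torch.einsum('bnd,nk->bkd', v, self.proj_v)  # (batch, k, dim)
        attn = (q @ k.transpose(-2, -1) * self.scale).softmax(dim=-1)  # (batch, n, k)
        return attn @ v  # (batch, n, dim)
```

Note that the projection is over positions, not features, so the output keeps the full sequence length.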

Limited attention span transformer: caps the maximum attention distance, implemented with sparse tensors. Note: sparse tensor support is still a work in progress in PyTorch, so this may not work with all versions.
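The limited-span idea can be sketched without sparse tensors by masking the score matrix to a band around each position. This is a hedged, portable equivalent (dense boolean mask rather than the repo's sparse-tensor path, since `torch.sparse` softmax support varies across versions); the function name `local_attention` and the `max_dist` parameter are illustrative.

```python
import torch


def local_attention(q, k, v, max_dist):
    """Attention restricted to positions within `max_dist` of the query.

    Illustrative dense-mask sketch of limited attention span: entries
    outside the band |i - j| <= max_dist are set to -inf before softmax,
    so they receive zero attention weight.
    """
    n = q.size(-2)
    scores = q @ k.transpose(-2, -1) / q.size(-1) ** 0.5  # (..., n, n)
    idx = torch.arange(n)
    band = (idx[:, None] - idx[None, :]).abs() <= max_dist  # True inside span
    scores = scores.masked_fill(~band, float('-inf'))
    return scores.softmax(dim=-1) @ v
```

A sparse-tensor version would store only the in-band scores, which pays off when `max_dist` is much smaller than the sequence length.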