
A simple implementation of the LSH Attention from Reformer

Description

Calculates the softmax layer of attention in $O(L \log L)$ (where $L$ is the sequence length) instead of $O(L^2)$ using cross-polytope Locality-Sensitive Hashing. For more detail, see this paper: https://arxiv.org/abs/1802.05751
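
As a rough illustration of the hashing step (an independent sketch, not the repository's implementation), cross-polytope LSH applies a random rotation to each vector and maps it to its nearest signed standard basis vector; vectors that land in the same bucket are likely to be close in angular distance, so the expensive softmax interactions only need to be evaluated within buckets. A minimal sketch, assuming the vectors are given as rows of a matrix:

import numpy as np

def cross_polytope_hash(X, rng=None):
    # Hash each row of X to the index of the nearest signed standard basis
    # vector after a random rotation (a minimal sketch of cross-polytope LSH).
    rng = np.random.default_rng() if rng is None else rng
    n, d = X.shape
    # Random rotation: orthogonal matrix from a QR decomposition of a Gaussian matrix.
    Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
    Y = X @ Q
    # Nearest signed basis vector: largest coordinate in absolute value,
    # with the sign deciding between +e_j and -e_j.
    j = np.abs(Y).argmax(axis=1)
    s = Y[np.arange(n), j] > 0
    return 2 * j + s  # bucket id in {0, ..., 2d - 1}

# Rows that land in the same bucket tend to point in similar directions,
# so attention only needs to be computed within each bucket.
X = np.random.randn(8, 16)
print(cross_polytope_hash(X))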

Usage

The only dependency is numpy >= 1.18.

For example:

import numpy as np
from functions import normal_softmax, lsh_softmax

# Random input matrix.
R = np.random.randn(100, 10000)

normal_sm = normal_softmax(R)  # standard O(L^2) softmax
lsh_sm = lsh_softmax(R)        # LSH-based O(L log L) approximation
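
Assuming normal_softmax and lsh_softmax return arrays of the same shape, the quality of the approximation can then be checked directly, for example:

# Largest absolute deviation between the exact and the LSH-approximated result.
print(np.abs(normal_sm - lsh_sm).max())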

Test

Small size

Note: for better visibility, the diagonal components are set to 0.
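
For reference, this zeroing can be done in place with numpy (A below is an illustrative name for either softmax output, not a variable from the repository):

np.fill_diagonal(A, 0)  # set the main-diagonal entries of A to 0 in place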

(figure: test)

Time complexity analysis

The execution times are plotted for sequence lengths $2^i$ ($i = 4, 5, \dots, 15$).

(figure: time_analysis_log)
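
A measurement of this kind could be reproduced roughly as follows (a sketch only; the shape passed to the two functions mirrors the Usage example above and is an assumption):

import time
import numpy as np
from functions import normal_softmax, lsh_softmax

# Time both implementations for sequence lengths 2^4, ..., 2^15.
for i in range(4, 16):
    L = 2 ** i
    R = np.random.randn(100, L)  # first dimension as in the Usage example (assumption)
    t0 = time.perf_counter()
    normal_softmax(R)
    t1 = time.perf_counter()
    lsh_softmax(R)
    t2 = time.perf_counter()
    print(f"L = {L:6d}  normal_softmax: {t1 - t0:.3f} s  lsh_softmax: {t2 - t1:.3f} s")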
