Potential sigverify latency optimization #81

Open
sakridge opened this issue Sep 15, 2020 · 0 comments

The ed25519 sigverify check computes `a*A + b*B` in a single thread. That is reasonably efficient on a CPU: fusing the two multiplies saves instructions, and spilling the stack to L1 is comparatively cheap there. A GPU has so many threads that one could instead compute `a*A` in one kernel launch and, in parallel, `b*B` in another, then do the addition at the end, which is pretty cheap. The split launches would use a larger portion of the GPU, but in low-batch situations I think this is preferable to letting a large part of the GPU go to waste. Each scalar multiply would also have much less register pressure, since it only has half the temporaries to deal with.
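
To make the shape of the two variants concrete, here is a minimal CPU-side sketch in C++. It rests on loud assumptions: the "curve" is a toy group (integers modulo a prime under addition), not ed25519; `std::async` tasks stand in for separate kernel launches; and the interleaved loop is only one common way to fuse the two multiplies, not necessarily what the existing kernel does. All names (`Point`, `scalar_mul`, `fused_double_mul`, `split_double_mul`) are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <future>

// Toy stand-in for the curve group: integers modulo a prime under addition.
// This is NOT ed25519; it only preserves the shape of the computation.
constexpr std::uint64_t kModulus = 1000000007ULL;

using Point = std::uint64_t;   // hypothetical "point" type
using Scalar = std::uint64_t;  // hypothetical scalar type

// Group operation of the toy group.
Point point_add(Point p, Point q) {
    return (p % kModulus + q % kModulus) % kModulus;
}

// Double-and-add, most significant bit first: returns k * p.
Point scalar_mul(Scalar k, Point p) {
    Point acc = 0;  // identity element of the toy group
    for (int bit = 63; bit >= 0; --bit) {
        acc = point_add(acc, acc);
        if ((k >> bit) & 1) acc = point_add(acc, p);
    }
    return acc;
}

// Fused form: one worker handles both scalars and shares every doubling.
// Fewer operations in total, but both sets of temporaries are live at once.
Point fused_double_mul(Scalar a, Point A, Scalar b, Point B) {
    Point acc = 0;
    for (int bit = 63; bit >= 0; --bit) {
        acc = point_add(acc, acc);  // one doubling serves both scalars
        if ((a >> bit) & 1) acc = point_add(acc, A);
        if ((b >> bit) & 1) acc = point_add(acc, B);
    }
    return acc;
}

// Split form: a*A and b*B run as independent tasks (standing in for two
// kernel launches), followed by a single cheap addition.
Point split_double_mul(Scalar a, Point A, Scalar b, Point B) {
    auto aA = std::async(std::launch::async, scalar_mul, a, A);
    auto bB = std::async(std::launch::async, scalar_mul, b, B);
    return point_add(aA.get(), bB.get());
}
```

In this toy group, `fused_double_mul(3, 5, 4, 7)` and `split_double_mul(3, 5, 4, 7)` both return 43. The split form repeats the doublings in each task, which is the extra work traded for parallelism and for a smaller working set per task.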

It might even be worth keeping both variants available, so that the GPU path can pick whichever is more efficient for a large or a small batch.
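
Keeping both variants could be as simple as a batch-size switch on the host side. The C++ sketch below assumes that the split layout wins for small batches and the fused layout wins once the batch fills the device; that ordering, the names `VerifyPath` and `choose_verify_path`, and the threshold value are all hypothetical and would need to be settled by benchmarking on the target GPU.

```cpp
#include <cstddef>

// Which sigverify layout to use for a batch (hypothetical names).
enum class VerifyPath {
    Fused,  // one thread computes a*A + b*B per signature
    Split,  // a*A and b*B in separate launches, then an addition
};

// Placeholder crossover point, NOT a measured value.
constexpr std::size_t kSplitBatchThreshold = 4096;

// Assumption: small batches leave much of the GPU idle under the fused
// layout, so they take the split path; larger batches take the fused path.
constexpr VerifyPath choose_verify_path(
    std::size_t batch_size,
    std::size_t threshold = kSplitBatchThreshold) {
    return batch_size < threshold ? VerifyPath::Split : VerifyPath::Fused;
}
```

A caller would evaluate `choose_verify_path(batch_size)` once per batch and launch the matching kernels, so the threshold can be tuned without touching either kernel.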
