The ed25519 sigverify check computes `a * A + b * B` in a single thread. This is reasonably efficient on the CPU because it saves instructions, and stack spills to L1 are not as expensive there. On the GPU, since there are so many threads, one could compute `a * A` with one kernel launch and compute `b * B` in parallel with another. The final addition is then done at the end, which is pretty cheap. Splitting the work this way occupies a larger portion of the GPU, but in low-batch situations I think that is preferable to letting a large part of the GPU go to waste. Each scalar multiply would also have much less register pressure, since it only has half the temporaries to deal with.
It might even be worth keeping both options available, so that the GPU path can use whichever is more efficient for large versus small batches.
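
To make the proposal concrete, here is a minimal CPU-side sketch in Python of the two strategies. It is not the existing sigverify code and not GPU code. It uses toy affine arithmetic on the ed25519 curve (not constant time), assumes the fused path shares doublings between the two scalars, and uses two worker threads as a stand-in for the two kernel launches. All function names and the `small_batch_threshold` value are hypothetical and would need to be measured on real hardware.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy affine arithmetic on the ed25519 curve -x^2 + y^2 = 1 + d*x^2*y^2.
# Illustration only: not constant time, and nothing here runs on a GPU.
P = 2**255 - 19
D = (-121665 * pow(121666, -1, P)) % P
IDENTITY = (0, 1)

def point_add(p1, p2):
    """Twisted Edwards addition in affine coordinates."""
    x1, y1 = p1
    x2, y2 = p2
    t = D * x1 * x2 * y1 * y2 % P
    x3 = (x1 * y2 + x2 * y1) * pow((1 + t) % P, -1, P) % P
    y3 = (y1 * y2 + x1 * x2) * pow((1 - t) % P, -1, P) % P
    return (x3, y3)

def recover_x(y):
    """Return one x such that (x, y) lies on the curve."""
    xx = (y * y - 1) * pow((D * y * y + 1) % P, -1, P) % P
    x = pow(xx, (P + 3) // 8, P)
    if (x * x - xx) % P != 0:
        x = x * pow(2, (P - 1) // 4, P) % P
    return x

# Base point B, defined by y = 4/5; the sign of x does not matter for this demo.
BASE_Y = 4 * pow(5, -1, P) % P
BASE = (recover_x(BASE_Y), BASE_Y)

def scalar_mul(k, pt):
    """Plain double-and-add: the work one of the two split launches would do."""
    acc = IDENTITY
    for bit in bin(k)[2:]:
        acc = point_add(acc, acc)
        if bit == "1":
            acc = point_add(acc, pt)
    return acc

def fused_double_scalar_mul(a, pa, b, pb):
    """a*pa + b*pb in one loop, sharing the doublings (single-thread style)."""
    both = point_add(pa, pb)
    acc = IDENTITY
    for i in reversed(range(max(a.bit_length(), b.bit_length()))):
        acc = point_add(acc, acc)
        bits = ((a >> i) & 1, (b >> i) & 1)
        if bits == (1, 1):
            acc = point_add(acc, both)
        elif bits == (1, 0):
            acc = point_add(acc, pa)
        elif bits == (0, 1):
            acc = point_add(acc, pb)
    return acc

def split_double_scalar_mul(a, pa, b, pb):
    """a*pa and b*pb computed independently, then one cheap final addition."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        fa = pool.submit(scalar_mul, a, pa)  # stand-in for the first launch
        fb = pool.submit(scalar_mul, b, pb)  # stand-in for the second launch
        return point_add(fa.result(), fb.result())

def double_scalar_mul(a, pa, b, pb, batch_size, small_batch_threshold=1024):
    """Pick a strategy by batch size; the threshold is a made-up placeholder."""
    if batch_size < small_batch_threshold:
        return split_double_scalar_mul(a, pa, b, pb)
    return fused_double_scalar_mul(a, pa, b, pb)

if __name__ == "__main__":
    a = 2**252 + 12345
    b = 2**251 + 67890
    A = scalar_mul(987654321, BASE)
    assert fused_double_scalar_mul(a, A, b, BASE) == split_double_scalar_mul(a, A, b, BASE)
    print("fused and split results agree")
```

Both paths return the same point, so the choice between them is purely a performance question. The split path does more total work (it repeats the doublings) in exchange for two independent, smaller tasks, which is the trade-off described above for small batches.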