Labels
feature: New feature request
Description
I have been reflecting on the current sparse logic. I believe that performing the skip check inside the computation loop may not make effective use of the Tensor Core. Here are two improvement ideas:
- In the inner loop, perform an early check of whether `mask = 1`, and split the computation into two branches: one for the normal dense calculation and one that skips directly.
- In the inner loop, first count all the KV blocks that require dense calculation, perform the dense computation on them uniformly, and then write the results back to the correct positions (see the sketch after this list).
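A minimal CUDA sketch of the second idea follows. It is not the repository's kernel: the block size, head dimension, and all names (`sparse_attn_gather_sketch`, `block_mask`, etc.) are assumptions for illustration, and a plain FMA loop stands in for the Tensor Core MMA. The point is only the two-pass structure: gather the indices of KV blocks whose mask is 1, run the dense computation uniformly over that compacted list, and write each result back to its original block position.

```cuda
// Hypothetical sketch, not the project's kernel. One thread block handles one
// query block; the real kernel would use WMMA/MMA tiles instead of scalar FMAs.
#include <cuda_runtime.h>
#include <cstdio>

constexpr int kBlockN   = 64;   // KV block size (assumed)
constexpr int kHeadDim  = 64;   // head dimension (assumed)
constexpr int kMaxKvBlk = 128;  // max KV blocks per query block (assumed)

__global__ void sparse_attn_gather_sketch(
    const float* __restrict__ q,          // [kHeadDim], one query row for simplicity
    const float* __restrict__ k,          // [num_kv_blocks * kBlockN, kHeadDim]
    const int*   __restrict__ block_mask, // [num_kv_blocks], 1 = compute, 0 = skip
    float*       __restrict__ scores,     // [num_kv_blocks * kBlockN]
    int num_kv_blocks)
{
    // Pass 1: gather the indices of KV blocks that need dense computation.
    __shared__ int active[kMaxKvBlk];
    __shared__ int num_active;
    if (threadIdx.x == 0) {
        num_active = 0;
        for (int b = 0; b < num_kv_blocks; ++b)
            if (block_mask[b]) active[num_active++] = b;
    }
    __syncthreads();

    // Pass 2: uniform dense compute over the gathered blocks only.
    for (int i = 0; i < num_active; ++i) {
        int b = active[i];
        for (int r = threadIdx.x; r < kBlockN; r += blockDim.x) {
            const float* k_row = k + (b * kBlockN + r) * kHeadDim;
            float acc = 0.f;
            for (int d = 0; d < kHeadDim; ++d) acc += q[d] * k_row[d];
            scores[b * kBlockN + r] = acc;  // write back to the block's original position
        }
    }
}

int main() {
    const int num_kv_blocks = 4;
    const int n = num_kv_blocks * kBlockN;
    float *q, *k, *scores; int *mask;
    cudaMallocManaged(&q, kHeadDim * sizeof(float));
    cudaMallocManaged(&k, n * kHeadDim * sizeof(float));
    cudaMallocManaged(&scores, n * sizeof(float));
    cudaMallocManaged(&mask, num_kv_blocks * sizeof(int));
    for (int d = 0; d < kHeadDim; ++d) q[d] = 1.f;
    for (int i = 0; i < n * kHeadDim; ++i) k[i] = 0.01f;
    cudaMemset(scores, 0, n * sizeof(float));   // skipped blocks stay zero in this demo
    int mask_init[num_kv_blocks] = {1, 0, 1, 0}; // blocks 0 and 2 need dense compute
    for (int b = 0; b < num_kv_blocks; ++b) mask[b] = mask_init[b];

    sparse_attn_gather_sketch<<<1, 128>>>(q, k, mask, scores, num_kv_blocks);
    cudaDeviceSynchronize();
    printf("block0 row0 = %f, block1 row0 = %f\n", scores[0], scores[kBlockN]);
    return 0;
}
```

Because the gathered loop has no per-iteration branch, every iteration issues a full dense tile, which is what would keep the MMA pipeline busy; the cost is one extra pass over the block mask and a small shared-memory index buffer.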