Optimize sparse logic #88

@LoserCheems

Description

I have been reflecting on the current sparse logic. I believe that performing skip checks inside the computation loop may not utilize the Tensor Cores effectively. Here are two improvement ideas:

  • In the inner loop, perform an early check on whether mask = 1, then split the computation into two branches: normal dense computation when the block is active, and a direct skip otherwise.
  • In the inner loop, first gather all the KV blocks that require dense computation, perform dense computation on them uniformly, and then write the results back to their correct positions (see the sketch after this list).
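A minimal CUDA sketch of the two control-flow patterns, assuming a block-sparse layout where `block_mask[kv_block] == 1` means that KV block needs computation. All names here (`block_mask`, `NUM_KV_BLOCKS`, `compute_block`) are hypothetical placeholders to illustrate the loop structure, not the actual kernel code:

```cuda
// Hypothetical block-sparse setup; compute_block stands in for the
// dense Tensor Core MMA over one KV block.
#define NUM_KV_BLOCKS 64

__device__ void compute_block(int kv_block) {
    // Placeholder for the full dense computation on one KV block.
}

// Idea 1: early per-block mask check, branching between a fully dense
// path and a direct skip, so the loop body never does partial work.
__global__ void attn_branch_skip(const int* block_mask) {
    for (int kv_block = 0; kv_block < NUM_KV_BLOCKS; ++kv_block) {
        if (block_mask[kv_block]) {
            compute_block(kv_block);  // dense branch: full MMA
        }
        // else: skip the block entirely
    }
}

// Idea 2: first compact the indices of all dense blocks, then run the
// compute loop over that list so the MMA pipeline stays uniformly busy;
// dense_idx[i] maps each result back to its correct position.
__global__ void attn_gather_then_compute(const int* block_mask) {
    int dense_idx[NUM_KV_BLOCKS];
    int num_dense = 0;
    for (int kv_block = 0; kv_block < NUM_KV_BLOCKS; ++kv_block)
        if (block_mask[kv_block]) dense_idx[num_dense++] = kv_block;

    for (int i = 0; i < num_dense; ++i)
        compute_block(dense_idx[i]);  // uniform dense compute
}
```

The second pattern trades an extra gather pass for a branch-free compute loop, which is usually friendlier to Tensor Core scheduling when the mask is sparse.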

Metadata

Labels

feature (New feature request)
