Labels
feature: New feature request
Description
I have been reflecting on the current sparse logic. I believe that performing the skip check inside the computation loop may not make effective use of the Tensor Core. Here are two improvement ideas:
- In the inner loop, perform an early check of whether `mask = 1`, and split the computation into two branches: one for the normal dense calculation and one that skips directly.
- In the inner loop, first count all the KV blocks that require dense calculation, perform the dense computation on them uniformly, and then write the results back to the correct positions (see the sketch after this list).
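A minimal CUDA sketch of the second idea follows. It is not the repository's kernel: the block size, head dimension, and all names (`sparse_attn_gather_sketch`, `block_mask`, etc.) are assumptions for illustration, and a plain FMA loop stands in for the Tensor Core MMA. The point is only the two-pass structure: gather the indices of KV blocks whose mask is 1, run the dense computation uniformly over that compacted list, and write each result back to its original block position.

```cuda
// Hypothetical sketch, not the project's kernel. One thread block handles one
// query block; the real kernel would use WMMA/MMA tiles instead of scalar FMAs.
#include <cuda_runtime.h>
#include <cstdio>

constexpr int kBlockN   = 64;   // KV block size (assumed)
constexpr int kHeadDim  = 64;   // head dimension (assumed)
constexpr int kMaxKvBlk = 128;  // max KV blocks per query block (assumed)

__global__ void sparse_attn_gather_sketch(
    const float* __restrict__ q,          // [kHeadDim], one query row for simplicity
    const float* __restrict__ k,          // [num_kv_blocks * kBlockN, kHeadDim]
    const int*   __restrict__ block_mask, // [num_kv_blocks], 1 = compute, 0 = skip
    float*       __restrict__ scores,     // [num_kv_blocks * kBlockN]
    int num_kv_blocks)
{
    // Pass 1: gather the indices of KV blocks that need dense computation.
    __shared__ int active[kMaxKvBlk];
    __shared__ int num_active;
    if (threadIdx.x == 0) {
        num_active = 0;
        for (int b = 0; b < num_kv_blocks; ++b)
            if (block_mask[b]) active[num_active++] = b;
    }
    __syncthreads();

    // Pass 2: uniform dense compute over the gathered blocks only.
    for (int i = 0; i < num_active; ++i) {
        int b = active[i];
        for (int r = threadIdx.x; r < kBlockN; r += blockDim.x) {
            const float* k_row = k + (b * kBlockN + r) * kHeadDim;
            float acc = 0.f;
            for (int d = 0; d < kHeadDim; ++d) acc += q[d] * k_row[d];
            scores[b * kBlockN + r] = acc;  // write back to the block's original position
        }
    }
}

int main() {
    const int num_kv_blocks = 4;
    const int n = num_kv_blocks * kBlockN;
    float *q, *k, *scores; int *mask;
    cudaMallocManaged(&q, kHeadDim * sizeof(float));
    cudaMallocManaged(&k, n * kHeadDim * sizeof(float));
    cudaMallocManaged(&scores, n * sizeof(float));
    cudaMallocManaged(&mask, num_kv_blocks * sizeof(int));
    for (int d = 0; d < kHeadDim; ++d) q[d] = 1.f;
    for (int i = 0; i < n * kHeadDim; ++i) k[i] = 0.01f;
    cudaMemset(scores, 0, n * sizeof(float));   // skipped blocks stay zero in this demo
    int mask_init[num_kv_blocks] = {1, 0, 1, 0}; // blocks 0 and 2 need dense compute
    for (int b = 0; b < num_kv_blocks; ++b) mask[b] = mask_init[b];

    sparse_attn_gather_sketch<<<1, 128>>>(q, k, mask, scores, num_kv_blocks);
    cudaDeviceSynchronize();
    printf("block0 row0 = %f, block1 row0 = %f\n", scores[0], scores[kBlockN]);
    return 0;
}
```

Because the gathered loop has no per-iteration branch, every iteration issues a full dense tile, which is what would keep the MMA pipeline busy; the cost is one extra pass over the block mask and a small shared-memory index buffer.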