[SqueezeAttention] SQUEEZEATTENTION: 2D Management of KV-Cache in LLM Inference via Layer-wise Optimal Budget(@lzu.edu.cn etc)

DefTruth authored Apr 14, 2024
1 parent 94435c0 commit 25cbc41
Showing 1 changed file with 1 addition and 0 deletions.
README.md

@@ -175,6 +175,7 @@
|2024.03|🔥🔥[Keyformer] Keyformer: KV Cache reduction through key tokens selection for Efficient Generative Inference(@ece.ubc.ca etc)|[[pdf]](https://arxiv.org/pdf/2403.09054.pdf)|⚠️|⭐️⭐️ |
|2024.03|[FASTDECODE] FASTDECODE: High-Throughput GPU-Efficient LLM Serving using Heterogeneous(@Tsinghua University)|[[pdf]](https://arxiv.org/pdf/2403.11421.pdf)|⚠️|⭐️⭐️ |
|2024.03|[Sparsity-Aware KV Caching] ALISA: Accelerating Large Language Model Inference via Sparsity-Aware KV Caching(@ucf.edu)|[[pdf]](https://arxiv.org/pdf/2403.17312.pdf)|⚠️|⭐️⭐️ |
|2024.04|[SqueezeAttention] SQUEEZEATTENTION: 2D Management of KV-Cache in LLM Inference via Layer-wise Optimal Budget(@lzu.edu.cn etc)|[[pdf]](https://arxiv.org/pdf/2404.04793.pdf)|[[SqueezeAttention]](https://github.com/hetailang/SqueezeAttention) ![](https://img.shields.io/github/stars/hetailang/SqueezeAttention.svg?style=social) |⭐️⭐️ |

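The added SqueezeAttention row is about budgeting the KV-cache along two axes: which cached tokens to keep within a layer, and how much total cache each layer receives. As a rough illustration of the layer-wise half of that idea, here is a minimal PyTorch sketch; it is not the authors' implementation, and the function names, the proportional allocation rule, and the importance scores are all illustrative assumptions:

```python
import torch

def allocate_layer_budgets(importance: torch.Tensor, total_budget: int) -> list[int]:
    """Hypothetical rule: split a shared KV-cache token budget across layers
    in proportion to per-layer importance scores (e.g., attention statistics)."""
    weights = importance / importance.sum()
    return [int(w * total_budget) for w in weights]

def evict_to_budget(keys: torch.Tensor, values: torch.Tensor,
                    scores: torch.Tensor, budget: int):
    """Keep only the top-`budget` cached positions (by score) for one layer.
    keys/values: [num_heads, seq_len, head_dim]; scores: [seq_len]."""
    k = min(budget, scores.numel())
    keep = torch.topk(scores, k=k).indices.sort().values  # preserve token order
    return keys[:, keep], values[:, keep]

# Example: 4 layers sharing a 1024-token cache budget.
importance = torch.tensor([0.4, 0.3, 0.2, 0.1])
print(allocate_layer_budgets(importance, 1024))  # ~[409, 307, 204, 102]
```
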
### 📖Prompt/Context Compression ([©️back👆🏻](#paperlist))
<div id="Context-Compression"></div>
