Commit: 🔥🔥[Quest] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference (@mit-han-lab etc)
DefTruth authored Jul 13, 2024
1 parent 5b83541 commit 20aba1d
Showing 1 changed file (README.md) with 2 additions and 0 deletions.
@@ -269,6 +269,8 @@ Awesome-LLM-Inference: A curated list of [📙Awesome LLM Inference Papers with
|2024.06|🔥[LOOK-M] LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context Inference(@osu.edu etc)| [[pdf]](https://arxiv.org/pdf/2406.18139) | [[LOOK-M]](https://github.com/SUSTechBruce/LOOK-M) ![](https://img.shields.io/github/stars/SUSTechBruce/LOOK-M.svg?style=social) |⭐️⭐️ |
|2024.06|🔥🔥[**MInference**] MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention(@Microsoft etc)| [[pdf]](https://arxiv.org/pdf/2407.02490) | [[MInference]](https://github.com/microsoft/MInference) ![](https://img.shields.io/github/stars/microsoft/MInference.svg?style=social) |⭐️⭐️ |
|2024.06|🔥🔥[**InfiniGen**] InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management(@snu) | [[pdf]](https://arxiv.org/pdf/2406.19707) | ⚠️ |⭐️⭐️ |
|2024.06|🔥🔥[**Quest**] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference(@mit-han-lab etc) | [[pdf]](https://arxiv.org/pdf/2406.10774)| [[Quest]](https://github.com/mit-han-lab/Quest) ![](https://img.shields.io/github/stars/mit-han-lab/Quest.svg?style=social) |⭐️⭐️ |
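To give a feel for the Quest row added above: the paper estimates each KV-cache page's criticality from per-page min/max key metadata and the current query, then attends only over the top-scoring pages. A minimal NumPy sketch of that page-scoring step (function name, shapes, and the flat-array layout are illustrative assumptions, not Quest's actual CUDA implementation):

```python
import numpy as np

def select_pages(query, key_min, key_max, top_k):
    """Rank KV-cache pages by an upper bound on attention weight.

    query:   (d,)           current query vector
    key_min: (num_pages, d) per-dimension minimum of keys in each page
    key_max: (num_pages, d) per-dimension maximum of keys in each page

    For each dimension, the product q_i * k_i is maximized by either the
    page's min or max key value (depending on the sign of q_i); summing
    these gives an upper bound on q·k for any key in the page.
    """
    upper_bound = np.maximum(query * key_min, query * key_max).sum(axis=-1)
    # Keep the top_k pages with the largest score upper bounds.
    return np.argsort(upper_bound)[::-1][:top_k]

# Toy example: page 1 dominates because its keys align with the query.
q = np.array([1.0, -1.0])
kmin = np.array([[0.0, 0.0], [2.0, -3.0], [-1.0, -1.0]])
kmax = np.array([[1.0, 1.0], [3.0, -1.0], [0.0, 0.0]])
print(select_pages(q, kmin, kmax, top_k=1))  # → [1]
```

Because the score is an upper bound rather than the exact attention weight, pages that could matter for this query are never pruned prematurely; see the linked paper and repo for the exact kernel design.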


### 📖Early-Exit/Intermediate Layer Decoding ([©️back👆🏻](#paperlist))
<div id="Early-Exit"></div>
