🔥🔥[DeFT] DeFT: Decoding with Flash Tree-Attention for Efficient Tree-structured LLM Inference (@Westlake University etc)
DefTruth authored Jun 3, 2024
1 parent 14576ca · commit 49ade18
Showing 1 changed file with 1 addition and 1 deletion.
README.md: 1 addition & 1 deletion
@@ -170,7 +170,7 @@ Awesome-LLM-Inference: A curated list of [📙Awesome LLM Inference Papers with
|2023.12|[SCCA] SCCA: Shifted Cross Chunk Attention for long contextual semantic expansion(@Beihang University)| [[pdf]](https://arxiv.org/pdf/2312.07305.pdf) | ⚠️ |⭐️ |
|2023.12|🔥[**FlashLLM**] LLM in a flash: Efficient Large Language Model Inference with Limited Memory(@Apple)| [[pdf]](https://arxiv.org/pdf/2312.11514.pdf) | ⚠️ |⭐️⭐️ |
|2024.03|🔥🔥[CHAI] CHAI: Clustered Head Attention for Efficient LLM Inference(@cs.wisc.edu etc)| [[pdf]](https://arxiv.org/pdf/2403.08058.pdf) | ⚠️ |⭐️⭐️ |
-|2024.04| [Flash Tree Attention] DEFT: FLASH TREE-ATTENTION WITH IO-AWARENESS FOR EFFICIENT TREE-SEARCH-BASED LLM INFERENCE(@Westlake University etc)| [[pdf]](https://arxiv.org/pdf/2404.00242.pdf) | ⚠️ |⭐️⭐️ |
+|2024.04|🔥🔥[DeFT] DeFT: Decoding with Flash Tree-Attention for Efficient Tree-structured LLM Inference(@Westlake University etc)| [[pdf]](https://arxiv.org/pdf/2404.00242) | ⚠️ |⭐️⭐️ |

### 📖KV Cache Scheduling/Quantize/Dropping ([©️back👆🏻](#paperlist))
<div id="KV-Cache-Scheduling-Quantize-Dropping"></div>
