From ab989bab9b39ffc3deb12b4ddb5e5611dc492441 Mon Sep 17 00:00:00 2001
From: DefTruth <31974251+DefTruth@users.noreply.github.com>
Date: Mon, 27 May 2024 10:19:00 +0800
Subject: [PATCH] 🔥[ZipCache] ZipCache: Accurate and Efficient KV Cache
 Quantization with Salient Token Identification(@Zhejiang University etc)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 README.md | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 34b5bb3..a2cf889 100644
--- a/README.md
+++ b/README.md
@@ -206,8 +206,9 @@ Awesome-LLM-Inference: A curated list of [📙Awesome LLM Inference Papers with
 |2024.04|[SqueezeAttention] SQUEEZEATTENTION: 2D Management of KV-Cache in LLM Inference via Layer-wise Optimal Budget(@lzu.edu.cn etc)|[[pdf]](https://arxiv.org/pdf/2404.04793.pdf)|[[SqueezeAttention]](https://github.com/hetailang/SqueezeAttention) ![](https://img.shields.io/github/stars/hetailang/SqueezeAttention.svg?style=social) |⭐️⭐️ |
 |2024.04|[SnapKV] SnapKV: LLM Knows What You are Looking for Before Generation(@UIUC)|[[pdf]](https://arxiv.org/pdf/2404.14469)|[[SnapKV]](https://github.com/FasterDecoding/SnapKV) ![](https://img.shields.io/github/stars/FasterDecoding/SnapKV.svg?style=social)|⭐️ |
 |2024.05|🔥[vAttention] vAttention: Dynamic Memory Management for Serving LLMs without PagedAttention(@Microsoft Research India)|[[pdf]](https://arxiv.org/pdf/2405.04437)|⚠️|⭐️⭐️ |
-|2024.05| [KVCache-1Bit] KV Cache is 1 Bit Per Channel: Efficient Large Language Model Inference with Coupled Quantization(@Rice University)|[[pdf]](https://arxiv.org/pdf/2405.03917)|⚠️|⭐️⭐️ |
-|2024.05| [KV-Runahead] KV-Runahead: Scalable Causal LLM Inference by Parallel Key-Value Cache Generation(@Apple etc)|[[pdf]](https://arxiv.org/pdf/2405.05329)|⚠️|⭐️⭐️ |
+|2024.05|🔥[KVCache-1Bit] KV Cache is 1 Bit Per Channel: Efficient Large Language Model Inference with Coupled Quantization(@Rice University)|[[pdf]](https://arxiv.org/pdf/2405.03917)|⚠️|⭐️⭐️ |
+|2024.05|🔥[KV-Runahead] KV-Runahead: Scalable Causal LLM Inference by Parallel Key-Value Cache Generation(@Apple etc)|[[pdf]](https://arxiv.org/pdf/2405.05329)|⚠️|⭐️⭐️ |
+|2024.05|🔥[ZipCache] ZipCache: Accurate and Efficient KV Cache Quantization with Salient Token Identification(@Zhejiang University etc)|[[pdf]](https://arxiv.org/pdf/2405.14256)|⚠️|⭐️⭐️ |
 
 ### 📖Prompt/Context Compression ([©️back👆🏻](#paperlist))
 