From 71977627283fda306696e39738fa9ee93de37451 Mon Sep 17 00:00:00 2001
From: DefTruth <31974251+DefTruth@users.noreply.github.com>
Date: Wed, 5 Jun 2024 09:45:17 +0800
Subject: [PATCH] =?UTF-8?q?=F0=9F=94=A5[I-LLM]=20I-LLM:=20Efficient=20Inte?=
 =?UTF-8?q?ger-Only=20Inference=20for=20Fully-Quantized=20Low-Bit=20Large?=
 =?UTF-8?q?=20Language=20Models(@Houmo=20AI)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.md b/README.md
index c31f43b..e0c3836 100644
--- a/README.md
+++ b/README.md
@@ -147,6 +147,7 @@ Awesome-LLM-Inference: A curated list of [📙Awesome LLM Inference Papers with
 |2024.01|[FP6-LLM] FP6-LLM: Efficiently Serving Large Language Models Through FP6-Centric Algorithm-System Co-Design(@Microsoft etc)|[[pdf]](https://arxiv.org/pdf/2401.14112.pdf)|⚠️|⭐️ |
 |2024.05|🔥🔥[**W4A8KV4**] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving(@MIT&NVIDIA)|[[pdf]](https://arxiv.org/pdf/2405.04532)|[[qserve]](https://github.com/mit-han-lab/qserve) ![](https://img.shields.io/github/stars/mit-han-lab/qserve.svg?style=social) |⭐️⭐️ |
 |2024.05|🔥[SpinQuant] SpinQuant: LLM Quantization with Learned Rotations(@Meta)|[[pdf]](https://arxiv.org/pdf/2405.16406)|⚠️|⭐️ |
+|2024.05|🔥[I-LLM] I-LLM: Efficient Integer-Only Inference for Fully-Quantized Low-Bit Large Language Models(@Houmo AI)|[[pdf]](https://arxiv.org/pdf/2405.17849)|⚠️|⭐️ |
 
 ### 📖IO/FLOPs-Aware/Sparse Attention ([©️back👆🏻](#paperlist))