From 8649c255527bf99c5883def4330ae519d920f552 Mon Sep 17 00:00:00 2001
From: Mr-Philo <1347549342@qq.com>
Date: Mon, 8 Apr 2024 01:54:04 +0000
Subject: [PATCH 1/3] Add github link for paper FP8-Quantization[2202.08]

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index aef412c..d769704 100644
--- a/README.md
+++ b/README.md
@@ -101,7 +101,7 @@ Awesome-LLM-Inference: A curated list of [📙Awesome LLM Inference Papers with
 |Date|Title|Paper|Code|Recom|
 |:---:|:---:|:---:|:---:|:---:| 
 |2022.06|🔥[**ZeroQuant**] Efficient and Affordable Post-Training Quantization for Large-Scale Transformers(@Microsoft) |[[pdf]](https://arxiv.org/pdf/2206.01861.pdf)|[[DeepSpeed]](https://github.com/microsoft/DeepSpeed) ![](https://img.shields.io/github/stars/microsoft/DeepSpeed.svg?style=social)|⭐️⭐️ |     
-|2022.08|[FP8-Quantization] FP8 Quantization: The Power of the Exponent(@Qualcomm AI Research) | [[pdf]](https://arxiv.org/pdf/2208.09225.pdf) | ⚠️ |⭐️ |    
+|2022.08|[FP8-Quantization] FP8 Quantization: The Power of the Exponent(@Qualcomm AI Research) | [[pdf]](https://arxiv.org/pdf/2208.09225.pdf) | [FP8-quantization](https://github.com/Qualcomm-AI-research/FP8-quantization) ![](https://img.shields.io/github/stars/Qualcomm-AI-research/FP8-quantization.svg?style=social) |⭐️ |    
 |2022.08|[LLM.int8()] 8-bit Matrix Multiplication  for Transformers at Scale(@Facebook AI Research etc) |[[pdf]](https://arxiv.org/pdf/2208.07339.pdf)|[[bitsandbytes]](https://github.com/timdettmers/bitsandbytes) ![](https://img.shields.io/github/stars/timdettmers/bitsandbytes.svg?style=social)|⭐️ |    
 |2022.10|🔥[**GPTQ**] GPTQ: ACCURATE POST-TRAINING QUANTIZATION FOR GENERATIVE PRE-TRAINED TRANSFORMERS(@IST Austria etc) |[[pdf]](https://arxiv.org/pdf/2210.17323.pdf) |[[gptq]](https://github.com/IST-DASLab/gptq) ![](https://img.shields.io/github/stars/IST-DASLab/gptq.svg?style=social)|⭐️⭐️ |   
 |2022.11|🔥[**WINT8/4**] Who Says Elephants Can’t Run: Bringing Large Scale MoE Models into Cloud Scale Production(@NVIDIA&Microsoft) |[[pdf]](https://arxiv.org/pdf/2211.10017.pdf)|[[FasterTransformer]](https://github.com/NVIDIA/FasterTransformer) ![](https://img.shields.io/github/stars/NVIDIA/FasterTransformer.svg?style=social)|⭐️⭐️ |     

From 7035452d485d873625fcba81a0734f56378ee929 Mon Sep 17 00:00:00 2001
From: Mr-Philo <1347549342@qq.com>
Date: Mon, 8 Apr 2024 01:59:39 +0000
Subject: [PATCH 2/3] typo

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index d769704..aee9359 100644
--- a/README.md
+++ b/README.md
@@ -101,7 +101,7 @@ Awesome-LLM-Inference: A curated list of [📙Awesome LLM Inference Papers with
 |Date|Title|Paper|Code|Recom|
 |:---:|:---:|:---:|:---:|:---:| 
 |2022.06|🔥[**ZeroQuant**] Efficient and Affordable Post-Training Quantization for Large-Scale Transformers(@Microsoft) |[[pdf]](https://arxiv.org/pdf/2206.01861.pdf)|[[DeepSpeed]](https://github.com/microsoft/DeepSpeed) ![](https://img.shields.io/github/stars/microsoft/DeepSpeed.svg?style=social)|⭐️⭐️ |     
-|2022.08|[FP8-Quantization] FP8 Quantization: The Power of the Exponent(@Qualcomm AI Research) | [[pdf]](https://arxiv.org/pdf/2208.09225.pdf) | [FP8-quantization](https://github.com/Qualcomm-AI-research/FP8-quantization) ![](https://img.shields.io/github/stars/Qualcomm-AI-research/FP8-quantization.svg?style=social) |⭐️ |    
+|2022.08|[[FP8-Quantization]] FP8 Quantization: The Power of the Exponent(@Qualcomm AI Research) | [[pdf]](https://arxiv.org/pdf/2208.09225.pdf) | [FP8-quantization](https://github.com/Qualcomm-AI-research/FP8-quantization) ![](https://img.shields.io/github/stars/Qualcomm-AI-research/FP8-quantization.svg?style=social) |⭐️ |    
 |2022.08|[LLM.int8()] 8-bit Matrix Multiplication  for Transformers at Scale(@Facebook AI Research etc) |[[pdf]](https://arxiv.org/pdf/2208.07339.pdf)|[[bitsandbytes]](https://github.com/timdettmers/bitsandbytes) ![](https://img.shields.io/github/stars/timdettmers/bitsandbytes.svg?style=social)|⭐️ |    
 |2022.10|🔥[**GPTQ**] GPTQ: ACCURATE POST-TRAINING QUANTIZATION FOR GENERATIVE PRE-TRAINED TRANSFORMERS(@IST Austria etc) |[[pdf]](https://arxiv.org/pdf/2210.17323.pdf) |[[gptq]](https://github.com/IST-DASLab/gptq) ![](https://img.shields.io/github/stars/IST-DASLab/gptq.svg?style=social)|⭐️⭐️ |   
 |2022.11|🔥[**WINT8/4**] Who Says Elephants Can’t Run: Bringing Large Scale MoE Models into Cloud Scale Production(@NVIDIA&Microsoft) |[[pdf]](https://arxiv.org/pdf/2211.10017.pdf)|[[FasterTransformer]](https://github.com/NVIDIA/FasterTransformer) ![](https://img.shields.io/github/stars/NVIDIA/FasterTransformer.svg?style=social)|⭐️⭐️ |     

From c59ccbdffb1f36e10bfb38aa9b32af1076370906 Mon Sep 17 00:00:00 2001
From: Mr-Philo <1347549342@qq.com>
Date: Mon, 8 Apr 2024 02:00:59 +0000
Subject: [PATCH 3/3] typo

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index aee9359..29dc502 100644
--- a/README.md
+++ b/README.md
@@ -101,7 +101,7 @@ Awesome-LLM-Inference: A curated list of [📙Awesome LLM Inference Papers with
 |Date|Title|Paper|Code|Recom|
 |:---:|:---:|:---:|:---:|:---:| 
 |2022.06|🔥[**ZeroQuant**] Efficient and Affordable Post-Training Quantization for Large-Scale Transformers(@Microsoft) |[[pdf]](https://arxiv.org/pdf/2206.01861.pdf)|[[DeepSpeed]](https://github.com/microsoft/DeepSpeed) ![](https://img.shields.io/github/stars/microsoft/DeepSpeed.svg?style=social)|⭐️⭐️ |     
-|2022.08|[[FP8-Quantization]] FP8 Quantization: The Power of the Exponent(@Qualcomm AI Research) | [[pdf]](https://arxiv.org/pdf/2208.09225.pdf) | [FP8-quantization](https://github.com/Qualcomm-AI-research/FP8-quantization) ![](https://img.shields.io/github/stars/Qualcomm-AI-research/FP8-quantization.svg?style=social) |⭐️ |    
+|2022.08|[FP8-Quantization] FP8 Quantization: The Power of the Exponent(@Qualcomm AI Research) | [[pdf]](https://arxiv.org/pdf/2208.09225.pdf) | [[FP8-quantization]](https://github.com/Qualcomm-AI-research/FP8-quantization) ![](https://img.shields.io/github/stars/Qualcomm-AI-research/FP8-quantization.svg?style=social) |⭐️ |    
 |2022.08|[LLM.int8()] 8-bit Matrix Multiplication  for Transformers at Scale(@Facebook AI Research etc) |[[pdf]](https://arxiv.org/pdf/2208.07339.pdf)|[[bitsandbytes]](https://github.com/timdettmers/bitsandbytes) ![](https://img.shields.io/github/stars/timdettmers/bitsandbytes.svg?style=social)|⭐️ |    
 |2022.10|🔥[**GPTQ**] GPTQ: ACCURATE POST-TRAINING QUANTIZATION FOR GENERATIVE PRE-TRAINED TRANSFORMERS(@IST Austria etc) |[[pdf]](https://arxiv.org/pdf/2210.17323.pdf) |[[gptq]](https://github.com/IST-DASLab/gptq) ![](https://img.shields.io/github/stars/IST-DASLab/gptq.svg?style=social)|⭐️⭐️ |   
 |2022.11|🔥[**WINT8/4**] Who Says Elephants Can’t Run: Bringing Large Scale MoE Models into Cloud Scale Production(@NVIDIA&Microsoft) |[[pdf]](https://arxiv.org/pdf/2211.10017.pdf)|[[FasterTransformer]](https://github.com/NVIDIA/FasterTransformer) ![](https://img.shields.io/github/stars/NVIDIA/FasterTransformer.svg?style=social)|⭐️⭐️ |