From 0e8cfd47245279ee16ca2b38ac8892039193415e Mon Sep 17 00:00:00 2001
From: DefTruth <31974251+DefTruth@users.noreply.github.com>
Date: Sat, 17 Feb 2024 21:14:40 +0800
Subject: [PATCH] [Tensor Parallel] TP-AWARE DEQUANTIZATION(@IBM T.J. Watson Research Center)

---
 README.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/README.md b/README.md
index 2290752..caeb168 100644
--- a/README.md
+++ b/README.md
@@ -59,8 +59,8 @@ Awesome-LLM-Inference: A curated list of [📙Awesome LLM Inference Papers with
 * [Mixture-of-Experts(MoE) LLM Inference](#Mixture_of_Experts_LLM_Inference)
 * [CPU/Single GPU/Mobile Inference](#CPU-Single-GPU-Inference)
 * [Non Transformer Architecture](#Non-Transformer-Architecture)
-* [GEMM、Tensor Cores、WMMA、Parallel](#GEMM-Tensor-Cores-WMMA)
-* [Position Embed、Others](#Others)
+* [GEMM/Tensor Cores/WMMA/Parallel](#GEMM-Tensor-Cores-WMMA)
+* [Position Embed/Others](#Others)

 ### 📖LLM Algorithmic/Eval Survey ([©️back👆🏻](#paperlist))

@@ -249,7 +249,7 @@ Awesome-LLM-Inference: A curated list of [📙Awesome LLM Inference Papers with
 |2023.05|🔥🔥[**RWKV**] RWKV: Reinventing RNNs for the Transformer Era(@Bo Peng etc) |[[pdf]](https://arxiv.org/pdf/2305.13048.pdf)|[[RWKV-LM]](https://github.com/BlinkDL/RWKV-LM) ![](https://img.shields.io/github/stars/BlinkDL/RWKV-LM.svg?style=social)|⭐️⭐️ |
 |2023.12|🔥🔥[**Mamba**] Mamba: Linear-Time Sequence Modeling with Selective State Spaces(@cs.cmu.edu etc) |[[pdf]](https://arxiv.org/pdf/2312.00752.pdf)|[[mamba]](https://github.com/state-spaces/mamba) ![](https://img.shields.io/github/stars/state-spaces/mamba.svg?style=social)|⭐️⭐️ |

-### 📖GEMM、Tensor Cores、WMMA、Parallel ([©️back👆🏻](#paperlist))
+### 📖GEMM/Tensor Cores/WMMA/Parallel ([©️back👆🏻](#paperlist))
 |Date|Title|Paper|Code|Recom|
@@ -260,7 +260,7 @@ Awesome-LLM-Inference: A curated list of [📙Awesome LLM Inference Papers with
 |2024.02|[QUICK] QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference(@SqueezeBits Inc)|[[pdf]](https://arxiv.org/pdf/2402.10076.pdf)|[[QUICK]](https://github.com/SqueezeBits/QUICK) ![](https://img.shields.io/github/stars/SqueezeBits/QUICK.svg?style=social)|⭐️⭐️ |
 |2024.02|[Tensor Parallel] TP-AWARE DEQUANTIZATION(@IBM T.J. Watson Research Center)|[[pdf]](https://arxiv.org/pdf/2402.04925.pdf)|⚠️|⭐️ |

-### 📖Position Embed、Others ([©️back👆🏻](#paperlist))
+### 📖Position Embed/Others ([©️back👆🏻](#paperlist))
 |Date|Title|Paper|Code|Recom|