Commit

[Tensor Parallel] TP-AWARE DEQUANTIZATION(@IBM T.J. Watson Research Center)
DefTruth authored Feb 17, 2024
1 parent f5d7460 commit 0e8cfd4
Showing 1 changed file with 4 additions and 4 deletions.
8 changes: 4 additions & 4 deletions README.md
@@ -59,8 +59,8 @@ Awesome-LLM-Inference: A curated list of [📙Awesome LLM Inference Papers with
* [Mixture-of-Experts(MoE) LLM Inference](#Mixture_of_Experts_LLM_Inference)
* [CPU/Single GPU/Mobile Inference](#CPU-Single-GPU-Inference)
* [Non Transformer Architecture](#Non-Transformer-Architecture)
- * [GEMMTensor CoresWMMAParallel](#GEMM-Tensor-Cores-WMMA)
- * [Position EmbedOthers](#Others)
+ * [GEMM/Tensor Cores/WMMA/Parallel](#GEMM-Tensor-Cores-WMMA)
+ * [Position Embed/Others](#Others)


### 📖LLM Algorithmic/Eval Survey ([©️back👆🏻](#paperlist))
@@ -249,7 +249,7 @@ Awesome-LLM-Inference: A curated list of [📙Awesome LLM Inference Papers with
|2023.05|🔥🔥[**RWKV**] RWKV: Reinventing RNNs for the Transformer Era(@Bo Peng etc) |[[pdf]](https://arxiv.org/pdf/2305.13048.pdf)|[[RWKV-LM]](https://github.com/BlinkDL/RWKV-LM) ![](https://img.shields.io/github/stars/BlinkDL/RWKV-LM.svg?style=social)|⭐️⭐️ |
|2023.12|🔥🔥[**Mamba**] Mamba: Linear-Time Sequence Modeling with Selective State Spaces(@cs.cmu.edu etc) |[[pdf]](https://arxiv.org/pdf/2312.00752.pdf)|[[mamba]](https://github.com/state-spaces/mamba) ![](https://img.shields.io/github/stars/state-spaces/mamba.svg?style=social)|⭐️⭐️ |

- ### 📖GEMMTensor CoresWMMAParallel ([©️back👆🏻](#paperlist))
+ ### 📖GEMM/Tensor Cores/WMMA/Parallel ([©️back👆🏻](#paperlist))
<div id="GEMM-Tensor-Cores-WMMA"></div>

|Date|Title|Paper|Code|Recom|
@@ -260,7 +260,7 @@ Awesome-LLM-Inference: A curated list of [📙Awesome LLM Inference Papers with
|2024.02|[QUICK] QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference(@SqueezeBits Inc)|[[pdf]](https://arxiv.org/pdf/2402.10076.pdf)|[[QUICK]](https://github.com/SqueezeBits/QUICK) ![](https://img.shields.io/github/stars/SqueezeBits/QUICK.svg?style=social)|⭐️⭐️ |
|2024.02|[Tensor Parallel] TP-AWARE DEQUANTIZATION(@IBM T.J. Watson Research Center)|[[pdf]](https://arxiv.org/pdf/2402.04925.pdf)|⚠️|⭐️ |

- ### 📖Position EmbedOthers ([©️back👆🏻](#paperlist))
+ ### 📖Position Embed/Others ([©️back👆🏻](#paperlist))
<div id="Others"></div>

|Date|Title|Paper|Code|Recom|
