colossalai/inference/README.md (+11 −1)
@@ -18,7 +18,7 @@
 ## 📌 Introduction
-ColossalAI-Inference is a module which offers acceleration to the inference execution of Transformers models, especially LLMs. In ColossalAI-Inference, we leverage high-performance kernels, KV cache, paged attention, continuous batching and other techniques to accelerate the inference of LLMs. We also provide simple and unified APIs for the sake of user-friendliness. [[blog]](https://hpc-ai.com/blog/colossal-inference)
+ColossalAI-Inference is a module which offers acceleration to the inference execution of Transformers models, especially LLMs and DiT Diffusion Models. In ColossalAI-Inference, we leverage high-performance kernels, KV cache, paged attention, continuous batching and other techniques to accelerate the inference of LLMs. We also provide simple and unified APIs for the sake of user-friendliness. [[blog]](https://hpc-ai.com/blog/colossal-inference)
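For context, a minimal sketch of the "simple and unified APIs" the README paragraph refers to. The module paths, constructor signature, model name, and launch call below are assumptions based on the general layout of the ColossalAI inference module, not content taken from this PR:

```python
# Hedged usage sketch of the unified inference API mentioned above.
# Assumptions: InferenceEngine's location and signature, the generate()
# keyword, and the launch call may differ by ColossalAI version.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

import colossalai
from colossalai.inference.config import InferenceConfig
from colossalai.inference.core.engine import InferenceEngine

colossalai.launch_from_torch()  # initialize the distributed environment

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

config = InferenceConfig(max_batch_size=4, dtype=torch.float16)
engine = InferenceEngine(model, tokenizer, config, verbose=True)

outputs = engine.generate(prompts=["Introduce Paris in one sentence."])
print(outputs)
```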
colossalai/inference/config.py (+16 −0)
@@ -186,6 +186,7 @@ class InferenceConfig(RPC_PARAM):
         enable_streamingllm(bool): Whether to use StreamingLLM; the relevant algorithm is described in the paper at https://arxiv.org/pdf/2309.17453.
         start_token_size(int): The size of the start tokens, when using StreamingLLM.
         generated_token_size(int): The size of the generated tokens, when using StreamingLLM.
+        patched_parallelism_size(int): The patched parallelism size, when using Distrifusion.
     """

     # NOTE: arrange configs according to their importance and frequency of usage
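The three StreamingLLM fields documented in this hunk map directly onto `InferenceConfig` arguments. A hedged sketch of enabling them follows; the field names come from the diff above, while the values are purely illustrative:

```python
from colossalai.inference.config import InferenceConfig

# StreamingLLM keeps the first `start_token_size` "attention sink" tokens
# plus a rolling window of `generated_token_size` recently generated tokens
# in the KV cache, bounding memory for long generations (arXiv:2309.17453).
# Field names are from the diff above; the values here are illustrative.
streaming_config = InferenceConfig(
    enable_streamingllm=True,
    start_token_size=4,        # tokens kept from the start of the sequence
    generated_token_size=512,  # rolling window of recent generated tokens
)
```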
@@ -245,6 +246,11 @@ class InferenceConfig(RPC_PARAM):
     start_token_size: int = 4
     generated_token_size: int = 512

+    # Acceleration for Diffusion Model (PipeFusion or Distrifusion)
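The hunk is truncated before the new field definitions, but the docstring change above names `patched_parallelism_size`. A hedged sketch of how a Distrifusion run might be configured; the value (and any default) is an assumption, not taken from this diff:

```python
from colossalai.inference.config import InferenceConfig

# Distrifusion accelerates DiT diffusion models by splitting the image into
# patches and processing different patches on different devices.
# `patched_parallelism_size` (named in the docstring diff above) is the
# number of such patch-parallel workers; the value 2 is purely illustrative.
diffusion_config = InferenceConfig(
    patched_parallelism_size=2,
)
```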