[Feature] In SGLang, does chunked prefill use a fused (prefill+decode) batch? #1163

Closed Answered by merrymercy
CSEEduanyu asked this question in Q&A

By default, chunked prefill does not mix prefill and decode in the same batch. However, you can turn on this flag to mix/fuse them:

    parser.add_argument(
        "--enable-mixed-chunk",
        action="store_true",
        help="Enabling mixing prefill and decode in a chunked batch.",
    )
Mixing can help reduce the inter-token latency, as described in that paper.
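
To make the latency point concrete, here is a toy sketch of how a scheduler could plan its forward steps with and without mixing. This is not SGLang's actual scheduler: `Step`, `plan_steps`, and `stalled_steps` are hypothetical names, and the rule that decode tokens share the chunk budget with the prefill chunk is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    prefill_tokens: int  # prompt tokens processed in this forward step
    decode_tokens: int   # decode tokens (one per running request) in this step


def plan_steps(prompt_len: int, chunk_size: int, num_decoding: int,
               mixed: bool) -> List[Step]:
    """Plan the steps needed to prefill one new prompt in chunks while
    `num_decoding` requests are already generating (toy model)."""
    if mixed and chunk_size <= num_decoding:
        raise ValueError("chunk_size must exceed the number of decoding requests")
    steps: List[Step] = []
    remaining = prompt_len
    while remaining > 0:
        if mixed:
            # Assumption: decode tokens take part of the chunk budget,
            # so every step also advances the running requests.
            take = min(remaining, chunk_size - num_decoding)
            steps.append(Step(take, num_decoding))
        else:
            # Prefill-only step: running requests get no new token here.
            take = min(remaining, chunk_size)
            steps.append(Step(take, 0))
        remaining -= take
    return steps


def stalled_steps(steps: List[Step]) -> int:
    """Count steps that carry no decode tokens."""
    return sum(1 for s in steps if s.decode_tokens == 0)


if __name__ == "__main__":
    for mixed in (False, True):
        plan = plan_steps(prompt_len=20000, chunk_size=8192,
                          num_decoding=8, mixed=mixed)
        print(mixed, [(s.prefill_tokens, s.decode_tokens) for s in plan],
              "stalled:", stalled_steps(plan))
```

In this toy model, without mixing the running requests wait through every prefill step of the new prompt. With mixing, each step still emits one token per running request, which is where the inter-token latency improvement comes from.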

We picked a chunk size of 8192 to favor throughput.
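
As a rough illustration of the trade-off behind the chunk size, the sketch below counts how many prefill steps a prompt needs at a few chunk sizes. The helper `num_prefill_chunks` and the example prompt length are hypothetical; only the value 8192 comes from the answer above.

```python
def num_prefill_chunks(prompt_len: int, chunk_size: int) -> int:
    """Number of chunks needed to prefill `prompt_len` tokens (ceiling division)."""
    if prompt_len < 0 or chunk_size <= 0:
        raise ValueError("prompt_len must be >= 0 and chunk_size must be > 0")
    return (prompt_len + chunk_size - 1) // chunk_size


if __name__ == "__main__":
    prompt_len = 32768  # example prompt length, chosen for illustration
    for chunk_size in (512, 2048, 8192):
        print(chunk_size, num_prefill_chunks(prompt_len, chunk_size))
```

A larger chunk means fewer, bigger forward steps per prompt, which tends to favor throughput. A smaller chunk means more, shorter steps, which tends to favor latency for requests that are already decoding.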

Answer selected by merrymercy

This discussion was converted from issue #1162 on August 20, 2024 14:16.