feat: support 2 kernels for mixed chunked prefill #2546

chosen-ox · 2024-12-22T17:53:43Z

Motivation

Refer to #2273

Modifications

I implemented separate kernels for prefill and decode in flashinfer_backend.py and ran the test with python3 -m unittest test_chunked_prefill.TestChunkedPrefill.test_mixed_chunked_prefill. However, the modified version scores lower on this test than the current implementation, and I don't see an obvious speedup over the current version either. Can someone check whether I am on the right track?
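The prefill/decode split described above can be illustrated with a small pure-Python reference sketch. It is not the flashinfer_backend.py code from this PR and does not use the FlashInfer API. The (prefix_len, extend_len) request layout and every function name below (attend, prefill_path, decode_path, forward_split, forward_unified, make_request) are hypothetical. The sketch shows one property a two-kernel version has to preserve: routing prefill and decode requests to separate paths and scattering the results back into the original batch order should give the same outputs as running the whole mixed batch through a single prefill-style path. Comparing both paths on identical inputs is one way to narrow down whether a lower test score comes from a numerical or ordering mismatch.

```python
import math
import random
from typing import List, Tuple

# Hypothetical helpers for illustration only; not the sglang or FlashInfer API.
# A request is (prefix_len, extend_len): prefix_len tokens are already cached and
# extend_len new query tokens are processed this step. extend_len == 1 is treated
# as a decode request, extend_len > 1 as a prefill chunk (an assumption).
Request = Tuple[int, int]
Vec = List[float]
KV = Tuple[List[Vec], List[Vec]]  # (keys, values), length prefix_len + extend_len


def attend(q: Vec, keys: List[Vec], values: List[Vec], num_visible: int) -> Vec:
    """Naive single-head softmax attention over the first num_visible KV entries."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys[:num_visible]]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values[:num_visible])) / z
            for j in range(dim)]


def prefill_path(req: Request, queries: List[Vec], kv: KV) -> List[Vec]:
    """Causal attention for a prefill chunk: new token i sees prefix_len + i + 1 entries."""
    prefix_len, _ = req
    keys, values = kv
    return [attend(q, keys, values, prefix_len + i + 1) for i, q in enumerate(queries)]


def decode_path(req: Request, queries: List[Vec], kv: KV) -> List[Vec]:
    """Decode: the single new token attends to the whole KV cache."""
    prefix_len, extend_len = req
    assert extend_len == 1, "decode requests carry exactly one new token"
    keys, values = kv
    return [attend(queries[0], keys, values, prefix_len + 1)]


def forward_split(reqs, queries, kvs):
    """Two-path forward: route each request by type, then scatter back in batch order."""
    out = [None] * len(reqs)
    prefill_ids = [i for i, (_, e) in enumerate(reqs) if e > 1]
    decode_ids = [i for i, (_, e) in enumerate(reqs) if e == 1]
    for i in prefill_ids:  # stands in for one batched prefill kernel call
        out[i] = prefill_path(reqs[i], queries[i], kvs[i])
    for i in decode_ids:  # stands in for one batched decode kernel call
        out[i] = decode_path(reqs[i], queries[i], kvs[i])
    return out


def forward_unified(reqs, queries, kvs):
    """Single-path reference: every request, decode included, goes through prefill_path."""
    return [prefill_path(r, q, kv) for r, q, kv in zip(reqs, queries, kvs)]


def make_request(prefix_len: int, extend_len: int, dim: int, rng: random.Random):
    """Random queries and KV cache for one request (test data only)."""
    def rand_vec() -> Vec:
        return [rng.uniform(-1.0, 1.0) for _ in range(dim)]

    total = prefix_len + extend_len
    queries = [rand_vec() for _ in range(extend_len)]
    keys = [rand_vec() for _ in range(total)]
    values = [rand_vec() for _ in range(total)]
    return queries, (keys, values)


if __name__ == "__main__":
    rng = random.Random(0)
    reqs = [(3, 4), (5, 1), (0, 2), (7, 1)]  # mixed: prefill, decode, prefill, decode
    data = [make_request(p, e, 4, rng) for p, e in reqs]
    queries = [q for q, _ in data]
    kvs = [kv for _, kv in data]
    split = forward_split(reqs, queries, kvs)
    unified = forward_unified(reqs, queries, kvs)
    print("outputs match:", split == unified)
```

In a real backend the decode path would typically call a dedicated batched decode kernel on single-token queries, which is where a speedup would be expected to come from. In this reference sketch both paths share the same naive attention, so it only checks correctness (including that results land back in the original batch order), not performance.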

Checklist

[√] Format your code according to the Contributor Guide.
[ ] Add unit tests as outlined in the Contributor Guide.
[ ] Update documentation as needed, including docstrings or example tutorials.

feat:support 2 kenrels for mixed chunked prefill

e413930

chosen-ox requested review from merrymercy, Ying1123, hnyls2002, zhyncs and ispobock as code owners on December 22, 2024 17:53
