feat: support 2 kernels for mixed chunked prefill #2546

Open
wants to merge 1 commit into
base: main


chosen-ox

Motivation

Refer to #2273

Modifications

I implemented separate kernels for prefill and decode in flashinfer_backend.py and ran the test python3 -m unittest test_chunked_prefill.TestChunkedPrefill.test_mixed_chunked_prefill. However, the modified version scores lower on this test than the current version, and I don't see an obvious speedup over it either. Could someone check whether I am on the right track?
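For reference, here is a minimal sketch of the split-and-merge bookkeeping that a two-kernel path needs. It is not the code in flashinfer_backend.py. The names `split_mixed_batch`, `merge_outputs`, `extend_lens`, and `is_decode` are hypothetical, and the sketch assumes each request's query tokens are laid out contiguously in one flattened batch. It only shows that per-token outputs from the two kernels must be scattered back to their original positions. A mismatch at this step is one possible source of an accuracy drop, though I have not confirmed that it is the cause here.

```python
from typing import List, Sequence, Tuple


def split_mixed_batch(
    extend_lens: Sequence[int], is_decode: Sequence[bool]
) -> Tuple[List[int], List[int]]:
    """Return flattened token indices for the prefill and decode requests.

    extend_lens[i] is the number of new query tokens of request i
    (assumed to be 1 for decode requests); is_decode[i] marks decode requests.
    """
    prefill_idx: List[int] = []
    decode_idx: List[int] = []
    offset = 0
    for length, dec in zip(extend_lens, is_decode):
        token_range = range(offset, offset + length)
        if dec:
            decode_idx.extend(token_range)
        else:
            prefill_idx.extend(token_range)
        offset += length
    return prefill_idx, decode_idx


def merge_outputs(
    prefill_out: Sequence, decode_out: Sequence,
    prefill_idx: Sequence[int], decode_idx: Sequence[int],
) -> list:
    """Scatter per-token outputs of both kernels back to the original order."""
    out = [None] * (len(prefill_idx) + len(decode_idx))
    for i, row in zip(prefill_idx, prefill_out):
        out[i] = row
    for i, row in zip(decode_idx, decode_out):
        out[i] = row
    return out


# Hypothetical mixed batch: two prefill chunks interleaved with two decode requests.
extend_lens = [3, 1, 2, 1]
is_decode = [False, True, False, True]
prefill_idx, decode_idx = split_mixed_batch(extend_lens, is_decode)

# Stand-ins for the two kernels: each simply tags the tokens it was given.
tokens = [f"t{i}" for i in range(sum(extend_lens))]
prefill_out = [("prefill", tokens[i]) for i in prefill_idx]
decode_out = [("decode", tokens[i]) for i in decode_idx]
merged = merge_outputs(prefill_out, decode_out, prefill_idx, decode_idx)
```

Comparing `merged` from the two-kernel path against the single-kernel output token by token (on a small batch) would show whether the two paths agree numerically before looking at speed.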

Checklist

  • [x] Format your code according to the Contributor Guide.
  • [ ] Add unit tests as outlined in the Contributor Guide.
  • [ ] Update documentation as needed, including docstrings or example tutorials.
