-
-
Notifications
You must be signed in to change notification settings - Fork 11k
Performance fix MistralTokenizer: cache special ids and tokens #27925
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Performance fix MistralTokenizer: cache special ids and tokens #27925
Conversation
Signed-off-by: Julien Denize <[email protected]>
Signed-off-by: Julien Denize <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request effectively addresses a significant performance issue in the MistralTokenizer by caching special tokens and their corresponding IDs during initialization. The changes correctly replace expensive list lookups with fast set lookups in performance-critical methods. I've included a couple of suggestions to improve code maintainability by making an implicit dependency explicit, which will make the code more robust against future refactoring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request effectively addresses a performance bottleneck in MistralTokenizer by caching special tokens and their IDs. The changes are well-structured and the use of sets for lookups is a good optimization. My review includes one high-severity comment regarding a potential correctness issue due to inconsistent logic for identifying special tokens in tekken tokenizers. While the inconsistency existed before, this PR touches related code and provides a good opportunity to fix it to ensure robust tokenization.
Signed-off-by: Julien Denize <[email protected]>
4b3c45f to
58cc3d6
Compare
…project#27925) Signed-off-by: Julien Denize <[email protected]> Co-authored-by: Patrick von Platen <[email protected]>
Purpose
The PR #26358 to refactor the MistralTokenizer brought a performance issue due to the lack of caching of special tokens: https://github.com/vllm-project/vllm/pull/26358/files#diff-7b4368a1d7d34fa1071838db6d63fea4911f32e60ac065829979b327ad539649R410
The problem of performance makes it about 1000 times slower than expected.
Test Plan
Test Result
from main branch:
to this PR:
Essential Elements of an Effective PR Description Checklist
supported_models.mdandexamplesfor a new model.