Conversation

@juliendenize (Contributor) commented on Nov 1, 2025

Purpose

PR #26358, which refactored the MistralTokenizer, introduced a performance regression because special tokens were no longer cached: https://github.com/vllm-project/vllm/pull/26358/files#diff-7b4368a1d7d34fa1071838db6d63fea4911f32e60ac065829979b327ad539649R410

This regression makes convert_tokens_to_string roughly 1000 times slower than expected.
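
In essence, the fix precomputes the special tokens once at construction time and stores them in sets, so membership checks in hot paths become O(1). A minimal sketch of the pattern (class, attribute, and accessor names here are illustrative, not the exact ones in the PR):

```python
class CachedSpecialTokensSketch:
    """Illustrative only: cache special tokens once instead of
    recomputing them on every convert_tokens_to_string() call."""

    def __init__(self, inner_tokenizer):
        self._tokenizer = inner_tokenizer
        # Computed once here; before the fix, this work happened per call.
        # all_special_tokens / all_special_ids are assumed HF-style
        # accessors, not necessarily what mistral-common exposes.
        self._special_tokens = set(inner_tokenizer.all_special_tokens)
        self._special_token_ids = set(inner_tokenizer.all_special_ids)

    def convert_tokens_to_string(self, tokens: list[str]) -> str:
        # Set membership is O(1) vs. an O(n) list scan per token.
        return "".join(t for t in tokens if t not in self._special_tokens)
```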

Test Plan

```python
from vllm.transformers_utils.tokenizers.mistral import MistralTokenizer

tokenizer = MistralTokenizer.from_pretrained(
    "mistralai/Mistral-Small-3.2-24B-Instruct-2506")

# Timed in IPython with: %timeit -n 100 -r 10
tokenizer.convert_tokens_to_string([
    "<s>",
    "yolo",
] * 10)
```
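
For anyone reproducing this outside IPython (where the %timeit magic is unavailable), an equivalent plain-Python harness using the stdlib timeit module could look like this:

```python
import timeit

from vllm.transformers_utils.tokenizers.mistral import MistralTokenizer

tokenizer = MistralTokenizer.from_pretrained(
    "mistralai/Mistral-Small-3.2-24B-Instruct-2506")
tokens = ["<s>", "yolo"] * 10

# Equivalent of `%timeit -n 100 -r 10`: 10 repeats of 100 loops each.
times = timeit.repeat(
    lambda: tokenizer.convert_tokens_to_string(tokens),
    repeat=10, number=100)
print(f"best of 10 runs: {min(times) / 100 * 1e6:.2f} µs per loop")
```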

Test Result

On the main branch:

```
22.7 ms ± 113 μs per loop (mean ± std. dev. of 10 runs, 100 loops each)
```

With this PR:

```
2.29 μs ± 54.2 ns per loop (mean ± std. dev. of 10 runs, 100 loops each)
```

That is roughly a 10,000× speedup.

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: Julien Denize <[email protected]>
@gemini-code-assist bot left a comment

Code Review

This pull request effectively addresses a significant performance issue in the MistralTokenizer by caching special tokens and their corresponding IDs during initialization. The changes correctly replace expensive list lookups with fast set lookups in performance-critical methods. I've included a couple of suggestions to improve code maintainability by making an implicit dependency explicit, which will make the code more robust against future refactoring.
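
To illustrate the reviewer's point about list vs. set lookups: with on the order of a thousand special tokens, a list membership test scans entries linearly while a set test is a single hash lookup. A standalone micro-benchmark (not vLLM code) makes the gap visible:

```python
import timeit

special_list = [f"<tok_{i}>" for i in range(1000)]
special_set = set(special_list)

# Worst case for the list: "yolo" is absent, so every entry is scanned.
print("list:", timeit.timeit(lambda: "yolo" in special_list, number=100_000))
print("set: ", timeit.timeit(lambda: "yolo" in special_set, number=100_000))
```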

@gemini-code-assist bot left a comment

Code Review

This pull request effectively addresses a performance bottleneck in MistralTokenizer by caching special tokens and their IDs. The changes are well-structured and the use of sets for lookups is a good optimization. My review includes one high-severity comment regarding a potential correctness issue due to inconsistent logic for identifying special tokens in tekken tokenizers. While the inconsistency existed before, this PR touches related code and provides a good opportunity to fix it to ensure robust tokenization.

Signed-off-by: Julien Denize <[email protected]>
@juliendenize force-pushed the cache_mistral_tokenizer branch from 4b3c45f to 58cc3d6 on November 1, 2025 20:26
@DarkLight1337 enabled auto-merge (squash) on November 2, 2025 06:46
@github-actions bot added the ready label (ONLY add when PR is ready to merge/full CI is needed) on Nov 2, 2025
@DarkLight1337 merged commit 73444b7 into vllm-project:main on Nov 2, 2025
46 checks passed
@zhaozuy pushed a commit to zhaozuy/vllm that referenced this pull request on Nov 4, 2025

Labels

ready ONLY add when PR is ready to merge/full CI is needed


3 participants