Conversation

@juliendenize (Contributor) commented on Nov 1, 2025

Purpose

PR #26358, which refactored the MistralTokenizer, introduced a performance regression because special tokens were no longer cached: https://github.com/vllm-project/vllm/pull/26358/files#diff-7b4368a1d7d34fa1071838db6d63fea4911f32e60ac065829979b327ad539649R410

This regression makes convert_tokens_to_string roughly 1000 times slower than expected.
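
In essence, the fix precomputes the special tokens once at construction time and stores them in sets, so membership checks in hot paths become O(1). A minimal sketch of the pattern (class, attribute, and accessor names here are illustrative, not the exact ones in the PR):

```python
class CachedSpecialTokensSketch:
    """Illustrative only: cache special tokens once instead of
    recomputing them on every convert_tokens_to_string() call."""

    def __init__(self, inner_tokenizer):
        self._tokenizer = inner_tokenizer
        # Computed once here; before the fix, this work happened per call.
        # all_special_tokens / all_special_ids are assumed HF-style
        # accessors, not necessarily what mistral-common exposes.
        self._special_tokens = set(inner_tokenizer.all_special_tokens)
        self._special_token_ids = set(inner_tokenizer.all_special_ids)

    def convert_tokens_to_string(self, tokens: list[str]) -> str:
        # Set membership is O(1) vs. an O(n) list scan per token.
        return "".join(t for t in tokens if t not in self._special_tokens)
```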

Test Plan

```python
from vllm.transformers_utils.tokenizers.mistral import MistralTokenizer

tokenizer = MistralTokenizer.from_pretrained(
    "mistralai/Mistral-Small-3.2-24B-Instruct-2506")

# Timed in IPython with: %timeit -n 100 -r 10
tokenizer.convert_tokens_to_string([
    "<s>",
    "yolo",
] * 10)
```
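
For anyone reproducing this outside IPython (where the %timeit magic is unavailable), an equivalent plain-Python harness using the stdlib timeit module could look like this:

```python
import timeit

from vllm.transformers_utils.tokenizers.mistral import MistralTokenizer

tokenizer = MistralTokenizer.from_pretrained(
    "mistralai/Mistral-Small-3.2-24B-Instruct-2506")
tokens = ["<s>", "yolo"] * 10

# Equivalent of `%timeit -n 100 -r 10`: 10 repeats of 100 loops each.
times = timeit.repeat(
    lambda: tokenizer.convert_tokens_to_string(tokens),
    repeat=10, number=100)
print(f"best of 10 runs: {min(times) / 100 * 1e6:.2f} µs per loop")
```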

Test Result

On the main branch:

```
22.7 ms ± 113 μs per loop (mean ± std. dev. of 10 runs, 100 loops each)
```

With this PR:

```
2.29 μs ± 54.2 ns per loop (mean ± std. dev. of 10 runs, 100 loops each)
```

That is roughly a 10,000× speedup.

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: Julien Denize <[email protected]>
@gemini-code-assist bot left a comment

Code Review

This pull request effectively addresses a significant performance issue in the MistralTokenizer by caching special tokens and their corresponding IDs during initialization. The changes correctly replace expensive list lookups with fast set lookups in performance-critical methods. I've included a couple of suggestions to improve code maintainability by making an implicit dependency explicit, which will make the code more robust against future refactoring.
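
To illustrate the reviewer's point about list vs. set lookups: with on the order of a thousand special tokens, a list membership test scans entries linearly while a set test is a single hash lookup. A standalone micro-benchmark (not vLLM code) makes the gap visible:

```python
import timeit

special_list = [f"<tok_{i}>" for i in range(1000)]
special_set = set(special_list)

# Worst case for the list: "yolo" is absent, so every entry is scanned.
print("list:", timeit.timeit(lambda: "yolo" in special_list, number=100_000))
print("set: ", timeit.timeit(lambda: "yolo" in special_set, number=100_000))
```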

@gemini-code-assist bot left a comment

Code Review

This pull request effectively addresses a performance bottleneck in MistralTokenizer by caching special tokens and their IDs. The changes are well-structured and the use of sets for lookups is a good optimization. My review includes one high-severity comment regarding a potential correctness issue due to inconsistent logic for identifying special tokens in tekken tokenizers. While the inconsistency existed before, this PR touches related code and provides a good opportunity to fix it to ensure robust tokenization.

Signed-off-by: Julien Denize <[email protected]>
@juliendenize force-pushed the cache_mistral_tokenizer branch from 4b3c45f to 58cc3d6 on November 1, 2025 20:26
@DarkLight1337 enabled auto-merge (squash) on November 2, 2025 06:46
@github-actions bot added the ready label (ONLY add when PR is ready to merge/full CI is needed) on Nov 2, 2025
@DarkLight1337 merged commit 73444b7 into vllm-project:main on Nov 2, 2025
46 checks passed
@zhaozuy pushed a commit to zhaozuy/vllm that referenced this pull request on Nov 4, 2025

Labels

ready ONLY add when PR is ready to merge/full CI is needed


3 participants