Bug: Inconsistent error message in tokenizer validation

**Description**
There is a minor inconsistency between the validation logic and the error message for custom token IDs in the '_add_custom_tokens' method.

**Location**
- File: 'gemma/gm/text/_tokenizer.py'
- Method: '_add_custom_tokens'

**The Problem**
The code correctly validates that the custom token ID 'i' is within the range [0, 98].

However, if this condition is not met, the ValueError that is raised contains an incorrect message:

```python
raise ValueError(
    f'Custom token id {i} for {token!r} is not in [1, 98].'
)
```
The message reflects one-based counting ([1, 98]), while the validation logic uses a zero-based range ([0, 98]). As a result, the error text implies that an ID of 0 is invalid even though the check accepts it.
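
For illustration, here is a minimal sketch of what a consistent check and message could look like. It is not the actual code from '_tokenizer.py': the helper name 'check_custom_token_id' and the constant '_MAX_CUSTOM_ID' are hypothetical, and the inclusive bounds are taken from the range described above.

```python
# Hypothetical constant: upper bound of the inclusive range described in this issue.
_MAX_CUSTOM_ID = 98


def check_custom_token_id(i: int, token: str) -> None:
    """Raise ValueError if `i` is outside the inclusive range [0, 98].

    Hypothetical helper for illustration only; the real validation lives
    inside '_add_custom_tokens' and may be structured differently.
    """
    if not 0 <= i <= _MAX_CUSTOM_ID:
        # The message now states the same zero-based range the check enforces.
        raise ValueError(
            f'Custom token id {i} for {token!r} is not in [0, {_MAX_CUSTOM_ID}].'
        )
```

Deriving the upper bound in the message from the same constant used in the comparison is one way to keep the two from drifting apart again.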

<img width="769" height="134" alt="Image" src="https://github.com/user-attachments/assets/9ce38866-a7b4-427b-992c-555da527a186" />

I have a fix ready and can open a pull request to resolve this.

Bug: Inconsistent error message in tokenizer validation #421
