Description
There is a minor inconsistency between the validation logic and the error message for custom token IDs in the '_add_custom_tokens' method.
Location
- File: 'gemma/gm/text/_tokenizer.py'
- Method: '_add_custom_tokens'
The Problem
The code correctly validates that the custom token ID 'i' is within the range [0, 98].
However, if this condition is not met, the ValueError that is raised contains an incorrect message:
raise ValueError(
f'Custom token id {i} for {token!r} is not in [1, 98].'
)
The message reports the lower bound as 1, while the validation accepts 0. This is a mismatch between the zero-based indexing used in the validation logic and the one-based counting reflected in the error string.
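For illustration, here is a minimal sketch of what a consistent check could look like. It is not the actual code from '_tokenizer.py': the helper name 'check_custom_token_id' and the two bound constants are hypothetical, and the sketch assumes the intended range is the inclusive [0, 98] described above. Building the message from the same constants used in the comparison keeps the two from drifting apart.

```python
# Hypothetical standalone helper, written only to illustrate the fix.
# In the real code the check lives inside `_add_custom_tokens`.
_MIN_CUSTOM_ID = 0   # assumed lower bound (zero-based)
_MAX_CUSTOM_ID = 98  # assumed upper bound (inclusive)


def check_custom_token_id(token: str, i: int) -> None:
    """Raise ValueError if `i` is outside the inclusive range [0, 98]."""
    if not _MIN_CUSTOM_ID <= i <= _MAX_CUSTOM_ID:
        # The bounds in the message come from the same constants as the
        # comparison, so the text cannot disagree with the validation.
        raise ValueError(
            f'Custom token id {i} for {token!r} is not in '
            f'[{_MIN_CUSTOM_ID}, {_MAX_CUSTOM_ID}].'
        )
```

With this sketch, IDs 0 and 98 pass silently, while -1 or 99 raise a ValueError whose message names the range [0, 98].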
I have a fix ready and can open a pull request to resolve this.