Recognise gpt-5.1 model identifier (Fixes #464) by jbbqqf · Pull Request #554 · openai/tiktoken

jbbqqf · 2026-05-23T10:04:58Z

Summary

Fixes #464. encoding_for_model("gpt-5.1") currently raises KeyError
because the prefix table only contains "gpt-5-", and "gpt-5.1" does
not start with "gpt-5-": the character after "gpt-5" is a dot, not a
hyphen. This is the mismatch that led to #464 being filed.

This change adds "gpt-5.1" (exact) and "gpt-5.1-" (versioned
variant) to the o200k_base mappings, alongside the existing "gpt-5" /
"gpt-5-" pair. Per OpenAI's GPT-5.1 migration note
(https://platform.openai.com/docs/guides/latest-model#migrating-from-other-models-to-gpt-5-1)
the model is documented as a drop-in replacement for GPT-5, so the
encoding choice mirrors GPT-5's.

A short code comment in MODEL_PREFIX_TO_ENCODING notes the
ordering rationale (purely cosmetic given startswith semantics, but
makes the intent explicit for the next person editing the table).
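
The mismatch described above can be sketched with a simplified stand-in for the lookup. This is not the code in tiktoken/model.py: the names EXACT, PREFIXES and resolve are hypothetical, and the tables are trimmed to the entries this PR discusses.

```python
# Simplified stand-in for the model-to-encoding lookup (assumed shape:
# exact names are checked first, then prefixes via str.startswith).
EXACT = {"gpt-5": "o200k_base", "gpt-5.1": "o200k_base"}
PREFIXES = {"gpt-5-": "o200k_base", "gpt-5.1-": "o200k_base"}


def resolve(model: str) -> str:
    """Return the encoding name for a model, or raise KeyError."""
    if model in EXACT:
        return EXACT[model]
    for prefix, encoding in PREFIXES.items():
        if model.startswith(prefix):
            return encoding
    raise KeyError(f"no encoding known for {model}")


# Root cause: the sixth character of "gpt-5.1" is ".", so the "gpt-5-"
# prefix can never match it.
assert not "gpt-5.1".startswith("gpt-5-")

print(resolve("gpt-5.1"))          # matched by the exact entry
print(resolve("gpt-5.1-2025-11"))  # matched by the "gpt-5.1-" prefix
```

Because no key in PREFIXES is a prefix of another, iteration order does not change the result in this sketch, which is why the ordering comment is described as cosmetic.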

Reproduce BEFORE/AFTER yourself (copy-paste)

git clone https://github.com/openai/tiktoken.git /tmp/tt-464 && cd /tmp/tt-464
pip install -e . pytest

# BEFORE — on origin/main, gpt-5.1 raises KeyError
git checkout main
python -c "
import tiktoken
print(tiktoken.encoding_for_model('gpt-5').name)
try:
    print(tiktoken.encoding_for_model('gpt-5.1').name)
except KeyError as e:
    print('FAIL:', e)
"
# Expected: gpt-5 -> o200k_base; gpt-5.1 -> KeyError

# AFTER — on this branch, gpt-5.1 maps to o200k_base
git fetch https://github.com/jbbqqf/tiktoken.git feat/464-gpt-5.1-encoding
git checkout FETCH_HEAD
pip install -e .
python -c "
import tiktoken
for m in ['gpt-5', 'gpt-5.1', 'gpt-5.1-2025-11', 'gpt-5-mini']:
    print(m, '->', tiktoken.encoding_for_model(m).name)
"
# Expected: all four map to o200k_base

What I ran locally

$ pytest tests/test_misc.py::test_encoding_for_model -v
tests/test_misc.py::test_encoding_for_model PASSED

$ pytest
============== 33 passed in 1.80s ==============

Verified the new test fails on origin/main (before the tiktoken/model.py
patch is applied) with KeyError: 'Could not automatically map gpt-5.1 to a tokeniser', and passes after the patch.

Edge cases

| Input | Behavior before | Behavior after |
| --- | --- | --- |
| `"gpt-5"` | `o200k_base` (exact match) | unchanged |
| `"gpt-5-mini"` | `o200k_base` via `gpt-5-` prefix | unchanged |
| `"gpt-5.1"` | `KeyError` | `o200k_base` (exact match) |
| `"gpt-5.1-2025-11"` | `KeyError` | `o200k_base` via `gpt-5.1-` prefix |
| `"gpt-5.2"` (hypothetical future) | `KeyError` | still `KeyError` (intentional: wait for explicit OpenAI doc) |
| Any non-`gpt-5` model | unchanged | unchanged |

The change is additive (two new dict entries plus a comment); existing
mappings are untouched.
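
For reviewers who want to check the edge cases without installing the branch, here is a minimal sketch that compares trimmed, hypothetical copies of the tables before and after the patch. BEFORE, AFTER, lookup and compare are illustrative names, not part of tiktoken, and the string "KeyError" stands in for the exception the real function raises.

```python
# Hypothetical, trimmed copies of the lookup tables before and after the
# patch; the real tables in tiktoken/model.py hold many more entries.
BEFORE = {
    "exact": {"gpt-5": "o200k_base"},
    "prefix": {"gpt-5-": "o200k_base"},
}
AFTER = {
    "exact": {**BEFORE["exact"], "gpt-5.1": "o200k_base"},
    "prefix": {**BEFORE["prefix"], "gpt-5.1-": "o200k_base"},
}


def lookup(tables: dict, model: str) -> str:
    """Return the encoding name, or the marker 'KeyError' if nothing matches."""
    if model in tables["exact"]:
        return tables["exact"][model]
    for prefix, encoding in tables["prefix"].items():
        if model.startswith(prefix):
            return encoding
    return "KeyError"


def compare(models: list[str]) -> dict[str, tuple[str, str]]:
    """Map each model name to its (before, after) lookup result."""
    return {m: (lookup(BEFORE, m), lookup(AFTER, m)) for m in models}


if __name__ == "__main__":
    names = ["gpt-5", "gpt-5-mini", "gpt-5.1", "gpt-5.1-2025-11", "gpt-5.2"]
    for model, (before, after) in compare(names).items():
        print(f"{model:18} {before:12} {after}")
```

Building AFTER by unpacking BEFORE mirrors the additive nature of the change: every existing key keeps its value, and only the two new keys differ.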

PR drafted with assistance from Claude Code (Anthropic). The change
was reviewed manually against tiktoken/model.py and the OpenAI
migration doc linked above. The reproducer block above is the one I
used during development; reviewers can paste it verbatim.

encoding_for_model('gpt-5.1') raised KeyError because 'gpt-5.1' does not start with the 'gpt-5-' prefix (the literal hyphen mismatches the dot). Add 'gpt-5.1' (exact) and 'gpt-5.1-' (versioned variant) to the same o200k_base mapping that the 'gpt-5' entries already use, per OpenAI's GPT-5.1 migration note that the model is a drop-in replacement for GPT-5. Regression test covers both 'gpt-5.1' and 'gpt-5.1-2025-11'.

jbbqqf wants to merge 1 commit into openai:main from jbbqqf:feat/464-gpt-5.1-encoding
