perf: speed up batch tokenization and decoding hot paths by eonr · Pull Request #561 · openai/tiktoken

eonr · 2026-05-27T15:23:44Z

Summary

This PR moves the expensive Python coordination paths for common LLM tokenization workloads into native Rust-backed paths:

- native batch encoding for short-string-heavy `encode_batch` and `encode_ordinary_batch`
- native batch decoding for `decode_batch` and `decode_bytes_batch`
- native `decode_tokens_bytes` and `decode_with_offsets`
- faster special-token guard paths by avoiding regex construction when the shared special-token prefix is absent
- faster public encoding construction by loading BPE files directly into `CoreBPE` and lazily materializing public `mergeable_ranks`
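
To illustrate the special-token guard idea, here is a minimal sketch, not the PR's code. It assumes the special tokens share a literal prefix such as `<|`; the class name `SpecialTokenGuard`, the token set, and the `regex_builds` counter are hypothetical and exist only for illustration.

```python
import os
import re

# Illustrative token set; real encodings define their own special tokens.
SPECIAL_TOKENS = {"<|endoftext|>", "<|fim_prefix|>", "<|fim_suffix|>"}


class SpecialTokenGuard:
    """Hypothetical guard: skip regex work when the shared prefix is absent."""

    def __init__(self, special_tokens):
        self._tokens = frozenset(special_tokens)
        # os.path.commonprefix compares character by character, so it also
        # works for arbitrary strings, not only file paths.
        self._prefix = os.path.commonprefix(sorted(self._tokens))
        self._regex = None      # built lazily, at most once
        self.regex_builds = 0   # instrumentation for the example only

    def _get_regex(self):
        if self._regex is None:
            pattern = "|".join(re.escape(t) for t in sorted(self._tokens))
            self._regex = re.compile(pattern)
            self.regex_builds += 1
        return self._regex

    def find_disallowed(self, text):
        """Return the first special token found in text, or None."""
        if not self._tokens:
            return None
        # Cheap substring test: no shared prefix means no special token.
        if self._prefix and self._prefix not in text:
            return None
        match = self._get_regex().search(text)
        return match.group(0) if match else None


guard = SpecialTokenGuard(SPECIAL_TOKENS)
print(guard.find_disallowed("plain text"), guard.regex_builds)
print(guard.find_disallowed("a <|endoftext|> b"), guard.regex_builds)
```

In this sketch, ordinary text never triggers regex compilation; the regex is only built the first time the prefix actually appears.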

The main pattern is the same across the changes: keep the public Python API and fallback behavior intact, but avoid per-item Python futures, repeated dict materialization, and repeated regex construction in hot paths.
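
The pattern can be sketched roughly as follows. This is a self-contained illustration under stated assumptions, not the PR's implementation: `_native_encode_batch` and `_encode_one` are hypothetical stand-ins that "tokenize" to UTF-8 byte values, so the example runs without tiktoken installed.

```python
def _native_encode_batch(texts: list[str]) -> list[list[int]]:
    """Hypothetical stand-in for a native batch call: strict, list-of-str only."""
    if not isinstance(texts, list) or not all(isinstance(t, str) for t in texts):
        raise TypeError("native path expects a list of str")
    # Strict UTF-8 encoding raises UnicodeEncodeError on lone surrogates.
    return [list(t.encode("utf-8")) for t in texts]


def _encode_one(text: str) -> list[int]:
    """Hypothetical stand-in for the existing per-string path."""
    # Replace lone surrogates before encoding, so this path never raises.
    fixed = text.encode("utf-16", "surrogatepass").decode("utf-16", "replace")
    return list(fixed.encode("utf-8"))


def encode_batch(texts):
    """One native call for the whole batch; per-item fallback otherwise."""
    if isinstance(texts, list):
        try:
            return _native_encode_batch(texts)
        except (TypeError, UnicodeEncodeError):
            pass  # unsupported input: fall through to the slow path
    # Fallback: unsupported iterables, surrogates, or conversion failures.
    return [_encode_one(t) for t in texts]


print(encode_batch(["ok", "hi"]))          # fast path
print(encode_batch(t for t in ["ok"]))     # generator: fallback path
print(len(encode_batch(["ok", "bad\ud800"])))  # surrogate: fallback path
```

The public function keeps one signature; callers cannot tell which path ran except by timing.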

Benchmarks

Local macOS arm64 workstation, Python 3.14.2. Each row compares current main against this PR using best-of-7 runs unless noted.

| Workload | API | main | this PR | speedup |
| --- | --- | --- | --- | --- |
| 10k tiny strings | `encode_batch` | 393.461 ms | 19.479 ms | 20.20x |
| 10k tiny strings | `encode_ordinary_batch` | 177.743 ms | 18.816 ms | 9.45x |
| 10k chat messages | `encode_batch` | 266.530 ms | 72.159 ms | 3.69x |
| 5k tool-call JSON snippets | `encode_batch` | 211.567 ms | 47.997 ms | 4.41x |
| 10k tiny strings | `decode_batch` | 129.656 ms | 2.569 ms | 50.48x |
| 10k tiny strings | `decode_bytes_batch` | 136.411 ms | 1.816 ms | 75.10x |
| 10k chat messages | `decode_batch` | 123.914 ms | 5.117 ms | 24.22x |
| 5k tool-call JSON snippets | `decode_bytes_batch` | 73.325 ms | 1.786 ms | 41.04x |
| 5k chat transcript | `decode_with_offsets` | 100.144 ms | 4.091 ms | 24.48x |
| long document | `decode_with_offsets` | 225.064 ms | 9.534 ms | 23.61x |
| 10k tiny strings, many special tokens | `encode_batch` | 372.618 ms | 11.849 ms | 31.45x |
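
For context on the `decode_with_offsets` rows, the sketch below shows the kind of per-token bookkeeping that offset decoding involves when done in Python: each token's starting character index is derived by counting UTF-8 non-continuation bytes. It is an illustration only; the function name `decode_with_offsets_sketch` is hypothetical, it takes already-decoded token bytes rather than token ids, and it assumes every token is non-empty.

```python
def decode_with_offsets_sketch(token_bytes: list[bytes]) -> tuple[str, list[int]]:
    """Return the decoded text and the character offset where each token starts."""
    text_len = 0          # characters emitted so far
    offsets: list[int] = []
    for tok in token_bytes:
        # A token whose first byte is a UTF-8 continuation byte (0x80..0xBF)
        # starts in the middle of a character begun by the previous token.
        starts_mid_char = 0x80 <= tok[0] < 0xC0
        offsets.append(max(0, text_len - int(starts_mid_char)))
        # Every non-continuation byte begins a new character.
        text_len += sum(1 for b in tok if not 0x80 <= b < 0xC0)
    text = b"".join(token_bytes).decode("utf-8", errors="strict")
    return text, offsets


print(decode_with_offsets_sketch([b"hello", b" world"]))
# The two-byte character U+00E9 is split across the two tokens here.
print(decode_with_offsets_sketch([b"a\xc3", b"\xa9b"]))
```

A loop like this runs once per token and once per byte in Python, which is the kind of per-item overhead a native path avoids.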

Cold public encoding construction also improves:

| Encoding | main `get_encoding` | this PR `get_encoding` | speedup |
| --- | --- | --- | --- |
| `cl100k_base` | 73.285 ms | 24.813 ms | 2.95x |
| `o200k_base` | 132.115 ms | 31.502 ms | 4.19x |
| `o200k_harmony` | 138.450 ms | 31.258 ms | 4.43x |

Benchmark commands:

python scripts/benchmark_batch_encoding.py --reps 7 --warmups 2 --json-output /tmp/batch-encoding.json
python scripts/benchmark_batch_decoding.py --reps 7 --warmups 2 --json-output /tmp/batch-decoding.json
python scripts/benchmark_token_decoding.py --reps 7 --warmups 2 --json-output /tmp/token-decoding.json
python scripts/benchmark_special_encoding.py --single-reps 1000 --batch-reps 7 --warmups 3 --json-output /tmp/special-encoding.json
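
The benchmark scripts themselves are not reproduced here. As a rough sketch of what a best-of-N measurement with warmups could look like, assuming `--reps` and `--warmups` mean what their names suggest, the helper names `best_of` and `speedup` below are hypothetical:

```python
import time
from typing import Callable


def best_of(fn: Callable[[], object], reps: int = 7, warmups: int = 2) -> float:
    """Return the fastest wall-clock time of fn in milliseconds."""
    for _ in range(warmups):
        fn()  # untimed runs to warm caches and lazy initialization
    best = float("inf")
    for _ in range(reps):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best * 1000.0


def speedup(baseline_ms: float, candidate_ms: float) -> float:
    """Ratio reported in the speedup column: baseline time over candidate time."""
    return baseline_ms / candidate_ms


elapsed_ms = best_of(lambda: sum(range(10_000)))
print(f"best of 7: {elapsed_ms:.3f} ms")
print(f"{speedup(393.461, 19.479):.2f}x")
```

Taking the minimum rather than the mean reduces the influence of unrelated system noise on short workloads.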

Validation

python -m py_compile scripts/benchmark_batch_encoding.py scripts/benchmark_batch_decoding.py scripts/benchmark_special_encoding.py scripts/benchmark_token_decoding.py
cargo fmt --check
git diff --check
cargo test -q
TIKTOKEN_MAX_EXAMPLES=1000 python -m pytest tests --import-mode=append -q
check-manifest -v
python -m build --sdist --wheel

Local results:

- Python tests: 53 passed
- Rust tests: 2 passed
- format/diff/benchmark script compile checks: passed
- `check-manifest`: version-control and sdist file lists match
- `python -m build --sdist --wheel`: built `tiktoken-0.13.0.tar.gz` and a local macOS arm64 wheel
- fresh-venv wheel smoke test: import from site-packages, `get_encoding`, `encode_batch`, `decode_batch`, and `decode_with_offsets` passed

Compatibility notes

The native paths are guarded and fall back to the existing per-string Python behavior for unsupported iterables, surrogate-containing inputs, or unexpected type-conversion failures. `Encoding._mergeable_ranks` on a public encoding still behaves like a dict when accessed; public encodings only defer materializing that dict until a caller actually needs it.
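
The deferred-dict behavior can be sketched as below. This is a minimal illustration, not the PR's code: the class `LazyEncoding`, its `loader` argument, and the `loads` counter are hypothetical.

```python
from typing import Callable, Optional


class LazyEncoding:
    """Hypothetical encoding that builds its rank dict only on first access."""

    def __init__(self, loader: Callable[[], dict[bytes, int]]):
        self._loader = loader
        self._ranks: Optional[dict[bytes, int]] = None
        self.loads = 0  # instrumentation for the example only

    @property
    def _mergeable_ranks(self) -> dict[bytes, int]:
        if self._ranks is None:
            # First access pays the cost of building the dict.
            self._ranks = self._loader()
            self.loads += 1
        return self._ranks


enc = LazyEncoding(lambda: {b"a": 0, b"b": 1})
print(enc.loads)                    # nothing built at construction time
print(enc._mergeable_ranks[b"b"])   # first access materializes the dict
print(enc.loads)
```

Callers that never touch the attribute never pay for the dict, while callers that do see an ordinary `dict`.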

eonr added 10 commits May 27, 2026 20:52

Add native batch encoding fast path

c310455

Add native batch decoding fast path

95336f0

Add native offset decoding fast path

6673e38

Speed up encoding construction

91c6c96

Speed up default special-token encoding

71312ca

Speed up public encoding construction

7d62b08

Lazily materialize public mergeable ranks

68611c3

Fix arbitrary-text roundtrip property test

9b1c3a6

Speed up unstable encoding prefixes

d34f256

Defer special regex construction

6e4a6d7

eonr changed the title ~~Add native batch encoding fast path~~ perf: speed up batch tokenization and decoding hot paths May 27, 2026

eonr wants to merge 10 commits into openai:main from eonr:codex/native-batch-tokenization
