
perf: speed up batch tokenization and decoding hot paths #561

Open: eonr wants to merge 10 commits into openai:main from eonr:codex/native-batch-tokenization
eonr commented May 27, 2026

Summary

This PR moves the expensive Python coordination paths for common LLM tokenization workloads into native Rust-backed paths:

  • native batch encoding for encode_batch and encode_ordinary_batch, aimed at workloads dominated by short strings
  • native batch decoding for decode_batch and decode_bytes_batch
  • native decode_tokens_bytes and decode_with_offsets
  • faster special-token guard paths by avoiding regex construction when the shared special-token prefix is absent
  • faster public encoding construction by loading BPE files directly into CoreBPE and lazily materializing public mergeable_ranks
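
The special-token guard in the list above can be illustrated with a minimal pure-Python sketch. It is not the code in this PR: the token set, `shared_prefix`, and `find_disallowed_special` are hypothetical names chosen for illustration, and the real check lives in the library's own guard path. The idea is only that when every special token starts with the same prefix, a cheap substring test can rule out a match before any regex is built.

```python
import re

# Hypothetical special-token set; real encodings define their own.
SPECIAL_TOKENS = frozenset({"<|endoftext|>", "<|fim_prefix|>", "<|fim_suffix|>"})


def shared_prefix(tokens):
    """Longest common prefix of all tokens ('' if there is none)."""
    ordered = sorted(tokens)
    if not ordered:
        return ""
    # The common prefix of the lexicographically smallest and largest
    # strings is the common prefix of the whole set.
    first, last = ordered[0], ordered[-1]
    n = 0
    while n < min(len(first), len(last)) and first[n] == last[n]:
        n += 1
    return first[:n]


def find_disallowed_special(text, disallowed=SPECIAL_TOKENS):
    """Return the first disallowed special token found in text, or None."""
    if not disallowed:
        return None
    prefix = shared_prefix(disallowed)
    # Fast path: if the shared prefix is absent, no token can match,
    # so the regex is never constructed.
    if prefix and prefix not in text:
        return None
    pattern = re.compile("|".join(re.escape(t) for t in sorted(disallowed)))
    match = pattern.search(text)
    return match.group() if match else None
```

Ordinary text without the prefix returns immediately, and only text containing the prefix pays for regex construction and search.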

The main pattern is the same across the changes: keep the public Python API and fallback behavior intact, but avoid per-item Python futures, repeated dict materialization, and repeated regex construction in hot paths.
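
As a rough illustration of that pattern, the sketch below wraps a fast batch path in a guard that falls back to per-item handling. Everything here is an assumption made for the example: `_native_encode_batch` is a pure-Python stand-in for a native entry point, and "tokens" are simply UTF-8 byte values rather than BPE ranks.

```python
def _native_encode_batch(texts):
    """Hypothetical stand-in for a native batch entry point.

    Accepts only a list of str and rejects lone surrogates, which cannot
    be encoded as UTF-8.
    """
    if not isinstance(texts, list):
        raise TypeError("expected a list of str")
    out = []
    for t in texts:
        if not isinstance(t, str):
            raise TypeError("expected str items")
        # Raises UnicodeEncodeError if t contains a lone surrogate.
        out.append(list(t.encode("utf-8")))
    return out


def _encode_one(text):
    """Per-string slow path: tolerates surrogates by replacing them."""
    return list(text.encode("utf-8", errors="replace"))


def encode_batch(texts):
    """Public entry point: same signature either way, fast path when possible."""
    try:
        return _native_encode_batch(texts)
    except (TypeError, UnicodeEncodeError):
        # Unsupported iterable or surrogate-containing input: fall back to
        # the per-item behaviour so callers see the same results as before.
        return [_encode_one(t) for t in texts]
```

In this sketch the type check runs before any item is consumed, so a generator passed to `encode_batch` is still intact when the fallback iterates over it.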

Benchmarks

Local macOS arm64 workstation, Python 3.14.2. Each row compares current main against this PR using best-of-7 runs unless noted.
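
For readers unfamiliar with the methodology, a best-of-N measurement can be sketched as follows. This is not the implementation of the scripts under scripts/, which is not shown here; `best_of` and `speedup` are hypothetical helpers, with defaults mirroring the `--reps 7 --warmups 2` flags used in the commands.

```python
import time


def best_of(fn, reps=7, warmups=2):
    """Return the fastest wall-clock time of fn() in milliseconds."""
    for _ in range(warmups):
        fn()  # warm caches and lazy initialisation; not timed
    best = float("inf")
    for _ in range(reps):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best * 1000.0


def speedup(baseline_ms, candidate_ms):
    """Ratio reported in the speedup column: baseline time over candidate time."""
    return baseline_ms / candidate_ms
```

Taking the minimum rather than the mean reduces the influence of scheduler noise on short runs, which matters for rows measured in a few milliseconds.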

| Workload | API | main | this PR | speedup |
| --- | --- | --- | --- | --- |
| 10k tiny strings | encode_batch | 393.461 ms | 19.479 ms | 20.20x |
| 10k tiny strings | encode_ordinary_batch | 177.743 ms | 18.816 ms | 9.45x |
| 10k chat messages | encode_batch | 266.530 ms | 72.159 ms | 3.69x |
| 5k tool-call JSON snippets | encode_batch | 211.567 ms | 47.997 ms | 4.41x |
| 10k tiny strings | decode_batch | 129.656 ms | 2.569 ms | 50.48x |
| 10k tiny strings | decode_bytes_batch | 136.411 ms | 1.816 ms | 75.10x |
| 10k chat messages | decode_batch | 123.914 ms | 5.117 ms | 24.22x |
| 5k tool-call JSON snippets | decode_bytes_batch | 73.325 ms | 1.786 ms | 41.04x |
| 5k chat transcript | decode_with_offsets | 100.144 ms | 4.091 ms | 24.48x |
| long document | decode_with_offsets | 225.064 ms | 9.534 ms | 23.61x |
| 10k tiny strings, many special tokens | encode_batch | 372.618 ms | 11.849 ms | 31.45x |

Cold public encoding construction also improves:

| Encoding | main get_encoding | this PR get_encoding | speedup |
| --- | --- | --- | --- |
| cl100k_base | 73.285 ms | 24.813 ms | 2.95x |
| o200k_base | 132.115 ms | 31.502 ms | 4.19x |
| o200k_harmony | 138.450 ms | 31.258 ms | 4.43x |

Benchmark commands:

python scripts/benchmark_batch_encoding.py --reps 7 --warmups 2 --json-output /tmp/batch-encoding.json
python scripts/benchmark_batch_decoding.py --reps 7 --warmups 2 --json-output /tmp/batch-decoding.json
python scripts/benchmark_token_decoding.py --reps 7 --warmups 2 --json-output /tmp/token-decoding.json
python scripts/benchmark_special_encoding.py --single-reps 1000 --batch-reps 7 --warmups 3 --json-output /tmp/special-encoding.json

Validation

python -m py_compile scripts/benchmark_batch_encoding.py scripts/benchmark_batch_decoding.py scripts/benchmark_special_encoding.py scripts/benchmark_token_decoding.py
cargo fmt --check
git diff --check
cargo test -q
TIKTOKEN_MAX_EXAMPLES=1000 python -m pytest tests --import-mode=append -q
check-manifest -v
python -m build --sdist --wheel

Local results:

  • Python tests: 53 passed
  • Rust tests: 2 passed
  • format/diff/benchmark script compile checks: passed
  • check-manifest: version-control and sdist file lists match
  • python -m build --sdist --wheel: built tiktoken-0.13.0.tar.gz and a local macOS arm64 wheel
  • fresh-venv wheel smoke test: import from site-packages, get_encoding, encode_batch, decode_batch, and decode_with_offsets passed

Compatibility notes

The native paths are guarded and fall back to the existing per-string Python behavior for unsupported iterables, surrogate-containing inputs, or unexpected type-conversion failures. Public Encoding._mergeable_ranks still behaves like a dict when accessed; public encodings just defer materializing that dict until a caller actually needs it.
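
The deferred materialization described above can be sketched with a read-only mapping that builds its dict on first access. This is an illustration under assumptions, not the PR's code: `LazyRanks` and the `loader` callable are hypothetical, and a `Mapping` subclass is only one way to get dict-like behaviour.

```python
from collections.abc import Mapping


class LazyRanks(Mapping):
    """Read-only mapping that builds its dict on first access."""

    def __init__(self, loader):
        # loader: zero-argument callable returning (token_bytes, rank) pairs
        # or a dict; hypothetical, standing in for reading ranks back out of
        # the native tokenizer.
        self._loader = loader
        self._data = None

    def _materialize(self):
        if self._data is None:
            self._data = dict(self._loader())
        return self._data

    def __getitem__(self, key):
        return self._materialize()[key]

    def __iter__(self):
        return iter(self._materialize())

    def __len__(self):
        return len(self._materialize())
```

Constructing the object costs nothing beyond storing the callable; the dict is built once, on the first lookup, iteration, or `len` call, and reused afterwards.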

eonr changed the title from "Add native batch encoding fast path" to "perf: speed up batch tokenization and decoding hot paths" on May 27, 2026