bench: add tail-latency benchmark for encode_ordinary_batch by alobroke · Pull Request #548 · openai/tiktoken

alobroke · 2026-05-20T21:53:42Z

Motivated by #530, which reported worst-of-10 tail spikes of 1.1×–7.6×
over the median on encode_ordinary_batch. The issue author offered to PR a
benchmark harness; this is that harness.

Problem

The existing scripts/benchmark.py measures throughput only (bytes/sec).
An aggregate throughput number hides tail latency: a single run that takes
7× longer than the median barely moves the average once it is spread across
many runs.
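
To make the averaging effect concrete, here is a small illustration using made-up timings (not measurements from this PR or from #530): 99 runs at 100 ms plus a single 700 ms spike. The payload size is likewise an assumption chosen only to give the throughput a unit.

```python
import statistics

# Hypothetical per-run wall-clock times in ms: 99 normal runs and one 7x spike.
run_ms = [100.0] * 99 + [700.0]
bytes_per_run = 1_000_000  # assumed payload size, for illustration only

# Aggregate throughput, the kind of number a throughput-only benchmark reports.
total_s = sum(run_ms) / 1000.0
throughput = bytes_per_run * len(run_ms) / total_s

# Throughput if every run had taken the median time (no spike at all).
median_ms = statistics.median(run_ms)
spike_free_throughput = bytes_per_run / (median_ms / 1000.0)

# The tail metric this harness reports instead.
worst_ms = max(run_ms)
ratio = worst_ms / median_ms

drop = 1 - throughput / spike_free_throughput
print(f"throughput drop caused by the spike: {drop:.1%}")  # 5.7%
print(f"worst/median: {ratio:.1f}x")                        # 7.0x
```

A 7× stall costs under 6% of aggregate throughput here, which is easy to mistake for noise, while the worst/median ratio exposes it directly.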

What this adds

scripts/benchmark_tail_latency.py — a self-contained tail-latency harness that:

- Measures median and worst-of-N wall-clock time per corpus
- Reports the worst/median ratio so tail spikes are immediately visible
- Tests four synthetic corpora generated at runtime (no data files needed):
  - english prose
  - python source
  - multilingual + emoji
  - random ascii
- Accepts CLI flags for --runs, --batch-size, --encoding, --threads
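
The measurement loop described above can be sketched roughly as follows. This is a minimal illustration, not the code in scripts/benchmark_tail_latency.py: the helper names `make_random_ascii` and `measure` are hypothetical, and a stand-in workload replaces the tokenizer call so the sketch runs without tiktoken installed.

```python
import random
import statistics
import time
from typing import Callable, Sequence


def make_random_ascii(n_docs: int, doc_len: int, seed: int = 0) -> list[str]:
    """Generate a synthetic 'random ascii' corpus at runtime (illustrative only)."""
    rng = random.Random(seed)  # seeded so repeated runs see the same corpus
    alphabet = [chr(c) for c in range(32, 127)]  # printable ASCII
    return ["".join(rng.choices(alphabet, k=doc_len)) for _ in range(n_docs)]


def measure(
    fn: Callable[[Sequence[str]], object],
    batch: Sequence[str],
    runs: int = 10,
) -> dict[str, float]:
    """Time fn(batch) `runs` times and return median, worst, and worst/median."""
    samples_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(batch)
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    median_ms = statistics.median(samples_ms)
    worst_ms = max(samples_ms)
    # Guard against a zero median on extremely fast workloads.
    ratio = worst_ms / median_ms if median_ms > 0 else float("inf")
    return {"median_ms": median_ms, "worst_ms": worst_ms, "ratio": ratio}


if __name__ == "__main__":
    batch = make_random_ascii(n_docs=64, doc_len=1000)
    # Stand-in workload; swap in the tokenizer call to benchmark it for real.
    stats = measure(lambda docs: [d.encode("utf-8") for d in docs], batch)
    print(
        f"median {stats['median_ms']:.3f} ms | "
        f"worst {stats['worst_ms']:.3f} ms | "
        f"worst/med {stats['ratio']:.1f}x"
    )
```

With tiktoken installed, the callable would presumably be something like `lambda docs: enc.encode_ordinary_batch(docs, num_threads=8)` with `enc = tiktoken.get_encoding("o200k_base")`. Using `time.perf_counter` keeps the timing monotonic and high-resolution, which matters when the interesting signal is a single outlier.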

Example output

encoding: o200k_base | batch_size: 64 | runs: 10
── num_threads=8 ──────────────────────────────────────
corpus               tokens/batch   median ms   worst ms   worst/med
english prose           2,560,000         240        980        4.1x
python source           4,480,000         580        950        1.6x
multilingual+emoji      5,120,000        1020       2100        2.1x
random ascii            7,680,000         680        780        1.1x

Usage

python scripts/benchmark_tail_latency.py
python scripts/benchmark_tail_latency.py --runs 10 --batch-size 256 --encoding o200k_base --threads 1,4,8
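
For readers wondering how a comma-separated --threads value might be handled, here is one possible argparse sketch. The flag names come from this PR; the defaults, the `parse_threads` helper, and the parser layout are assumptions for illustration and may not match the script.

```python
import argparse


def parse_threads(value: str) -> list[int]:
    """Parse a comma-separated list such as '1,4,8' into [1, 4, 8]."""
    counts = [int(part) for part in value.split(",") if part.strip()]
    if not counts or any(n < 1 for n in counts):
        raise argparse.ArgumentTypeError(f"invalid thread list: {value!r}")
    return counts


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Tail-latency benchmark (sketch)")
    # Defaults below are assumed for illustration, not taken from the script.
    parser.add_argument("--runs", type=int, default=10)
    parser.add_argument("--batch-size", type=int, default=64)
    parser.add_argument("--encoding", default="o200k_base")
    parser.add_argument("--threads", type=parse_threads, default=[8])
    return parser


# Parse the same flags as the second usage line above.
args = build_parser().parse_args(
    ["--runs", "10", "--batch-size", "256",
     "--encoding", "o200k_base", "--threads", "1,4,8"]
)
print(args.runs, args.batch_size, args.encoding, args.threads)
```

Each entry in `args.threads` would then drive one table section (one `num_threads=...` header) in the output.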

Fixes #530 (benchmark harness portion).

Adds scripts/benchmark_tail_latency.py to measure median and worst-of-N wall-clock time for encode_ordinary_batch across multiple synthetic corpora and thread counts.

Motivated by issue openai#530, which reported 1.1x-7.6x tail spikes on a 32-core box. The existing benchmark.py only measures throughput; this script surfaces the worst-of-N latency that throughput numbers hide.

Features:
- Four synthetic corpora (english prose, python source, multilingual+emoji, random ascii)
- Configurable runs, batch size, encoding, and thread counts via CLI flags
- Outputs a table with median ms, worst ms, and worst/median ratio
- No external data files required; corpora are generated at runtime

Usage:
python scripts/benchmark_tail_latency.py
python scripts/benchmark_tail_latency.py --runs 10 --batch-size 256 --encoding o200k_base --threads 1,4,8

alobroke added 2 commits May 21, 2026 03:21

Merge branch 'main' into bench/tail-latency-benchmark

9a77444

alobroke mentioned this pull request May 20, 2026

encode_ordinary_batch — reproducible multi-second tail stalls on 32-core box (o200k_base, num_threads=8) #530

Open

Merge branch 'main' into bench/tail-latency-benchmark

0de77b2

alobroke wants to merge 3 commits into openai:main from alobroke:bench/tail-latency-benchmark
