GitHub - aiptimizer/TurboOCR: Fast GPU OCR server. 270 img/s on FUNSD. TensorRT FP16, PP-OCRv5, HTTP + gRPC.

GPU-accelerated OCR server. 50x faster than PaddleOCR Python.
C++ / CUDA / TensorRT / PP-OCRv5 — Linux + NVIDIA GPU

Quick Start · API · Benchmarks · Monitoring · Configuration · Build · Roadmap · Website

Highlights

🚀 270 img/s on FUNSD A4 forms (c=16) — 1,200+ img/s on sparse images
⚡ 11 ms p50 latency, single request
🎯 F1 = 90.2% on FUNSD — higher accuracy than PaddleOCR Python with the same weights
🖨️ Prints & handwriting — PP-OCRv5 handles both out of the box
📄 PDF native — pages rendered and OCR'd in parallel
🔒 4 PDF modes — pure OCR, native text layer, auto-dispatch, detection-verified hybrid
🧩 Layout detection — PP-DocLayoutV3 with 25 region classes, per-request ?layout=1 toggle
📖 Reading order — class-aware XY-cut (header → body → footer/reference), row-tolerant table-cell sort, orphan-aware placement, opt-in via ?reading_order=1
🌐 HTTP + gRPC from a single binary, sharing the same GPU pipeline pool
🐳 One-line Docker deploy — docker run with auto TRT engine build on first start
📊 Prometheus metrics — request counters, latency histograms, VRAM usage on /metrics
🌐 Configurable languages (Latine e.g., English, French, German, Spanish, Portuguese; Chinese, Greek, Russian, Arabic, Korean, Thai)

RTX 5090, PP-OCRv5 mobile latin, TensorRT FP16, pool=5. Prints, handwriting, layout detection. This is the fast lane.

🗺️ Roadmap

🔍 Structured extraction
📝 Markdown output
📊 Table parsing

Quick Start

Requirements: Linux, NVIDIA driver 595+, Turing or newer GPU (RTX 20-series / GTX 16-series+).

docker run --gpus all -p 8000:8000 -p 50051:50051 \
  -v trt-cache:/home/ocr/.cache/turbo-ocr \
  ghcr.io/aiptimizer/turboocr:v2.3.0

First startup builds TensorRT engines from ONNX. This takes about 90 seconds on a 5090 GPU and up to an hour on older ones. Set TRT_OPT_LEVEL=3 to cut build time 3 to 5x with a small speed regression. The volume caches the engines, so subsequent starts are instant. During the build, requests will return a connection refused error from nginx until the backend is ready. nginx (port 8000) reverse-proxies to Drogon (port 8080), and both start automatically.

curl -X POST http://localhost:8000/ocr/raw \
  --data-binary @document.png -H "Content-Type: image/png"

{
  "results": [
    {"text": "Invoice Total", "confidence": 0.97, "bounding_box": [[42,10],[210,10],[210,38],[42,38]]}
  ]
}

API

HTTP on port 8000, gRPC on port 50051 — single binary, shared GPU pipeline pool.

Important: Use persistent connections (HTTP keep-alive). Sending many short-lived connections (e.g. one curl per request in a loop) can overwhelm the server and cause it to stall. All standard HTTP client libraries (requests.Session, aiohttp, Go http.Client, etc.) reuse connections by default.

Endpoints

Endpoint	Input	Description
`/health`	—	Returns `"ok"`
`/health/live`	—	Kubernetes liveness probe
`/health/ready`	—	Readiness probe — verifies GPU pipeline is responsive
`/ocr/raw`	Raw image bytes	Fastest path — PNG, JPEG, etc.
`/ocr`	`{"image": "<base64>"}`	For clients that can only send JSON
`/ocr/batch`	`{"images": ["<b64>", ...]}`	Multiple images in one request
`/ocr/pixels`	Raw BGR bytes + `X-Width` / `X-Height` / `X-Channels` headers	Zero-decode path — see /ocr/pixels
`/ocr/pdf`	Raw bytes, `{"pdf": "<b64>"}`, or `multipart/form-data`	All pages OCR'd in parallel
`/metrics`	—	Prometheus metrics (text exposition format)
gRPC	Raw bytes (protobuf)	Port 50051 — see `proto/ocr.proto`

Query Parameters

Parameter	Endpoints	Values	Default
`layout`	all	`0` / `1`	`0` — include layout regions (~20% throughput cost)
`reading_order`	image routes	`0` / `1`	`0` — emit `reading_order` array indexing `results` in proper reading order (auto-enables `layout=1`). Class-aware: header → body → footer/footnote/reference; XY-cut on body with row-tolerant table-cell sort and orphan placement
`as_blocks`	image + PDF routes	`0` / `1`	`0` — when `1`, response includes a `blocks` array: paragraph-level aggregate, one entry per non-empty layout cell, in reading order. Auto-enables `layout=1` and `reading_order=1`. Each block has `{id, layout_id, class, bounding_box, content, order_index}`. Mirrors PaddleX PP-StructureV3 `parsing_res_list` granularity.
`mode`	`/ocr/pdf`	`ocr` / `geometric` / `auto` / `auto_verified`	`ocr` — on the CPU binary, `auto_verified` is silently aliased to `auto` (no native text re-verifier on CPU). Inspect the per-page `mode` field in the response to see which path actually ran.
`dpi`	`/ocr/pdf`	`50`–`600`	`100` — render resolution

Parameter parsing rules. Parameter names are case-sensitive: ?layout=1 works, ?Layout=1 is silently ignored. Boolean values for layout, reading_order, and as_blocks accept any case of 1/0, true/false, on/off, yes/no, and reject anything else with 400 INVALID_PARAMETER. Values for mode= are case-sensitive and silently fall back to the configured default when unrecognized — ?mode=Auto, ?mode=AUTO, or ?mode=foobar all run as mode=ocr (or whatever ENABLE_PDF_MODE is set to) without error. Always pass exactly ocr, geometric, auto, or auto_verified.

Examples

# Image — raw bytes (fastest)
curl -X POST http://localhost:8000/ocr/raw \
  --data-binary @doc.png -H "Content-Type: image/png"

# Image — base64 JSON
curl -X POST http://localhost:8000/ocr \
  -H "Content-Type: application/json" \
  -d '{"image":"'$(base64 -w0 doc.png)'"}'

# PDF — raw bytes
curl -X POST http://localhost:8000/ocr/pdf \
  --data-binary @document.pdf

# PDF — multipart (works from any client, including browsers)
curl -X POST http://localhost:8000/ocr/pdf \
  -F "file=@document.pdf"

# PDF — with layout + auto mode
curl -X POST "http://localhost:8000/ocr/pdf?layout=1&mode=auto" \
  --data-binary @document.pdf

# gRPC (grpcurl uses base64 for CLI; real clients send raw bytes)
grpcurl -plaintext -d '{"image":"'$(base64 -w0 doc.png)'"}' \
  localhost:50051 ocr.OCRService/Recognize

`/ocr/pixels` (zero-decode path)

For clients that already hold a decoded image in memory (NumPy, OpenCV, custom pipelines), /ocr/pixels skips the PNG/JPEG decode step entirely. The body is sent as raw pixel bytes; dimensions travel in HTTP headers.

Header	Required	Values	Meaning
`X-Width`	yes	`1`–`MAX_IMAGE_DIM` (default `16384`)	Image width in pixels
`X-Height`	yes	`1`–`MAX_IMAGE_DIM` (default `16384`)	Image height in pixels
`X-Channels`	no	`1` or `3` (default `3`)	`3` = BGR (OpenCV order, not RGB), `1` = grayscale

Body: raw pixel bytes, length must equal width * height * channels exactly. A mismatch returns 400 BODY_SIZE_MISMATCH.
Query parameters: the same ?layout= and ?reading_order= as /ocr apply.
Errors: MISSING_HEADER (no X-Width / X-Height), INVALID_HEADER (unparseable values), INVALID_DIMENSIONS (non-positive size or channels other than 1/3), DIMENSIONS_TOO_LARGE (exceeds MAX_IMAGE_DIM).
Use case: the hot path when upstream code already has a decoded cv::Mat / np.ndarray and you don't want to round-trip through PNG.

# Python — send a decoded OpenCV image (BGR)
python -c "
import cv2, requests
img = cv2.imread('doc.png')        # BGR, HxWx3
h, w, c = img.shape
requests.post('http://localhost:8000/ocr/pixels',
              data=img.tobytes(),
              headers={'X-Width': str(w), 'X-Height': str(h), 'X-Channels': str(c)})
"

Response Format

Image endpoints return:

{"results": [{"text": "Invoice Total", "confidence": 0.97, "bounding_box": [[42,10],[210,10],[210,38],[42,38]]}]}

With ?layout=1, a layout array is added. Each OCR result gets a layout_id linking it to the containing layout region:

{
  "results": [{"text": "...", "confidence": 0.97, "id": 0, "layout_id": 2, "bounding_box": [...]}],
  "layout": [{"id": 0, "class": "header", "confidence": 0.91, "bounding_box": [...]},
             {"id": 2, "class": "table", "confidence": 0.95, "bounding_box": [...]}]
}

PDF endpoint wraps results per page:

{
  "pages": [{
    "page": 1, "page_index": 0, "dpi": 100, "width": 1047, "height": 1389,
    "mode": "ocr", "text_layer_quality": "absent", "results": [...]
  }]
}

Coordinate conversion: x_pdf = x_px * 72 / dpi.

Per-page fields:

mode — the resolved mode that actually ran on this page (ocr / geometric / auto_verified). For ?mode=auto requests, each page resolves to either geometric (text layer accepted) or ocr (fell back to OCR), never auto. On the CPU binary, ?mode=auto_verified resolves to auto semantics, so per-page mode will be geometric or ocr — auto_verified only appears on the GPU binary.
text_layer_quality — assessment of the page's native text layer:
- absent — no usable text layer (image-only PDF, fewer than 10 chars, or empty lines)
- rejected — text layer present but failed sanity checks (non-zero rotation, >5% replacement chars, >10% non-printable chars)
- trusted — native text passed sanity checks and was used (geometric / auto) or considered for cross-check (auto_verified)
- For mode=ocr this is always absent (the text-layer pre-pass is skipped entirely).

PDF Extraction Modes

Mode	What it does	Speed
`ocr`	Render + full OCR pipeline	Baseline
`geometric`	PDFium text layer only, no rasterization	~10x faster
`auto`	Per-page: text layer if available, else OCR	Fastest for mixed PDFs
`auto_verified`	Full pipeline + replace with native text where sanity check passes	Slightly slower than OCR

Caution

PDF text-layer trust model. Modes other than ocr read the PDF's native text layer, which the PDF author controls. A malicious PDF can embed invisible text, remap glyphs via ToUnicode, or inject arbitrary strings that differ from what's visually rendered.

When to use each mode:

Scenario	Recommended mode	Why
Untrusted uploads (user-submitted PDFs)	`ocr`	Only trusts pixel data — immune to text-layer manipulation
Internal/trusted documents	`auto` or `geometric`	Safe when you control the PDF source; much faster
High-accuracy with verification	`auto_verified`	OCR runs first, then results are cross-checked against the text layer. Accepts native text only if it passes heuristic validation (character count, non-printable ratio < 10%, replacement char ratio < 5%, no rotation)

Default: mode=ocr (safest). Override per-request via ?mode= query parameter or globally via ENABLE_PDF_MODE env var.

Deployment recommendation: If your service accepts PDFs from untrusted sources, do not set ENABLE_PDF_MODE to geometric or auto globally. Keep the default ocr and only use text-layer modes for trusted internal workflows.

Layout Detection

All endpoints accept ?layout=1 to detect document regions using PP-DocLayoutV3 (25 classes):

abstract · algorithm · aside_text · chart · content · display_formula · doc_title · figure_title · footer · footer_image · footnote · formula_number · header · header_image · image · inline_formula · number · paragraph_title · reference · reference_content · seal · table · text · vertical_text · vision_footnote

Layout classes (reading-order buckets)

When ?reading_order=1 is set, classes are partitioned into three strata before XY-cut runs, so common page furniture lands in the right slot regardless of where the layout model placed it spatially: TOP is read first, then BODY (sorted by XY-cut), then BOTTOM.

Class ID	Name	Bucket
0	`abstract`	BODY
1	`algorithm`	BODY
2	`aside_text`	BODY
3	`chart`	BODY
4	`content`	BODY
5	`display_formula`	BODY
6	`doc_title`	BODY
7	`figure_title`	BODY
8	`footer`	BOTTOM
9	`footer_image`	BOTTOM
10	`footnote`	BOTTOM
11	`formula_number`	BODY
12	`header`	TOP
13	`header_image`	TOP
14	`image`	BODY
15	`inline_formula`	BODY
16	`number`	BODY
17	`paragraph_title`	BODY
18	`reference`	BOTTOM
19	`reference_content`	BOTTOM
20	`seal`	BODY
21	`table`	BODY
22	`text`	BODY
23	`vertical_text`	BODY
24	`vision_footnote`	BOTTOM

Class 16 (number, page numbers) deliberately stays in BODY because page numbers can appear at the top or the bottom of a page — XY-cut places them by geometry. Class IDs are pinned with static_assert against the PaddleX label list, so a future re-shuffle would fail the build rather than silently misroute classes.

_{Layout detection overlay — color-coded regions: paragraph_title, text, chart, figure_title, header, footer, number}

Benchmarks

FUNSD form-understanding dataset (50 pages, ~170 words/page). Same word-level F1 metric for all engines. Single RTX 5090.

Benchmark caveats

Crude accuracy metric. Bag-of-words F1 ignores order and duplicate counts. CER or reading-order metrics would likely help VLM systems.
VLMs could run faster. Served via off-the-shelf vLLM in fp16. Quantization, speculative decoding, or a dedicated stack would push throughput higher.
VLM prompts are untuned. With prompt engineering both VLMs would likely surpass every CTC engine here.
Single domain. FUNSD is English business forms; other document types would look different.

Reproduce: python tests/benchmark/comparison/bench_turbo_ocr.py (requires running server + datasets library).

Configuration

Variable	Default	Description
`OCR_LANG`	(unset = latin)	Language bundle: `latin`, `chinese`, `greek`, `eslav`, `arabic`, `korean`, `thai`. All bundles are baked into the image at build time — no runtime download.
`OCR_SERVER`	(unset)	With `OCR_LANG=chinese`, set to `1` to use the 84 MB PP-OCRv5 server rec instead of the 16 MB mobile rec. Ignored for other languages.
`PIPELINE_POOL_SIZE`	auto	Concurrent GPU pipelines (~1.4 GB each)
`DISABLE_LAYOUT`	`0`	Set to `1` to disable PP-DocLayoutV3 layout detection and save ~300-500 MB VRAM
`ENABLE_PDF_MODE`	`ocr`	Default PDF mode: `ocr` / `geometric` / `auto` / `auto_verified`
`DISABLE_ANGLE_CLS`	`0`	Skip angle classifier (~0.4 ms savings)
`DET_MAX_SIDE`	`960`	Max detection input side (px). Bounds: 32–4096. The TRT engine profile is built to match this value; changing it invalidates the cached engine and triggers a one-time rebuild.
`TRT_OPT_LEVEL`	`5`	TensorRT builder optimization level. Bounds: 0–5. Lower values trade runtime perf for faster cold builds (`3` typically builds ~3-5× faster with <5% runtime regression). The cache key includes the level, so different values produce separate engines.
`TRT_ENGINE_CACHE`	`~/.cache/turbo-ocr`	Directory for cached TensorRT engines. Set to a host-mounted path to share engines across container restarts.
`TURBO_OCR_HOST`	`0.0.0.0`	Bind address for HTTP and gRPC listeners. Default binds every IPv4 interface; use `127.0.0.1` for loopback only, `::` for all interfaces incl. IPv6, or a specific interface IP. Equivalent CLI flag: `--host`.
`PORT` / `GRPC_PORT`	`8080` / `50051`	Server ports. The binary listens on `PORT=8080` by default; the Docker image runs nginx in front of it on port `8000`, so external clients use `8000` and `PORT` only matters for direct/native runs.
`PDF_DAEMONS` / `PDF_WORKERS`	`16` / `4`	PDF render parallelism
`GRPC_BATCH_WORKERS`	`8`	Parallel workers in gRPC `RecognizeBatch` for fan-out across pipeline pool
`HTTP_THREADS`	`pool * 32`	Work pool threads for blocking inference
`MAX_PDF_PAGES`	`2000`	Maximum pages per PDF request
`SHUTDOWN_GRACE_SECONDS`	`30`	Seconds to wait for inflight requests to drain on SIGTERM/SIGINT before tearing down. Set to stay below your orchestrator's SIGKILL grace (K8s default 30s).
`GRPC_CQS`	`10`	Number of gRPC completion queues. Higher values trade memory for connection-handling parallelism on high-fanout deployments.
`GRPC_RESPONSE_MODE`	`json_bytes`	gRPC response format: `json_bytes` (default — full JSON in `json_response` field) or `structured` (typed protobuf fields).
`MAX_BODY_MB`	`100`	Max request body size in MB. Applied at all three layers: nginx (413 at proxy), Drogon HTTP (`setClientMaxBodySize`), and gRPC (`SetMaxReceive/SendMessageSize`). Bounds: 1–102400.
`MAX_BODY_MEMORY_MB`	`min(1024, MAX_BODY_MB)` — effectively `100` with stock config	Per-request in-memory buffer threshold. Bodies up to this size stay in RAM; larger ones spill to a tempfile under `/tmp`. Always clamped to `[1, MAX_BODY_MB]`, so the effective default tracks `MAX_BODY_MB`. Raise `MAX_BODY_MB` first to unlock larger in-memory buffers. Lower on memory-constrained hosts (e.g. `MAX_BODY_MEMORY_MB=50` caps buffer RSS at ~50 MB × concurrent requests).
`MAX_IMAGE_DIM`	`16384`	Max width or height (px) accepted on `/ocr/pixels` and image-decode routes. Bounds: 64–65535.
`LOG_LEVEL`	`info`	Log level: `debug` / `info` / `warn` / `error`
`LOG_FORMAT`	`json`	Log format: `json` (structured) / `text` (human-readable)
`TOCR_LOG_RATELIMIT`	`10`	Max rate-limited logs per call site per 1s window (applies to per-request error paths). `0` disables. Format `N` or `N:W_MS` (e.g. `5:2000` = 5 logs / 2s). On window roll a single `[suppressed logs]` rollup line is emitted.

Every knob above is also exposed as a CLI flag (--http-port, --max-body-mb, --disable-layout, --det-max-side, --log-level, etc.). The two exceptions, which remain env-only because their valid set is context-dependent, are OCR_LANG (validated against installed model bundles at first request) and TOCR_LOG_RATELIMIT (custom N or N:W_MS format). CLI flags override env vars when both are set. Useful flags for inspection:

paddle_highspeed_cpp --help            # full flag listing
paddle_highspeed_cpp --print-config    # resolved JSON config; exit 0
paddle_highspeed_cpp --check-config    # validate only; exit 0 on valid, 2 on errors

Malformed env vars or out-of-range values cause startup to fail with a clear error list — the server refuses to bind rather than silently coerce bad input (e.g. PORT=abc used to become 1; it now exits with [config error] PORT="abc" is not a valid integer). Validate config without booting the pipeline using --check-config.

Layout detection is enabled by default. The model is loaded at startup but only runs when a request includes ?layout=1. Requests without ?layout=1 have zero overhead. Requests with ?layout=1 reduce throughput by ~20%. Set DISABLE_LAYOUT=1 to skip loading the model entirely and save ~300-500 MB VRAM.

Migration note (v2.3+): The legacy ENABLE_LAYOUT env var has been removed. If set, startup fails with a clear error — use DISABLE_LAYOUT=1 to disable layout, or remove the var (layout is on by default).

docker run --gpus all -p 8000:8000 \
  -v trt-cache:/home/ocr/.cache/turbo-ocr \
  -e PIPELINE_POOL_SIZE=3 \
  ghcr.io/aiptimizer/turboocr:v2.3.0

Add MAX_PDF_PAGES (default 2000) to limit the number of pages processed per PDF request. LOG_LEVEL (debug/info/warn/error) and LOG_FORMAT (json/text) control structured logging output.

Monitoring

Prometheus Metrics

Scrape GET /metrics for Prometheus-compatible metrics:

turbo_ocr_requests_total{route="/ocr/raw",status="2xx"} 1042
turbo_ocr_request_duration_seconds_bucket{route="/ocr/raw",le="0.025"} 980
turbo_ocr_request_duration_seconds_sum{route="/ocr/raw"} 12.345
turbo_ocr_request_duration_seconds_count{route="/ocr/raw"} 1042
turbo_ocr_gpu_vram_used_bytes 9052815360
turbo_ocr_gpu_vram_total_bytes 33661911040
turbo_ocr_pipeline_pool_size 5
turbo_ocr_pool_exhaustions_total 0
turbo_ocr_request_bytes_total 49493243
turbo_ocr_request_body_avg_bytes 9407

Response Headers

Every response includes:

Header	Description
`X-Request-Id`	UUID v7 (or propagated from client `X-Request-Id` header)
`X-Inference-Time-Ms`	End-to-end processing time in milliseconds
`Retry-After`	Seconds to wait (only on 503 responses)

Health Endpoints

Endpoint	Description
`GET /health`	Basic liveness check
`GET /health/live`	Kubernetes liveness probe
`GET /health/ready`	Readiness probe — verifies GPU pipeline is responsive

Structured Errors

All error responses return JSON with Content-Type: application/json:

{"error": {"code": "EMPTY_BODY", "message": "Empty body"}}

Error codes: EMPTY_BODY, INVALID_JSON, MISSING_IMAGE, BASE64_DECODE_FAILED, IMAGE_DECODE_FAILED, INVALID_PARAMETER, UNSUPPORTED_PARAMETER, INVALID_DPI, INVALID_DIMENSIONS, DIMENSIONS_TOO_LARGE, BODY_SIZE_MISMATCH, MISSING_HEADER, INVALID_HEADER, EMPTY_BATCH, MISSING_FILE, MISSING_PDF, INVALID_MULTIPART, PDF_RENDER_FAILED, PDF_TOO_LARGE, EMPTY_PDF, SERVER_BUSY, NOT_READY, INFERENCE_ERROR.

Building from Source

Dependency	GPU	CPU
GCC 13.3+ / C++20	x	x
CUDA + TensorRT 10.2+	x
OpenCV 4.x	x	x
Drogon 1.9+	x	x
gRPC + Protobuf	x
ONNX Runtime 1.22+		x

Wuffs, Clipper, PDFium vendored in third_party/.

# Docker (recommended)
docker build -f docker/Dockerfile.gpu -t turboocr .
docker run --gpus all -p 8000:8000 -p 50051:50051 \
  -v trt-cache:/home/ocr/.cache/turbo-ocr turboocr

# CPU only (Docker) — ~2-3 img/s, mainly for testing
docker build -f docker/Dockerfile.cpu -t turboocr-cpu .
docker run -p 8000:8000 turboocr-cpu

# Native build — PP-OCRv5 models auto-fetched into ./models/ on first build
cmake -B build -DTENSORRT_DIR=/usr/local/tensorrt
cmake --build build -j$(nproc)
LD_LIBRARY_PATH=/usr/local/tensorrt/lib ./build/paddle_highspeed_cpp

# CPU-only native
cmake -B build_cpu -DUSE_CPU_ONLY=ON
cmake --build build_cpu -j$(nproc)
./build_cpu/paddle_cpu_server

# If your distro's gRPC CMake config conflicts with system protobuf,
# add -DCMAKE_DISABLE_FIND_PACKAGE_gRPC=ON to fall back to pkg-config.
# To skip the model auto-fetch (e.g. in CI), add -DFETCH_MODELS=OFF.

# CUDA SM target. Native builds default to sm_120 (Blackwell, RTX 50-series)
# only — the full multi-arch fat binary is ~12.5 GB and adds 10-15 s of
# PTX-JIT to first-start on cold cache. To target other GPUs, opt back in:
#   cmake -B build -DCMAKE_CUDA_ARCHITECTURES="86;89;120" ...
# Reference: 75=Turing, 80=A100, 86=Ampere consumer, 89=Ada, 90=Hopper,
# 100=Blackwell DC, 120=Blackwell consumer.

Supported Languages

Set via the OCR_LANG environment variable. Every supported language bundle is baked into the image at build time from the pinned PP-OCRv5 GitHub Release (SHA256-verified). No runtime downloads, no network dependency at container start.

`OCR_LANG`	Script / family	Notes
(unset) / `latin`	Latin + basic Greek (English, German, French, Italian, Polish, Czech, …)	836-char dict; what powers the benchmarks above
`chinese`	Simplified + Traditional Chinese	18,385-class mobile rec (16 MB); set `OCR_SERVER=1` for the 84 MB server variant
`greek`	dedicated Greek rec	356-class Greek-specialized rec (7.8 MB) — higher accuracy than Latin's combined dict
`korean`	Hangul + basic Latin	11,947-class rec (13 MB)
`arabic`, `eslav`, `thai`	per-script PP-OCRv5	7-8 MB each

# Chinese
docker run --gpus all -p 8000:8000 -p 50051:50051 \
  -v trt-cache:/home/ocr/.cache/turbo-ocr \
  -e OCR_LANG=chinese \
  ghcr.io/aiptimizer/turboocr:v2.3.0

Volume tip: use a named volume (trt-cache:) as shown above, not a host bind-mount. Named volumes auto-populate from the image on first use, so the baked language bundles survive. A bind-mount of an empty host directory would shadow /home/ocr/.cache/turbo-ocr and leave the server with nothing to load.

Run tests/language_smoketest.py to verify any language end-to-end on your hardware (renders a short phrase, OCRs it, checks char-recall against a per-language threshold).

Acknowledgements

This project builds on the work of several open-source projects:

PaddleOCR (Baidu) — PP-OCRv5 detection, recognition, and classification models. PP-DocLayoutV3 layout detection model. This project would not exist without their research and pre-trained weights.
Drogon — high-performance async C++ HTTP framework
Wuffs — fast PNG decoder by Google (vendored)
PDFium — PDF rendering and text extraction (vendored)
Clipper — polygon clipping for text detection post-processing (vendored)

License

MIT. See LICENSE.

_{Main Sponsor: Miruiq — AI-powered data extraction from PDFs and documents.}

Name		Name	Last commit message	Last commit date
Latest commit History 99 Commits
bin		bin
docker		docker
include/turbo_ocr		include/turbo_ocr
proto		proto
scripts		scripts
src		src
tests		tests
third_party		third_party
tools		tools
.clang-format		.clang-format
.clang-tidy		.clang-tidy
.dockerignore		.dockerignore
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
CMakeLists.txt		CMakeLists.txt
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Highlights

🗺️ Roadmap

Quick Start

API

Endpoints

Query Parameters

Examples

`/ocr/pixels` (zero-decode path)

Response Format

PDF Extraction Modes

Layout Detection

Layout classes (reading-order buckets)

Benchmarks

Configuration

Monitoring

Prometheus Metrics

Response Headers

Health Endpoints

Structured Errors

Building from Source

Supported Languages

Acknowledgements

License

About

Uh oh!

Releases 8

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Highlights

🗺️ Roadmap

Quick Start

API

Endpoints

Query Parameters

Examples

/ocr/pixels (zero-decode path)

Response Format

PDF Extraction Modes

Layout Detection

Layout classes (reading-order buckets)

Benchmarks

Configuration

Monitoring

Prometheus Metrics

Response Headers

Health Endpoints

Structured Errors

Building from Source

Supported Languages

Acknowledgements

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 8

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

`/ocr/pixels` (zero-decode path)

Packages