GPU-accelerated OCR server. 50x faster than PaddleOCR Python.
C++ / CUDA / TensorRT / PP-OCRv5 — Linux + NVIDIA GPU
Quick Start · API · Benchmarks · Monitoring · Configuration · Build · Roadmap · Website
- 🚀 270 img/s on FUNSD A4 forms (c=16) — 1,200+ img/s on sparse images
- ⚡ 11 ms p50 latency, single request
- 🎯 F1 = 90.2% on FUNSD — higher accuracy than PaddleOCR Python with the same weights
- 🖨️ Prints & handwriting — PP-OCRv5 handles both out of the box
- 📄 PDF native — pages rendered and OCR'd in parallel
- 🔒 4 PDF modes — pure OCR, native text layer, auto-dispatch, detection-verified hybrid
- 🧩 Layout detection — PP-DocLayoutV3 with 25 region classes, per-request
?layout=1toggle - 📖 Reading order — class-aware XY-cut (header → body → footer/reference), row-tolerant table-cell sort, orphan-aware placement, opt-in via
?reading_order=1 - 🌐 HTTP + gRPC from a single binary, sharing the same GPU pipeline pool
- 🐳 One-line Docker deploy —
docker runwith auto TRT engine build on first start - 📊 Prometheus metrics — request counters, latency histograms, VRAM usage on
/metrics - 🌐 Configurable languages (Latine e.g., English, French, German, Spanish, Portuguese; Chinese, Greek, Russian, Arabic, Korean, Thai)
RTX 5090, PP-OCRv5 mobile latin, TensorRT FP16, pool=5. Prints, handwriting, layout detection. This is the fast lane.
- 🔍 Structured extraction
- 📝 Markdown output
- 📊 Table parsing
Requirements: Linux, NVIDIA driver 595+, Turing or newer GPU (RTX 20-series / GTX 16-series+).
docker run --gpus all -p 8000:8000 -p 50051:50051 \
-v trt-cache:/home/ocr/.cache/turbo-ocr \
ghcr.io/aiptimizer/turboocr:v2.3.0First startup builds TensorRT engines from ONNX. This takes about 90 seconds on a 5090 GPU and up to an hour on older ones. Set TRT_OPT_LEVEL=3 to cut build time 3 to 5x with a small speed regression. The volume caches the engines, so subsequent starts are instant. During the build, requests will return a connection refused error from nginx until the backend is ready. nginx (port 8000) reverse-proxies to Drogon (port 8080), and both start automatically.
curl -X POST http://localhost:8000/ocr/raw \
--data-binary @document.png -H "Content-Type: image/png"{
"results": [
{"text": "Invoice Total", "confidence": 0.97, "bounding_box": [[42,10],[210,10],[210,38],[42,38]]}
]
}HTTP on port 8000, gRPC on port 50051 — single binary, shared GPU pipeline pool.
Important: Use persistent connections (HTTP keep-alive). Sending many short-lived connections (e.g. one
curlper request in a loop) can overwhelm the server and cause it to stall. All standard HTTP client libraries (requests.Session,aiohttp, Gohttp.Client, etc.) reuse connections by default.
| Endpoint | Input | Description |
|---|---|---|
/health |
— | Returns "ok" |
/health/live |
— | Kubernetes liveness probe |
/health/ready |
— | Readiness probe — verifies GPU pipeline is responsive |
/ocr/raw |
Raw image bytes | Fastest path — PNG, JPEG, etc. |
/ocr |
{"image": "<base64>"} |
For clients that can only send JSON |
/ocr/batch |
{"images": ["<b64>", ...]} |
Multiple images in one request |
/ocr/pixels |
Raw BGR bytes + X-Width / X-Height / X-Channels headers |
Zero-decode path — see /ocr/pixels |
/ocr/pdf |
Raw bytes, {"pdf": "<b64>"}, or multipart/form-data |
All pages OCR'd in parallel |
/metrics |
— | Prometheus metrics (text exposition format) |
| gRPC | Raw bytes (protobuf) | Port 50051 — see proto/ocr.proto |
| Parameter | Endpoints | Values | Default |
|---|---|---|---|
layout |
all | 0 / 1 |
0 — include layout regions (~20% throughput cost) |
reading_order |
image routes | 0 / 1 |
0 — emit reading_order array indexing results in proper reading order (auto-enables layout=1). Class-aware: header → body → footer/footnote/reference; XY-cut on body with row-tolerant table-cell sort and orphan placement |
as_blocks |
image + PDF routes | 0 / 1 |
0 — when 1, response includes a blocks array: paragraph-level aggregate, one entry per non-empty layout cell, in reading order. Auto-enables layout=1 and reading_order=1. Each block has {id, layout_id, class, bounding_box, content, order_index}. Mirrors PaddleX PP-StructureV3 parsing_res_list granularity. |
mode |
/ocr/pdf |
ocr / geometric / auto / auto_verified |
ocr — on the CPU binary, auto_verified is silently aliased to auto (no native text re-verifier on CPU). Inspect the per-page mode field in the response to see which path actually ran. |
dpi |
/ocr/pdf |
50–600 |
100 — render resolution |
Parameter parsing rules. Parameter names are case-sensitive: ?layout=1 works, ?Layout=1 is silently ignored. Boolean values for layout, reading_order, and as_blocks accept any case of 1/0, true/false, on/off, yes/no, and reject anything else with 400 INVALID_PARAMETER. Values for mode= are case-sensitive and silently fall back to the configured default when unrecognized — ?mode=Auto, ?mode=AUTO, or ?mode=foobar all run as mode=ocr (or whatever ENABLE_PDF_MODE is set to) without error. Always pass exactly ocr, geometric, auto, or auto_verified.
# Image — raw bytes (fastest)
curl -X POST http://localhost:8000/ocr/raw \
--data-binary @doc.png -H "Content-Type: image/png"
# Image — base64 JSON
curl -X POST http://localhost:8000/ocr \
-H "Content-Type: application/json" \
-d '{"image":"'$(base64 -w0 doc.png)'"}'
# PDF — raw bytes
curl -X POST http://localhost:8000/ocr/pdf \
--data-binary @document.pdf
# PDF — multipart (works from any client, including browsers)
curl -X POST http://localhost:8000/ocr/pdf \
-F "file=@document.pdf"
# PDF — with layout + auto mode
curl -X POST "http://localhost:8000/ocr/pdf?layout=1&mode=auto" \
--data-binary @document.pdf
# gRPC (grpcurl uses base64 for CLI; real clients send raw bytes)
grpcurl -plaintext -d '{"image":"'$(base64 -w0 doc.png)'"}' \
localhost:50051 ocr.OCRService/RecognizeFor clients that already hold a decoded image in memory (NumPy, OpenCV, custom pipelines), /ocr/pixels skips the PNG/JPEG decode step entirely. The body is sent as raw pixel bytes; dimensions travel in HTTP headers.
| Header | Required | Values | Meaning |
|---|---|---|---|
X-Width |
yes | 1–MAX_IMAGE_DIM (default 16384) |
Image width in pixels |
X-Height |
yes | 1–MAX_IMAGE_DIM (default 16384) |
Image height in pixels |
X-Channels |
no | 1 or 3 (default 3) |
3 = BGR (OpenCV order, not RGB), 1 = grayscale |
- Body: raw pixel bytes, length must equal
width * height * channelsexactly. A mismatch returns400 BODY_SIZE_MISMATCH. - Query parameters: the same
?layout=and?reading_order=as/ocrapply. - Errors:
MISSING_HEADER(noX-Width/X-Height),INVALID_HEADER(unparseable values),INVALID_DIMENSIONS(non-positive size or channels other than 1/3),DIMENSIONS_TOO_LARGE(exceedsMAX_IMAGE_DIM). - Use case: the hot path when upstream code already has a decoded
cv::Mat/np.ndarrayand you don't want to round-trip through PNG.
# Python — send a decoded OpenCV image (BGR)
python -c "
import cv2, requests
img = cv2.imread('doc.png') # BGR, HxWx3
h, w, c = img.shape
requests.post('http://localhost:8000/ocr/pixels',
data=img.tobytes(),
headers={'X-Width': str(w), 'X-Height': str(h), 'X-Channels': str(c)})
"Image endpoints return:
{"results": [{"text": "Invoice Total", "confidence": 0.97, "bounding_box": [[42,10],[210,10],[210,38],[42,38]]}]}With ?layout=1, a layout array is added. Each OCR result gets a layout_id linking it to the containing layout region:
{
"results": [{"text": "...", "confidence": 0.97, "id": 0, "layout_id": 2, "bounding_box": [...]}],
"layout": [{"id": 0, "class": "header", "confidence": 0.91, "bounding_box": [...]},
{"id": 2, "class": "table", "confidence": 0.95, "bounding_box": [...]}]
}PDF endpoint wraps results per page:
{
"pages": [{
"page": 1, "page_index": 0, "dpi": 100, "width": 1047, "height": 1389,
"mode": "ocr", "text_layer_quality": "absent", "results": [...]
}]
}Coordinate conversion: x_pdf = x_px * 72 / dpi.
Per-page fields:
mode— the resolved mode that actually ran on this page (ocr/geometric/auto_verified). For?mode=autorequests, each page resolves to eithergeometric(text layer accepted) orocr(fell back to OCR), neverauto. On the CPU binary,?mode=auto_verifiedresolves toautosemantics, so per-pagemodewill begeometricorocr—auto_verifiedonly appears on the GPU binary.text_layer_quality— assessment of the page's native text layer:absent— no usable text layer (image-only PDF, fewer than 10 chars, or empty lines)rejected— text layer present but failed sanity checks (non-zero rotation, >5% replacement chars, >10% non-printable chars)trusted— native text passed sanity checks and was used (geometric/auto) or considered for cross-check (auto_verified)- For
mode=ocrthis is alwaysabsent(the text-layer pre-pass is skipped entirely).
| Mode | What it does | Speed |
|---|---|---|
ocr |
Render + full OCR pipeline | Baseline |
geometric |
PDFium text layer only, no rasterization | ~10x faster |
auto |
Per-page: text layer if available, else OCR | Fastest for mixed PDFs |
auto_verified |
Full pipeline + replace with native text where sanity check passes | Slightly slower than OCR |
Caution
PDF text-layer trust model. Modes other than ocr read the PDF's native text layer, which the PDF author controls. A malicious PDF can embed invisible text, remap glyphs via ToUnicode, or inject arbitrary strings that differ from what's visually rendered.
When to use each mode:
| Scenario | Recommended mode | Why |
|---|---|---|
| Untrusted uploads (user-submitted PDFs) | ocr |
Only trusts pixel data — immune to text-layer manipulation |
| Internal/trusted documents | auto or geometric |
Safe when you control the PDF source; much faster |
| High-accuracy with verification | auto_verified |
OCR runs first, then results are cross-checked against the text layer. Accepts native text only if it passes heuristic validation (character count, non-printable ratio < 10%, replacement char ratio < 5%, no rotation) |
Default: mode=ocr (safest). Override per-request via ?mode= query parameter or globally via ENABLE_PDF_MODE env var.
Deployment recommendation: If your service accepts PDFs from untrusted sources, do not set ENABLE_PDF_MODE to geometric or auto globally. Keep the default ocr and only use text-layer modes for trusted internal workflows.
All endpoints accept ?layout=1 to detect document regions using PP-DocLayoutV3 (25 classes):
abstract · algorithm · aside_text · chart · content · display_formula · doc_title · figure_title · footer · footer_image · footnote · formula_number · header · header_image · image · inline_formula · number · paragraph_title · reference · reference_content · seal · table · text · vertical_text · vision_footnote
When ?reading_order=1 is set, classes are partitioned into three strata before XY-cut runs, so common page furniture lands in the right slot regardless of where the layout model placed it spatially: TOP is read first, then BODY (sorted by XY-cut), then BOTTOM.
| Class ID | Name | Bucket |
|---|---|---|
| 0 | abstract |
BODY |
| 1 | algorithm |
BODY |
| 2 | aside_text |
BODY |
| 3 | chart |
BODY |
| 4 | content |
BODY |
| 5 | display_formula |
BODY |
| 6 | doc_title |
BODY |
| 7 | figure_title |
BODY |
| 8 | footer |
BOTTOM |
| 9 | footer_image |
BOTTOM |
| 10 | footnote |
BOTTOM |
| 11 | formula_number |
BODY |
| 12 | header |
TOP |
| 13 | header_image |
TOP |
| 14 | image |
BODY |
| 15 | inline_formula |
BODY |
| 16 | number |
BODY |
| 17 | paragraph_title |
BODY |
| 18 | reference |
BOTTOM |
| 19 | reference_content |
BOTTOM |
| 20 | seal |
BODY |
| 21 | table |
BODY |
| 22 | text |
BODY |
| 23 | vertical_text |
BODY |
| 24 | vision_footnote |
BOTTOM |
Class 16 (number, page numbers) deliberately stays in BODY because page numbers can appear at the top or the bottom of a page — XY-cut places them by geometry. Class IDs are pinned with static_assert against the PaddleX label list, so a future re-shuffle would fail the build rather than silently misroute classes.
Layout detection overlay — color-coded regions: paragraph_title, text, chart, figure_title, header, footer, number
FUNSD form-understanding dataset (50 pages, ~170 words/page). Same word-level F1 metric for all engines. Single RTX 5090.
Benchmark caveats
- Crude accuracy metric. Bag-of-words F1 ignores order and duplicate counts. CER or reading-order metrics would likely help VLM systems.
- VLMs could run faster. Served via off-the-shelf vLLM in fp16. Quantization, speculative decoding, or a dedicated stack would push throughput higher.
- VLM prompts are untuned. With prompt engineering both VLMs would likely surpass every CTC engine here.
- Single domain. FUNSD is English business forms; other document types would look different.
Reproduce: python tests/benchmark/comparison/bench_turbo_ocr.py (requires running server + datasets library).
| Variable | Default | Description |
|---|---|---|
OCR_LANG |
(unset = latin) | Language bundle: latin, chinese, greek, eslav, arabic, korean, thai. All bundles are baked into the image at build time — no runtime download. |
OCR_SERVER |
(unset) | With OCR_LANG=chinese, set to 1 to use the 84 MB PP-OCRv5 server rec instead of the 16 MB mobile rec. Ignored for other languages. |
PIPELINE_POOL_SIZE |
auto | Concurrent GPU pipelines (~1.4 GB each) |
DISABLE_LAYOUT |
0 |
Set to 1 to disable PP-DocLayoutV3 layout detection and save ~300-500 MB VRAM |
ENABLE_PDF_MODE |
ocr |
Default PDF mode: ocr / geometric / auto / auto_verified |
DISABLE_ANGLE_CLS |
0 |
Skip angle classifier (~0.4 ms savings) |
DET_MAX_SIDE |
960 |
Max detection input side (px). Bounds: 32–4096. The TRT engine profile is built to match this value; changing it invalidates the cached engine and triggers a one-time rebuild. |
TRT_OPT_LEVEL |
5 |
TensorRT builder optimization level. Bounds: 0–5. Lower values trade runtime perf for faster cold builds (3 typically builds ~3-5× faster with <5% runtime regression). The cache key includes the level, so different values produce separate engines. |
TRT_ENGINE_CACHE |
~/.cache/turbo-ocr |
Directory for cached TensorRT engines. Set to a host-mounted path to share engines across container restarts. |
TURBO_OCR_HOST |
0.0.0.0 |
Bind address for HTTP and gRPC listeners. Default binds every IPv4 interface; use 127.0.0.1 for loopback only, :: for all interfaces incl. IPv6, or a specific interface IP. Equivalent CLI flag: --host. |
PORT / GRPC_PORT |
8080 / 50051 |
Server ports. The binary listens on PORT=8080 by default; the Docker image runs nginx in front of it on port 8000, so external clients use 8000 and PORT only matters for direct/native runs. |
PDF_DAEMONS / PDF_WORKERS |
16 / 4 |
PDF render parallelism |
GRPC_BATCH_WORKERS |
8 |
Parallel workers in gRPC RecognizeBatch for fan-out across pipeline pool |
HTTP_THREADS |
pool * 32 |
Work pool threads for blocking inference |
MAX_PDF_PAGES |
2000 |
Maximum pages per PDF request |
SHUTDOWN_GRACE_SECONDS |
30 |
Seconds to wait for inflight requests to drain on SIGTERM/SIGINT before tearing down. Set to stay below your orchestrator's SIGKILL grace (K8s default 30s). |
GRPC_CQS |
10 |
Number of gRPC completion queues. Higher values trade memory for connection-handling parallelism on high-fanout deployments. |
GRPC_RESPONSE_MODE |
json_bytes |
gRPC response format: json_bytes (default — full JSON in json_response field) or structured (typed protobuf fields). |
MAX_BODY_MB |
100 |
Max request body size in MB. Applied at all three layers: nginx (413 at proxy), Drogon HTTP (setClientMaxBodySize), and gRPC (SetMaxReceive/SendMessageSize). Bounds: 1–102400. |
MAX_BODY_MEMORY_MB |
min(1024, MAX_BODY_MB) — effectively 100 with stock config |
Per-request in-memory buffer threshold. Bodies up to this size stay in RAM; larger ones spill to a tempfile under /tmp. Always clamped to [1, MAX_BODY_MB], so the effective default tracks MAX_BODY_MB. Raise MAX_BODY_MB first to unlock larger in-memory buffers. Lower on memory-constrained hosts (e.g. MAX_BODY_MEMORY_MB=50 caps buffer RSS at ~50 MB × concurrent requests). |
MAX_IMAGE_DIM |
16384 |
Max width or height (px) accepted on /ocr/pixels and image-decode routes. Bounds: 64–65535. |
LOG_LEVEL |
info |
Log level: debug / info / warn / error |
LOG_FORMAT |
json |
Log format: json (structured) / text (human-readable) |
TOCR_LOG_RATELIMIT |
10 |
Max rate-limited logs per call site per 1s window (applies to per-request error paths). 0 disables. Format N or N:W_MS (e.g. 5:2000 = 5 logs / 2s). On window roll a single [suppressed logs] rollup line is emitted. |
Every knob above is also exposed as a CLI flag (--http-port, --max-body-mb, --disable-layout, --det-max-side, --log-level, etc.). The two exceptions, which remain env-only because their valid set is context-dependent, are OCR_LANG (validated against installed model bundles at first request) and TOCR_LOG_RATELIMIT (custom N or N:W_MS format). CLI flags override env vars when both are set. Useful flags for inspection:
paddle_highspeed_cpp --help # full flag listing
paddle_highspeed_cpp --print-config # resolved JSON config; exit 0
paddle_highspeed_cpp --check-config # validate only; exit 0 on valid, 2 on errors
Malformed env vars or out-of-range values cause startup to fail with a clear error list — the server refuses to bind rather than silently coerce bad input (e.g. PORT=abc used to become 1; it now exits with [config error] PORT="abc" is not a valid integer). Validate config without booting the pipeline using --check-config.
Layout detection is enabled by default. The model is loaded at startup but only runs when a request includes ?layout=1. Requests without ?layout=1 have zero overhead. Requests with ?layout=1 reduce throughput by ~20%. Set DISABLE_LAYOUT=1 to skip loading the model entirely and save ~300-500 MB VRAM.
Migration note (v2.3+): The legacy
ENABLE_LAYOUTenv var has been removed. If set, startup fails with a clear error — useDISABLE_LAYOUT=1to disable layout, or remove the var (layout is on by default).
docker run --gpus all -p 8000:8000 \
-v trt-cache:/home/ocr/.cache/turbo-ocr \
-e PIPELINE_POOL_SIZE=3 \
ghcr.io/aiptimizer/turboocr:v2.3.0Add MAX_PDF_PAGES (default 2000) to limit the number of pages processed per PDF request. LOG_LEVEL (debug/info/warn/error) and LOG_FORMAT (json/text) control structured logging output.
Scrape GET /metrics for Prometheus-compatible metrics:
turbo_ocr_requests_total{route="/ocr/raw",status="2xx"} 1042
turbo_ocr_request_duration_seconds_bucket{route="/ocr/raw",le="0.025"} 980
turbo_ocr_request_duration_seconds_sum{route="/ocr/raw"} 12.345
turbo_ocr_request_duration_seconds_count{route="/ocr/raw"} 1042
turbo_ocr_gpu_vram_used_bytes 9052815360
turbo_ocr_gpu_vram_total_bytes 33661911040
turbo_ocr_pipeline_pool_size 5
turbo_ocr_pool_exhaustions_total 0
turbo_ocr_request_bytes_total 49493243
turbo_ocr_request_body_avg_bytes 9407
Every response includes:
| Header | Description |
|---|---|
X-Request-Id |
UUID v7 (or propagated from client X-Request-Id header) |
X-Inference-Time-Ms |
End-to-end processing time in milliseconds |
Retry-After |
Seconds to wait (only on 503 responses) |
| Endpoint | Description |
|---|---|
GET /health |
Basic liveness check |
GET /health/live |
Kubernetes liveness probe |
GET /health/ready |
Readiness probe — verifies GPU pipeline is responsive |
All error responses return JSON with Content-Type: application/json:
{"error": {"code": "EMPTY_BODY", "message": "Empty body"}}Error codes: EMPTY_BODY, INVALID_JSON, MISSING_IMAGE, BASE64_DECODE_FAILED, IMAGE_DECODE_FAILED, INVALID_PARAMETER, UNSUPPORTED_PARAMETER, INVALID_DPI, INVALID_DIMENSIONS, DIMENSIONS_TOO_LARGE, BODY_SIZE_MISMATCH, MISSING_HEADER, INVALID_HEADER, EMPTY_BATCH, MISSING_FILE, MISSING_PDF, INVALID_MULTIPART, PDF_RENDER_FAILED, PDF_TOO_LARGE, EMPTY_PDF, SERVER_BUSY, NOT_READY, INFERENCE_ERROR.
| Dependency | GPU | CPU |
|---|---|---|
| GCC 13.3+ / C++20 | x | x |
| CUDA + TensorRT 10.2+ | x | |
| OpenCV 4.x | x | x |
| Drogon 1.9+ | x | x |
| gRPC + Protobuf | x | |
| ONNX Runtime 1.22+ | x |
Wuffs, Clipper, PDFium vendored in third_party/.
# Docker (recommended)
docker build -f docker/Dockerfile.gpu -t turboocr .
docker run --gpus all -p 8000:8000 -p 50051:50051 \
-v trt-cache:/home/ocr/.cache/turbo-ocr turboocr
# CPU only (Docker) — ~2-3 img/s, mainly for testing
docker build -f docker/Dockerfile.cpu -t turboocr-cpu .
docker run -p 8000:8000 turboocr-cpu
# Native build — PP-OCRv5 models auto-fetched into ./models/ on first build
cmake -B build -DTENSORRT_DIR=/usr/local/tensorrt
cmake --build build -j$(nproc)
LD_LIBRARY_PATH=/usr/local/tensorrt/lib ./build/paddle_highspeed_cpp
# CPU-only native
cmake -B build_cpu -DUSE_CPU_ONLY=ON
cmake --build build_cpu -j$(nproc)
./build_cpu/paddle_cpu_server
# If your distro's gRPC CMake config conflicts with system protobuf,
# add -DCMAKE_DISABLE_FIND_PACKAGE_gRPC=ON to fall back to pkg-config.
# To skip the model auto-fetch (e.g. in CI), add -DFETCH_MODELS=OFF.
# CUDA SM target. Native builds default to sm_120 (Blackwell, RTX 50-series)
# only — the full multi-arch fat binary is ~12.5 GB and adds 10-15 s of
# PTX-JIT to first-start on cold cache. To target other GPUs, opt back in:
# cmake -B build -DCMAKE_CUDA_ARCHITECTURES="86;89;120" ...
# Reference: 75=Turing, 80=A100, 86=Ampere consumer, 89=Ada, 90=Hopper,
# 100=Blackwell DC, 120=Blackwell consumer.Set via the OCR_LANG environment variable. Every supported language bundle is baked into the image at build time from the pinned PP-OCRv5 GitHub Release (SHA256-verified). No runtime downloads, no network dependency at container start.
OCR_LANG |
Script / family | Notes |
|---|---|---|
(unset) / latin |
Latin + basic Greek (English, German, French, Italian, Polish, Czech, …) | 836-char dict; what powers the benchmarks above |
chinese |
Simplified + Traditional Chinese | 18,385-class mobile rec (16 MB); set OCR_SERVER=1 for the 84 MB server variant |
greek |
dedicated Greek rec | 356-class Greek-specialized rec (7.8 MB) — higher accuracy than Latin's combined dict |
korean |
Hangul + basic Latin | 11,947-class rec (13 MB) |
arabic, eslav, thai |
per-script PP-OCRv5 | 7-8 MB each |
# Chinese
docker run --gpus all -p 8000:8000 -p 50051:50051 \
-v trt-cache:/home/ocr/.cache/turbo-ocr \
-e OCR_LANG=chinese \
ghcr.io/aiptimizer/turboocr:v2.3.0Volume tip: use a named volume (
trt-cache:) as shown above, not a host bind-mount. Named volumes auto-populate from the image on first use, so the baked language bundles survive. A bind-mount of an empty host directory would shadow/home/ocr/.cache/turbo-ocrand leave the server with nothing to load.
Run tests/language_smoketest.py to verify any language end-to-end on your
hardware (renders a short phrase, OCRs it, checks char-recall against a
per-language threshold).
This project builds on the work of several open-source projects:
- PaddleOCR (Baidu) — PP-OCRv5 detection, recognition, and classification models. PP-DocLayoutV3 layout detection model. This project would not exist without their research and pre-trained weights.
- Drogon — high-performance async C++ HTTP framework
- Wuffs — fast PNG decoder by Google (vendored)
- PDFium — PDF rendering and text extraction (vendored)
- Clipper — polygon clipping for text detection post-processing (vendored)
MIT. See LICENSE.
Main Sponsor: Miruiq — AI-powered data extraction from PDFs and documents.




