aiptimizer/TurboOCR

Turbo OCR — Fast GPU OCR server. 270 img/s on FUNSD.

GPU-accelerated OCR server. 50x faster than PaddleOCR Python.
C++ / CUDA / TensorRT / PP-OCRv5 — Linux + NVIDIA GPU

Badges: 270 img/s · turboocr.com · Release · Docker · C++20 · CUDA · TensorRT 10.16 · Drogon · nginx · gRPC · PaddleOCR · Prometheus · MIT License

Quick Start · API · Benchmarks · Monitoring · Configuration · Build · Roadmap · Website


Turbo-OCR vs alternatives on FUNSD

Highlights

  • 🚀 270 img/s on FUNSD A4 forms (c=16) — 1,200+ img/s on sparse images
  • 11 ms p50 latency, single request
  • 🎯 F1 = 90.2% on FUNSD — higher accuracy than PaddleOCR Python with the same weights
  • 🖨️ Prints & handwriting — PP-OCRv5 handles both out of the box
  • 📄 PDF native — pages rendered and OCR'd in parallel
  • 🔒 4 PDF modes — pure OCR, native text layer, auto-dispatch, detection-verified hybrid
  • 🧩 Layout detection — PP-DocLayoutV3 with 25 region classes, per-request ?layout=1 toggle
  • 📖 Reading order — class-aware XY-cut (header → body → footer/reference), row-tolerant table-cell sort, orphan-aware placement, opt-in via ?reading_order=1
  • 🌐 HTTP + gRPC from a single binary, sharing the same GPU pipeline pool
  • 🐳 One-line Docker deploy: docker run with auto TRT engine build on first start
  • 📊 Prometheus metrics — request counters, latency histograms, VRAM usage on /metrics
  • 🌐 Configurable languages (Latin, e.g. English, French, German, Spanish, Portuguese; Chinese, Greek, Russian, Arabic, Korean, Thai)

RTX 5090, PP-OCRv5 mobile latin, TensorRT FP16, pool=5. Prints, handwriting, layout detection. This is the fast lane.

🗺️ Roadmap

  • 🔍 Structured extraction
  • 📝 Markdown output
  • 📊 Table parsing

Quick Start

Requirements: Linux, NVIDIA driver 595+, Turing or newer GPU (RTX 20-series / GTX 16-series+).

docker run --gpus all -p 8000:8000 -p 50051:50051 \
  -v trt-cache:/home/ocr/.cache/turbo-ocr \
  ghcr.io/aiptimizer/turboocr:v2.3.0

First startup builds TensorRT engines from ONNX. This takes about 90 seconds on an RTX 5090 and up to an hour on older GPUs. Set TRT_OPT_LEVEL=3 to cut build time 3 to 5x at the cost of a small runtime slowdown. The volume caches the engines, so subsequent starts are instant. During the build, requests will return a connection refused error from nginx until the backend is ready. nginx (port 8000) reverse-proxies to Drogon (port 8080), and both start automatically.

curl -X POST http://localhost:8000/ocr/raw \
  --data-binary @document.png -H "Content-Type: image/png"
{
  "results": [
    {"text": "Invoice Total", "confidence": 0.97, "bounding_box": [[42,10],[210,10],[210,38],[42,38]]}
  ]
}

API

HTTP on port 8000, gRPC on port 50051 — single binary, shared GPU pipeline pool.

Important: Use persistent connections (HTTP keep-alive). Opening many short-lived connections (e.g. one curl process per request in a loop) can overwhelm the server and cause it to stall. Most HTTP client libraries reuse connections as long as you keep a single client object alive (requests.Session, aiohttp.ClientSession, Go http.Client, etc.).
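A minimal sketch of connection reuse using only Python's standard library. The helper name `ocr_many` and its arguments are illustrative and not part of Turbo OCR; it sends every image to /ocr/raw over one connection object instead of opening a new socket per request.

```python
import http.client
import json


def ocr_many(conn, images, content_type="image/png"):
    """POST each image to /ocr/raw over one reused connection.

    `conn` is an http.client.HTTPConnection (or any object with the same
    request/getresponse methods); `images` is an iterable of raw bytes.
    Returns a list of (status, parsed_json) tuples.
    """
    outputs = []
    for body in images:
        conn.request("POST", "/ocr/raw", body=body,
                     headers={"Content-Type": content_type})
        resp = conn.getresponse()
        payload = resp.read()  # drain the body so the socket can be reused
        outputs.append((resp.status, json.loads(payload)))
    return outputs


# Usage (assumes a server on localhost:8000, as in Quick Start):
#   conn = http.client.HTTPConnection("localhost", 8000)
#   results = ocr_many(conn, [open("doc.png", "rb").read()])
#   conn.close()
```

The sketch assumes every response body is JSON; a non-JSON reply from the proxy would raise in `json.loads`, and a production client should handle that case.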

Endpoints

| Endpoint | Input | Description |
| --- | --- | --- |
| /health | | Returns "ok" |
| /health/live | | Kubernetes liveness probe |
| /health/ready | | Readiness probe: verifies GPU pipeline is responsive |
| /ocr/raw | Raw image bytes | Fastest path (PNG, JPEG, etc.) |
| /ocr | {"image": "<base64>"} | For clients that can only send JSON |
| /ocr/batch | {"images": ["<b64>", ...]} | Multiple images in one request |
| /ocr/pixels | Raw BGR bytes + X-Width / X-Height / X-Channels headers | Zero-decode path, see the /ocr/pixels section |
| /ocr/pdf | Raw bytes, {"pdf": "<b64>"}, or multipart/form-data | All pages OCR'd in parallel |
| /metrics | | Prometheus metrics (text exposition format) |
| gRPC | Raw bytes (protobuf) | Port 50051, see proto/ocr.proto |

Query Parameters

| Parameter | Endpoints | Values | Default and notes |
| --- | --- | --- | --- |
| layout | all | 0 / 1 | 0. Include layout regions (~20% throughput cost) |
| reading_order | image routes | 0 / 1 | 0. Emit a reading_order array that indexes results in reading order (auto-enables layout=1). Class-aware: header → body → footer/footnote/reference; XY-cut on body with row-tolerant table-cell sort and orphan placement |
| as_blocks | image + PDF routes | 0 / 1 | 0. When 1, the response includes a blocks array: paragraph-level aggregates, one entry per non-empty layout cell, in reading order. Auto-enables layout=1 and reading_order=1. Each block has {id, layout_id, class, bounding_box, content, order_index}. Mirrors PaddleX PP-StructureV3 parsing_res_list granularity. |
| mode | /ocr/pdf | ocr / geometric / auto / auto_verified | ocr. On the CPU binary, auto_verified is silently aliased to auto (no native text re-verifier on CPU). Inspect the per-page mode field in the response to see which path actually ran. |
| dpi | /ocr/pdf | 50–600 | 100. Render resolution |

Parameter parsing rules. Parameter names are case-sensitive: ?layout=1 works, ?Layout=1 is silently ignored. Boolean values for layout, reading_order, and as_blocks accept any case of 1/0, true/false, on/off, yes/no, and reject anything else with 400 INVALID_PARAMETER. Values for mode= are case-sensitive and silently fall back to the configured default when unrecognized — ?mode=Auto, ?mode=AUTO, or ?mode=foobar all run as mode=ocr (or whatever ENABLE_PDF_MODE is set to) without error. Always pass exactly ocr, geometric, auto, or auto_verified.
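Because unrecognized mode values fall back silently, it can help to validate them on the client before the request leaves. The following is a small sketch under that assumption; `pdf_query` is a hypothetical helper name, and only the four mode strings come from the rules above.

```python
from urllib.parse import urlencode

# The only spellings the server recognizes (exact, case-sensitive).
VALID_MODES = ("ocr", "geometric", "auto", "auto_verified")


def pdf_query(mode="ocr", layout=False, as_blocks=False):
    """Build a query string for /ocr/pdf.

    Raises ValueError for a mode the server would silently replace with
    its configured default, so typos surface in the client instead.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    # Parameter names are case-sensitive, so they are spelled in lowercase.
    return urlencode({
        "mode": mode,
        "layout": int(bool(layout)),
        "as_blocks": int(bool(as_blocks)),
    })


print(pdf_query("auto", layout=True))  # mode=auto&layout=1&as_blocks=0
```

The resulting string can be appended to the URL after a `?`, for example `"http://localhost:8000/ocr/pdf?" + pdf_query("auto", layout=True)`.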

Examples

# Image — raw bytes (fastest)
curl -X POST http://localhost:8000/ocr/raw \
  --data-binary @doc.png -H "Content-Type: image/png"

# Image — base64 JSON
curl -X POST http://localhost:8000/ocr \
  -H "Content-Type: application/json" \
  -d '{"image":"'$(base64 -w0 doc.png)'"}'

# PDF — raw bytes
curl -X POST http://localhost:8000/ocr/pdf \
  --data-binary @document.pdf

# PDF — multipart (works from any client, including browsers)
curl -X POST http://localhost:8000/ocr/pdf \
  -F "file=@document.pdf"

# PDF — with layout + auto mode
curl -X POST "http://localhost:8000/ocr/pdf?layout=1&mode=auto" \
  --data-binary @document.pdf

# gRPC (grpcurl uses base64 for CLI; real clients send raw bytes)
grpcurl -plaintext -d '{"image":"'$(base64 -w0 doc.png)'"}' \
  localhost:50051 ocr.OCRService/Recognize

/ocr/pixels (zero-decode path)

For clients that already hold a decoded image in memory (NumPy, OpenCV, custom pipelines), /ocr/pixels skips the PNG/JPEG decode step entirely. The body is sent as raw pixel bytes; dimensions travel in HTTP headers.

| Header | Required | Values | Meaning |
| --- | --- | --- | --- |
| X-Width | yes | 1–MAX_IMAGE_DIM (default 16384) | Image width in pixels |
| X-Height | yes | 1–MAX_IMAGE_DIM (default 16384) | Image height in pixels |
| X-Channels | no | 1 or 3 (default 3) | 3 = BGR (OpenCV order, not RGB), 1 = grayscale |

  • Body: raw pixel bytes, length must equal width * height * channels exactly. A mismatch returns 400 BODY_SIZE_MISMATCH.
  • Query parameters: the same ?layout= and ?reading_order= as /ocr apply.
  • Errors: MISSING_HEADER (no X-Width / X-Height), INVALID_HEADER (unparseable values), INVALID_DIMENSIONS (non-positive size or channels other than 1/3), DIMENSIONS_TOO_LARGE (exceeds MAX_IMAGE_DIM).
  • Use case: the hot path when upstream code already has a decoded cv::Mat / np.ndarray and you don't want to round-trip through PNG.
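The size and dimension rules above can also be checked on the client before sending, which avoids a round trip that would end in a 400. This is a sketch only: `pixels_headers` is a hypothetical helper, and the default limit of 16384 mirrors the documented MAX_IMAGE_DIM default rather than the value of any particular deployment.

```python
def pixels_headers(width, height, channels, body, max_dim=16384):
    """Return the headers for /ocr/pixels after validating the body.

    Raises ValueError for the cases the server would reject:
    unsupported channel count, out-of-range dimensions, or a body whose
    length is not exactly width * height * channels.
    """
    if channels not in (1, 3):
        raise ValueError("channels must be 1 (grayscale) or 3 (BGR)")
    if not (1 <= width <= max_dim and 1 <= height <= max_dim):
        raise ValueError(f"width and height must be within 1..{max_dim}")
    expected = width * height * channels
    if len(body) != expected:
        raise ValueError(f"body is {len(body)} bytes, expected {expected}")
    return {
        "X-Width": str(width),
        "X-Height": str(height),
        "X-Channels": str(channels),
    }


# A 2x3 BGR image needs exactly 2 * 3 * 3 = 18 bytes.
print(pixels_headers(2, 3, 3, bytes(18)))
```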
# Python — send a decoded OpenCV image (BGR)
python -c "
import cv2, requests
img = cv2.imread('doc.png')        # BGR, HxWx3
h, w, c = img.shape
requests.post('http://localhost:8000/ocr/pixels',
              data=img.tobytes(),
              headers={'X-Width': str(w), 'X-Height': str(h), 'X-Channels': str(c)})
"

Response Format

Image endpoints return:

{"results": [{"text": "Invoice Total", "confidence": 0.97, "bounding_box": [[42,10],[210,10],[210,38],[42,38]]}]}

With ?layout=1, a layout array is added. Each OCR result gets a layout_id linking it to the containing layout region:

{
  "results": [{"text": "...", "confidence": 0.97, "id": 0, "layout_id": 2, "bounding_box": [...]}],
  "layout": [{"id": 0, "class": "header", "confidence": 0.91, "bounding_box": [...]},
             {"id": 2, "class": "table", "confidence": 0.95, "bounding_box": [...]}]
}
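As an illustration of how layout_id ties the two arrays together, the sketch below groups recognized text by layout region. The function name and the "unassigned" label for results without a matching region are assumptions made for this example, not server behavior.

```python
from collections import defaultdict


def text_by_region(response):
    """Group OCR text by (layout id, layout class) using layout_id."""
    classes = {region["id"]: region["class"]
               for region in response.get("layout", [])}
    grouped = defaultdict(list)
    for item in response.get("results", []):
        region_id = item.get("layout_id")
        # "unassigned" is a placeholder used only by this sketch.
        label = classes.get(region_id, "unassigned")
        grouped[(region_id, label)].append(item["text"])
    return dict(grouped)


sample = {
    "results": [
        {"text": "Total", "layout_id": 2},
        {"text": "42", "layout_id": 2},
        {"text": "ACME", "layout_id": 0},
    ],
    "layout": [
        {"id": 0, "class": "header"},
        {"id": 2, "class": "table"},
    ],
}
print(text_by_region(sample))
```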

PDF endpoint wraps results per page:

{
  "pages": [{
    "page": 1, "page_index": 0, "dpi": 100, "width": 1047, "height": 1389,
    "mode": "ocr", "text_layer_quality": "absent", "results": [...]
  }]
}

Coordinate conversion: x_pdf = x_px * 72 / dpi.
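A short worked example of this conversion, applied to a whole bounding box. The helper name is illustrative, and the sketch assumes the same scale factor applies to y as to x. The source gives only the scale factor, so any axis flip needed for a bottom-left PDF origin is left to the caller.

```python
def px_to_pdf_points(quad, dpi):
    """Scale a pixel-space quadrilateral to PDF points (1/72 inch)."""
    return [[x * 72 / dpi, y * 72 / dpi] for x, y in quad]


# At the default 100 dpi, 1 pixel equals 0.72 points.
box_px = [[42, 10], [210, 10], [210, 38], [42, 38]]
print(px_to_pdf_points(box_px, dpi=100))
```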

Per-page fields:

  • mode — the resolved mode that actually ran on this page (ocr / geometric / auto_verified). For ?mode=auto requests, each page resolves to either geometric (text layer accepted) or ocr (fell back to OCR), never auto. On the CPU binary, ?mode=auto_verified resolves to auto semantics, so per-page mode will be geometric or ocr; auto_verified only appears on the GPU binary.
  • text_layer_quality — assessment of the page's native text layer:
    • absent — no usable text layer (image-only PDF, fewer than 10 chars, or empty lines)
    • rejected — text layer present but failed sanity checks (non-zero rotation, >5% replacement chars, >10% non-printable chars)
    • trusted — native text passed sanity checks and was used (geometric / auto) or considered for cross-check (auto_verified)
    • For mode=ocr this is always absent (the text-layer pre-pass is skipped entirely).

PDF Extraction Modes

| Mode | What it does | Speed |
| --- | --- | --- |
| ocr | Render + full OCR pipeline | Baseline |
| geometric | PDFium text layer only, no rasterization | ~10x faster |
| auto | Per-page: text layer if available, else OCR | Fastest for mixed PDFs |
| auto_verified | Full pipeline + replace with native text where sanity check passes | Slightly slower than OCR |

Caution

PDF text-layer trust model. Modes other than ocr read the PDF's native text layer, which the PDF author controls. A malicious PDF can embed invisible text, remap glyphs via ToUnicode, or inject arbitrary strings that differ from what's visually rendered.

When to use each mode:

| Scenario | Recommended mode | Why |
| --- | --- | --- |
| Untrusted uploads (user-submitted PDFs) | ocr | Only trusts pixel data, so it is immune to text-layer manipulation |
| Internal/trusted documents | auto or geometric | Safe when you control the PDF source; much faster |
| High-accuracy with verification | auto_verified | OCR runs first, then results are cross-checked against the text layer. Accepts native text only if it passes heuristic validation (character count, non-printable ratio < 10%, replacement char ratio < 5%, no rotation) |

Default: mode=ocr (safest). Override per-request via ?mode= query parameter or globally via ENABLE_PDF_MODE env var.

Deployment recommendation: If your service accepts PDFs from untrusted sources, do not set ENABLE_PDF_MODE to geometric or auto globally. Keep the default ocr and only use text-layer modes for trusted internal workflows.

Layout Detection

All endpoints accept ?layout=1 to detect document regions using PP-DocLayoutV3 (25 classes):

abstract · algorithm · aside_text · chart · content · display_formula · doc_title · figure_title · footer · footer_image · footnote · formula_number · header · header_image · image · inline_formula · number · paragraph_title · reference · reference_content · seal · table · text · vertical_text · vision_footnote

Layout classes (reading-order buckets)

When ?reading_order=1 is set, classes are partitioned into three strata before XY-cut runs, so common page furniture lands in the right slot regardless of where the layout model placed it spatially: TOP is read first, then BODY (sorted by XY-cut), then BOTTOM.

| Class ID | Name | Bucket |
| --- | --- | --- |
| 0 | abstract | BODY |
| 1 | algorithm | BODY |
| 2 | aside_text | BODY |
| 3 | chart | BODY |
| 4 | content | BODY |
| 5 | display_formula | BODY |
| 6 | doc_title | BODY |
| 7 | figure_title | BODY |
| 8 | footer | BOTTOM |
| 9 | footer_image | BOTTOM |
| 10 | footnote | BOTTOM |
| 11 | formula_number | BODY |
| 12 | header | TOP |
| 13 | header_image | TOP |
| 14 | image | BODY |
| 15 | inline_formula | BODY |
| 16 | number | BODY |
| 17 | paragraph_title | BODY |
| 18 | reference | BOTTOM |
| 19 | reference_content | BOTTOM |
| 20 | seal | BODY |
| 21 | table | BODY |
| 22 | text | BODY |
| 23 | vertical_text | BODY |
| 24 | vision_footnote | BOTTOM |

Class 16 (number, page numbers) deliberately stays in BODY because page numbers can appear at the top or the bottom of a page — XY-cut places them by geometry. Class IDs are pinned with static_assert against the PaddleX label list, so a future re-shuffle would fail the build rather than silently misroute classes.
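The bucket partitioning can be sketched in a few lines. This is not the server's implementation: inside each bucket the sketch simply sorts by the top-left corner (y, then x) as a stand-in for XY-cut, and it assumes each layout region carries a four-point bounding_box like the OCR results do.

```python
TOP = {"header", "header_image"}
BOTTOM = {"footer", "footer_image", "footnote", "reference",
          "reference_content", "vision_footnote"}


def bucket_of(class_name):
    """Return 0 for TOP, 1 for BODY, 2 for BOTTOM."""
    if class_name in TOP:
        return 0
    if class_name in BOTTOM:
        return 2
    return 1  # every other class, including "number", stays in BODY


def rough_reading_order(regions):
    """Order region ids TOP -> BODY -> BOTTOM.

    Within a bucket, regions are sorted top-to-bottom, then left-to-right.
    That is a simplification of the XY-cut described above.
    """
    def key(region):
        xs = [p[0] for p in region["bounding_box"]]
        ys = [p[1] for p in region["bounding_box"]]
        return (bucket_of(region["class"]), min(ys), min(xs))

    return [r["id"] for r in sorted(regions, key=key)]


regions = [
    {"id": 0, "class": "footer", "bounding_box": [[0, 5], [100, 5], [100, 20], [0, 20]]},
    {"id": 1, "class": "text", "bounding_box": [[0, 100], [100, 100], [100, 200], [0, 200]]},
    {"id": 2, "class": "header", "bounding_box": [[0, 900], [100, 900], [100, 950], [0, 950]]},
    {"id": 3, "class": "number", "bounding_box": [[0, 50], [10, 50], [10, 60], [0, 60]]},
]
print(rough_reading_order(regions))  # [2, 3, 1, 0]
```

In the sample, the header is read first and the footer last even though their positions on the page are swapped, while the page number is placed purely by geometry within BODY.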

Layout detection overlay — color-coded regions: paragraph_title, text, chart, figure_title, header, footer, number


Benchmarks

FUNSD form-understanding dataset (50 pages, ~170 words/page). Same word-level F1 metric for all engines. Single RTX 5090.

Charts: Accuracy · Throughput · Latency

Benchmark caveats
  • Crude accuracy metric. Bag-of-words F1 ignores order and duplicate counts. CER or reading-order metrics would likely help VLM systems.
  • VLMs could run faster. Served via off-the-shelf vLLM in fp16. Quantization, speculative decoding, or a dedicated stack would push throughput higher.
  • VLM prompts are untuned. With prompt engineering both VLMs would likely surpass every CTC engine here.
  • Single domain. FUNSD is English business forms; other document types would look different.

Reproduce: python tests/benchmark/comparison/bench_turbo_ocr.py (requires running server + datasets library).


Configuration

| Variable | Default | Description |
| --- | --- | --- |
| OCR_LANG | (unset = latin) | Language bundle: latin, chinese, greek, eslav, arabic, korean, thai. All bundles are baked into the image at build time; no runtime download. |
| OCR_SERVER | (unset) | With OCR_LANG=chinese, set to 1 to use the 84 MB PP-OCRv5 server rec instead of the 16 MB mobile rec. Ignored for other languages. |
| PIPELINE_POOL_SIZE | auto | Concurrent GPU pipelines (~1.4 GB each) |
| DISABLE_LAYOUT | 0 | Set to 1 to disable PP-DocLayoutV3 layout detection and save ~300-500 MB VRAM |
| ENABLE_PDF_MODE | ocr | Default PDF mode: ocr / geometric / auto / auto_verified |
| DISABLE_ANGLE_CLS | 0 | Skip angle classifier (~0.4 ms savings) |
| DET_MAX_SIDE | 960 | Max detection input side (px). Bounds: 32–4096. The TRT engine profile is built to match this value; changing it invalidates the cached engine and triggers a one-time rebuild. |
| TRT_OPT_LEVEL | 5 | TensorRT builder optimization level. Bounds: 0–5. Lower values trade runtime perf for faster cold builds (3 typically builds ~3-5× faster with <5% runtime regression). The cache key includes the level, so different values produce separate engines. |
| TRT_ENGINE_CACHE | ~/.cache/turbo-ocr | Directory for cached TensorRT engines. Set to a host-mounted path to share engines across container restarts. |
| TURBO_OCR_HOST | 0.0.0.0 | Bind address for HTTP and gRPC listeners. Default binds every IPv4 interface; use 127.0.0.1 for loopback only, :: for all interfaces incl. IPv6, or a specific interface IP. Equivalent CLI flag: --host. |
| PORT / GRPC_PORT | 8080 / 50051 | Server ports. The binary listens on PORT=8080 by default; the Docker image runs nginx in front of it on port 8000, so external clients use 8000 and PORT only matters for direct/native runs. |
| PDF_DAEMONS / PDF_WORKERS | 16 / 4 | PDF render parallelism |
| GRPC_BATCH_WORKERS | 8 | Parallel workers in gRPC RecognizeBatch for fan-out across pipeline pool |
| HTTP_THREADS | pool * 32 | Work pool threads for blocking inference |
| MAX_PDF_PAGES | 2000 | Maximum pages per PDF request |
| SHUTDOWN_GRACE_SECONDS | 30 | Seconds to wait for inflight requests to drain on SIGTERM/SIGINT before tearing down. Set to stay below your orchestrator's SIGKILL grace (K8s default 30s). |
| GRPC_CQS | 10 | Number of gRPC completion queues. Higher values trade memory for connection-handling parallelism on high-fanout deployments. |
| GRPC_RESPONSE_MODE | json_bytes | gRPC response format: json_bytes (default, full JSON in json_response field) or structured (typed protobuf fields). |
| MAX_BODY_MB | 100 | Max request body size in MB. Applied at all three layers: nginx (413 at proxy), Drogon HTTP (setClientMaxBodySize), and gRPC (SetMaxReceive/SendMessageSize). Bounds: 1–102400. |
| MAX_BODY_MEMORY_MB | min(1024, MAX_BODY_MB), effectively 100 with stock config | Per-request in-memory buffer threshold. Bodies up to this size stay in RAM; larger ones spill to a tempfile under /tmp. Always clamped to [1, MAX_BODY_MB], so the effective default tracks MAX_BODY_MB. Raise MAX_BODY_MB first to unlock larger in-memory buffers. Lower on memory-constrained hosts (e.g. MAX_BODY_MEMORY_MB=50 caps buffer RSS at ~50 MB × concurrent requests). |
| MAX_IMAGE_DIM | 16384 | Max width or height (px) accepted on /ocr/pixels and image-decode routes. Bounds: 64–65535. |
| LOG_LEVEL | info | Log level: debug / info / warn / error |
| LOG_FORMAT | json | Log format: json (structured) / text (human-readable) |
| TOCR_LOG_RATELIMIT | 10 | Max rate-limited logs per call site per 1s window (applies to per-request error paths). 0 disables. Format N or N:W_MS (e.g. 5:2000 = 5 logs / 2s). On window roll a single [suppressed logs] rollup line is emitted. |
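The interaction between MAX_BODY_MB and MAX_BODY_MEMORY_MB is easiest to see as arithmetic. The function below is a sketch of the default and clamping rules exactly as described in the table; the function name is hypothetical and the server's own code may differ in detail.

```python
def effective_body_memory_mb(max_body_mb=100, max_body_memory_mb=None):
    """Return the in-memory buffer threshold in MB.

    Default is min(1024, MAX_BODY_MB); any explicit value is clamped
    to the range [1, MAX_BODY_MB].
    """
    if max_body_memory_mb is None:
        max_body_memory_mb = min(1024, max_body_mb)
    return max(1, min(max_body_memory_mb, max_body_mb))


print(effective_body_memory_mb())                  # stock config -> 100
print(effective_body_memory_mb(max_body_mb=4096))  # default caps at 1024
print(effective_body_memory_mb(100, 500))          # clamped down to 100
print(effective_body_memory_mb(100, 50))           # lowered on purpose -> 50
```

The third call shows why MAX_BODY_MB has to be raised first: asking for a 500 MB buffer with a 100 MB body limit still yields 100.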

Every knob above is also exposed as a CLI flag (--http-port, --max-body-mb, --disable-layout, --det-max-side, --log-level, etc.). The two exceptions, which remain env-only because their valid set is context-dependent, are OCR_LANG (validated against installed model bundles at first request) and TOCR_LOG_RATELIMIT (custom N or N:W_MS format). CLI flags override env vars when both are set. Useful flags for inspection:

paddle_highspeed_cpp --help            # full flag listing
paddle_highspeed_cpp --print-config    # resolved JSON config; exit 0
paddle_highspeed_cpp --check-config    # validate only; exit 0 on valid, 2 on errors

Malformed env vars or out-of-range values cause startup to fail with a clear error list — the server refuses to bind rather than silently coerce bad input (e.g. PORT=abc used to become 1; it now exits with [config error] PORT="abc" is not a valid integer). Validate config without booting the pipeline using --check-config.

Layout detection is enabled by default. The model is loaded at startup but only runs when a request includes ?layout=1. Requests without ?layout=1 have zero overhead. Requests with ?layout=1 reduce throughput by ~20%. Set DISABLE_LAYOUT=1 to skip loading the model entirely and save ~300-500 MB VRAM.

Migration note (v2.3+): The legacy ENABLE_LAYOUT env var has been removed. If set, startup fails with a clear error — use DISABLE_LAYOUT=1 to disable layout, or remove the var (layout is on by default).

docker run --gpus all -p 8000:8000 \
  -v trt-cache:/home/ocr/.cache/turbo-ocr \
  -e PIPELINE_POOL_SIZE=3 \
  ghcr.io/aiptimizer/turboocr:v2.3.0

Set MAX_PDF_PAGES (default 2000) to cap the number of pages processed per PDF request. LOG_LEVEL (debug/info/warn/error) and LOG_FORMAT (json/text) control logging verbosity and output format.


Monitoring

Prometheus Metrics

Scrape GET /metrics for Prometheus-compatible metrics:

turbo_ocr_requests_total{route="/ocr/raw",status="2xx"} 1042
turbo_ocr_request_duration_seconds_bucket{route="/ocr/raw",le="0.025"} 980
turbo_ocr_request_duration_seconds_sum{route="/ocr/raw"} 12.345
turbo_ocr_request_duration_seconds_count{route="/ocr/raw"} 1042
turbo_ocr_gpu_vram_used_bytes 9052815360
turbo_ocr_gpu_vram_total_bytes 33661911040
turbo_ocr_pipeline_pool_size 5
turbo_ocr_pool_exhaustions_total 0
turbo_ocr_request_bytes_total 49493243
turbo_ocr_request_body_avg_bytes 9407
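For quick checks without a Prometheus server, the sample output above can be parsed directly. The sketch below is a deliberately simple parser that assumes one sample per line with no trailing timestamps; the function names are illustrative.

```python
def parse_metrics(text):
    """Parse Prometheus text exposition lines into {series: float}."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE
            continue
        # The value is the last space-separated field on the line.
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


def mean_latency_seconds(samples, route):
    """Average request duration for a route: histogram sum / count."""
    total = samples[f'turbo_ocr_request_duration_seconds_sum{{route="{route}"}}']
    count = samples[f'turbo_ocr_request_duration_seconds_count{{route="{route}"}}']
    return total / count if count else 0.0


METRICS = """\
turbo_ocr_request_duration_seconds_sum{route="/ocr/raw"} 12.345
turbo_ocr_request_duration_seconds_count{route="/ocr/raw"} 1042
turbo_ocr_gpu_vram_used_bytes 9052815360
turbo_ocr_gpu_vram_total_bytes 33661911040
"""

s = parse_metrics(METRICS)
print(f"mean latency: {mean_latency_seconds(s, '/ocr/raw') * 1000:.1f} ms")
vram = s["turbo_ocr_gpu_vram_used_bytes"] / s["turbo_ocr_gpu_vram_total_bytes"]
print(f"VRAM in use: {vram:.1%}")
```

With the sample values, the mean works out to roughly 11.8 ms per request and about 27% of VRAM in use.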

Response Headers

Every response includes:

| Header | Description |
| --- | --- |
| X-Request-Id | UUID v7 (or propagated from client X-Request-Id header) |
| X-Inference-Time-Ms | End-to-end processing time in milliseconds |
| Retry-After | Seconds to wait (only on 503 responses) |

Health Endpoints

| Endpoint | Description |
| --- | --- |
| GET /health | Basic liveness check |
| GET /health/live | Kubernetes liveness probe |
| GET /health/ready | Readiness probe: verifies GPU pipeline is responsive |

Structured Errors

All error responses return JSON with Content-Type: application/json:

{"error": {"code": "EMPTY_BODY", "message": "Empty body"}}

Error codes: EMPTY_BODY, INVALID_JSON, MISSING_IMAGE, BASE64_DECODE_FAILED, IMAGE_DECODE_FAILED, INVALID_PARAMETER, UNSUPPORTED_PARAMETER, INVALID_DPI, INVALID_DIMENSIONS, DIMENSIONS_TOO_LARGE, BODY_SIZE_MISMATCH, MISSING_HEADER, INVALID_HEADER, EMPTY_BATCH, MISSING_FILE, MISSING_PDF, INVALID_MULTIPART, PDF_RENDER_FAILED, PDF_TOO_LARGE, EMPTY_PDF, SERVER_BUSY, NOT_READY, INFERENCE_ERROR.
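A client can combine the structured error body with the Retry-After header to decide whether to back off. The sketch below is one possible shape for that logic; `classify_error` and the "UNKNOWN" fallback are assumptions for illustration, and which codes accompany a 503 is not specified here.

```python
import json


def classify_error(status, body, headers):
    """Return (error_code, retry_after_seconds_or_None).

    `body` is the raw response body (str or bytes); `headers` is a plain
    dict with the header names spelled as documented above.
    """
    try:
        code = json.loads(body)["error"]["code"]
    except (ValueError, KeyError, TypeError):
        code = "UNKNOWN"  # placeholder for this sketch, not a server code
    retry_after = None
    if status == 503 and "Retry-After" in headers:
        retry_after = float(headers["Retry-After"])
    return code, retry_after


print(classify_error(
    400, '{"error": {"code": "EMPTY_BODY", "message": "Empty body"}}', {}))
print(classify_error(
    503, b'{"error": {"code": "SERVER_BUSY", "message": "busy"}}',
    {"Retry-After": "2"}))
```

A caller would typically sleep for the returned number of seconds and retry when the second element is not None, and surface the error code otherwise.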


Building from Source

Dependency GPU CPU
GCC 13.3+ / C++20 x x
CUDA + TensorRT 10.2+ x
OpenCV 4.x x x
Drogon 1.9+ x x
gRPC + Protobuf x
ONNX Runtime 1.22+ x

Wuffs, Clipper, PDFium vendored in third_party/.

# Docker (recommended)
docker build -f docker/Dockerfile.gpu -t turboocr .
docker run --gpus all -p 8000:8000 -p 50051:50051 \
  -v trt-cache:/home/ocr/.cache/turbo-ocr turboocr

# CPU only (Docker) — ~2-3 img/s, mainly for testing
docker build -f docker/Dockerfile.cpu -t turboocr-cpu .
docker run -p 8000:8000 turboocr-cpu

# Native build — PP-OCRv5 models auto-fetched into ./models/ on first build
cmake -B build -DTENSORRT_DIR=/usr/local/tensorrt
cmake --build build -j$(nproc)
LD_LIBRARY_PATH=/usr/local/tensorrt/lib ./build/paddle_highspeed_cpp

# CPU-only native
cmake -B build_cpu -DUSE_CPU_ONLY=ON
cmake --build build_cpu -j$(nproc)
./build_cpu/paddle_cpu_server

# If your distro's gRPC CMake config conflicts with system protobuf,
# add -DCMAKE_DISABLE_FIND_PACKAGE_gRPC=ON to fall back to pkg-config.
# To skip the model auto-fetch (e.g. in CI), add -DFETCH_MODELS=OFF.

# CUDA SM target. Native builds default to sm_120 (Blackwell, RTX 50-series)
# only — the full multi-arch fat binary is ~12.5 GB and adds 10-15 s of
# PTX-JIT to first-start on cold cache. To target other GPUs, opt back in:
#   cmake -B build -DCMAKE_CUDA_ARCHITECTURES="86;89;120" ...
# Reference: 75=Turing, 80=A100, 86=Ampere consumer, 89=Ada, 90=Hopper,
# 100=Blackwell DC, 120=Blackwell consumer.

Supported Languages

Set via the OCR_LANG environment variable. Every supported language bundle is baked into the image at build time from the pinned PP-OCRv5 GitHub Release (SHA256-verified). No runtime downloads, no network dependency at container start.

| OCR_LANG | Script / family | Notes |
| --- | --- | --- |
| (unset) / latin | Latin + basic Greek (English, German, French, Italian, Polish, Czech, …) | 836-char dict; what powers the benchmarks above |
| chinese | Simplified + Traditional Chinese | 18,385-class mobile rec (16 MB); set OCR_SERVER=1 for the 84 MB server variant |
| greek | dedicated Greek rec | 356-class Greek-specialized rec (7.8 MB), higher accuracy than Latin's combined dict |
| korean | Hangul + basic Latin | 11,947-class rec (13 MB) |
| arabic, eslav, thai | per-script PP-OCRv5 | 7-8 MB each |

# Chinese
docker run --gpus all -p 8000:8000 -p 50051:50051 \
  -v trt-cache:/home/ocr/.cache/turbo-ocr \
  -e OCR_LANG=chinese \
  ghcr.io/aiptimizer/turboocr:v2.3.0

Volume tip: use a named volume (trt-cache:) as shown above, not a host bind-mount. Named volumes auto-populate from the image on first use, so the baked language bundles survive. A bind-mount of an empty host directory would shadow /home/ocr/.cache/turbo-ocr and leave the server with nothing to load.

Run tests/language_smoketest.py to verify any language end-to-end on your hardware (renders a short phrase, OCRs it, checks char-recall against a per-language threshold).


Acknowledgements

This project builds on the work of several open-source projects:

  • PaddleOCR (Baidu) — PP-OCRv5 detection, recognition, and classification models. PP-DocLayoutV3 layout detection model. This project would not exist without their research and pre-trained weights.
  • Drogon — high-performance async C++ HTTP framework
  • Wuffs — fast PNG decoder by Google (vendored)
  • PDFium — PDF rendering and text extraction (vendored)
  • Clipper — polygon clipping for text detection post-processing (vendored)

License

MIT. See LICENSE.

Main Sponsor: Miruiq — AI-powered data extraction from PDFs and documents.