
Releases: felixdittrich92/OnnxTR

v0.6.2

15 Jan 14:10
a2cc042

What's Changed

NOTE: OnnxTR v0.6.2 requires Python >=3.10

Bug Fixes

Publicly available pre-built Docker images: here

OnnxTR demo: Hugging Face Spaces

OnnxTR model collection: OnnxTR Hugging Face collection

Full Changelog: v0.6.1...v0.6.2

v0.6.1

13 Jan 07:39
21a6e14

What's Changed

NOTE: OnnxTR v0.6.1 requires Python >=3.10

  • Small fix for custom loaded detection models where assume_straight_pages=False wasn't set correctly.
  • Maintenance updates

Publicly available pre-built Docker images: here

OnnxTR demo: Hugging Face Spaces

OnnxTR model collection: OnnxTR Hugging Face collection

Full Changelog: v0.6.0...v0.6.1

v0.6.0

23 Nov 13:34
d9f8230

What's Changed

NOTE: OnnxTR v0.6.0 requires Python >=3.10

New version specifiers

To further enhance OnnxTR as a go-to solution for production environments, two new installation options are introduced, tailored for OpenVINO-powered deployments:

pip install "onnxtr[openvino]"
pip install "onnxtr[openvino-headless]"  # same as "onnxtr[openvino]" but with opencv-headless

OpenVINO™ (Open Visual Inference and Neural Network Optimization) is an open-source toolkit developed by Intel to optimize and deploy AI inference across a variety of hardware. It is specifically designed for Intel architectures but supports multiple hardware targets, including CPUs, GPUs, NPUs, and so on. OpenVINO is particularly well-suited for applications requiring high-performance inference, such as computer vision, natural language processing, and edge AI scenarios.

As the screenshot shows, this provides a notable performance boost:

[Image: Screenshot from 2024-11-23 14-30-03]

Publicly available pre-built Docker images added

Can be found here

OnnxTR demo

Hugging Face Spaces

OnnxTR Hugging Face collection

Full Changelog: v0.5.1...v0.6.0

v0.5.1

17 Oct 10:27
d17146c

What's Changed

  • Improved result.synthesize()
  • Updated Hugging Face demo

Full Changelog: v0.5.0...v0.5.1

v0.5.0

27 Sep 11:16
8285068

What's Changed

New version specifiers

To further establish OnnxTR as the choice for production scenarios, two new installation options were added:

pip install "onnxtr[cpu-headless]"  # same as "onnxtr[cpu]" but with opencv-headless
pip install "onnxtr[gpu-headless]"  # same as "onnxtr[gpu]" but with opencv-headless
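
The extras named in these release notes follow a simple pattern: a backend name, optionally suffixed with `-headless`. As a minimal sketch, a deployment script could build the requirement string as shown below. The helper `onnxtr_requirement` is hypothetical and not part of OnnxTR; `openvino` is included because v0.6.0 added that extra.

```python
# Hypothetical helper, NOT part of OnnxTR: builds the pip requirement string
# for the extras named in these release notes.
KNOWN_BACKENDS = ("cpu", "gpu", "openvino")


def onnxtr_requirement(backend: str, headless: bool = False) -> str:
    """Return e.g. 'onnxtr[gpu-headless]' for backend='gpu', headless=True."""
    if backend not in KNOWN_BACKENDS:
        raise ValueError(
            f"unknown backend {backend!r}, expected one of {KNOWN_BACKENDS}"
        )
    # The headless variants only differ by the '-headless' suffix.
    extra = f"{backend}-headless" if headless else backend
    return f"onnxtr[{extra}]"


if __name__ == "__main__":
    # The printed string is what you would pass to `pip install`.
    print(onnxtr_requirement("cpu", headless=True))
```

The headless variants are usually the better fit for servers and containers, where the GUI dependencies of the regular OpenCV package are not needed.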

Disable page orientation classification

  • If you deal with documents that contain only small rotations (~ -45 to 45 degrees), you can disable the page orientation classification to speed up inference.
  • This will only have an effect with assume_straight_pages=False and/or straighten_pages=True and/or detect_orientation=True.
from onnxtr.models import ocr_predictor
model = ocr_predictor(assume_straight_pages=False, disable_page_orientation=True)

Disable crop orientation classification

  • If you deal with documents that contain only horizontal text, you can disable the crop orientation classification to speed up inference.
  • This will only have an effect with assume_straight_pages=False and/or straighten_pages=True.
from onnxtr.models import ocr_predictor
model = ocr_predictor(assume_straight_pages=False, disable_crop_orientation=True)
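
The conditions under which the two disable flags have any effect can be restated as plain boolean logic. The sketch below only mirrors the wording of these release notes. The function names are hypothetical, the default values are assumptions about `ocr_predictor`, and none of this is OnnxTR's internal implementation.

```python
# Illustrative only: these functions restate the conditions from the release
# notes. They are hypothetical and do not reflect OnnxTR's internals.
# The defaults below are assumed to match ocr_predictor's defaults.


def page_orientation_flag_matters(
    assume_straight_pages: bool = True,
    straighten_pages: bool = False,
    detect_orientation: bool = False,
) -> bool:
    """True if disable_page_orientation would change anything."""
    return (not assume_straight_pages) or straighten_pages or detect_orientation


def crop_orientation_flag_matters(
    assume_straight_pages: bool = True,
    straighten_pages: bool = False,
) -> bool:
    """True if disable_crop_orientation would change anything."""
    return (not assume_straight_pages) or straighten_pages


if __name__ == "__main__":
    # With the assumed defaults neither flag has an effect.
    print(page_orientation_flag_matters(), crop_orientation_flag_matters())
```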

Loading custom exported orientation classification models

Synchronized with docTR:

from onnxtr.io import DocumentFile
from onnxtr.models import ocr_predictor, mobilenet_v3_small_page_orientation, mobilenet_v3_small_crop_orientation
from onnxtr.models.classification.zoo import crop_orientation_predictor, page_orientation_predictor
custom_page_orientation_model = mobilenet_v3_small_page_orientation("<PATH_TO_CUSTOM_EXPORTED_ONNX_MODEL>")
custom_crop_orientation_model = mobilenet_v3_small_crop_orientation("<PATH_TO_CUSTOM_EXPORTED_ONNX_MODEL>")

predictor = ocr_predictor(assume_straight_pages=False, detect_orientation=True)

# Overwrite the default orientation models
predictor.crop_orientation_predictor = crop_orientation_predictor(custom_crop_orientation_model)
predictor.page_orientation_predictor = page_orientation_predictor(custom_page_orientation_model)

FP16 Support

Full Changelog: v0.4.1...v0.5.0

v0.4.1

21 Aug 06:50

What's Changed

  • Fix: results produced with straighten_pages=True are now displayed correctly with .show()
  • Added numpy 2.0 support

New Contributors

Full Changelog: v0.4.0...v0.4.1

v0.4.0

16 Aug 10:23

What's Changed

  • Sync with current docTR state
  • HF Hub integration

HuggingFace Hub integration

Now you can load and/or push models to the hub directly.

Loading

from onnxtr.io import DocumentFile
from onnxtr.models import ocr_predictor, from_hub

img = DocumentFile.from_images(['<image_path>'])
# Load your model from the hub
model = from_hub('onnxtr/my-model')

# Pass it to the predictor
# If your model is a recognition model:
predictor = ocr_predictor(
    det_arch='db_mobilenet_v3_large',
    reco_arch=model
)

# If your model is a detection model:
predictor = ocr_predictor(
    det_arch=model,
    reco_arch='crnn_mobilenet_v3_small'
)

# Get your predictions
res = predictor(img)

Push

from onnxtr.models import parseq, linknet_resnet18, push_to_hf_hub, login_to_hub
from onnxtr.utils.vocabs import VOCABS

# Login to the hub
login_to_hub()

# Recognition model
model = parseq("~/onnxtr-parseq-multilingual-v1.onnx", vocab=VOCABS["multilingual"])
push_to_hf_hub(
    model,
    model_name="onnxtr-parseq-multilingual-v1",
    task="recognition",  # The task for which the model is intended [detection, recognition, classification]
    arch="parseq",  # The name of the model architecture
    override=False  # Set to `True` if you want to override an existing model / repository
)

# Detection model
model = linknet_resnet18("~/onnxtr-linknet-resnet18.onnx")
push_to_hf_hub(
    model,
    model_name="onnxtr-linknet-resnet18",
    task="detection",
    arch="linknet_resnet18",
    override=True
)

HF Hub search: here.

Collection: here

Full Changelog: v0.3.2...v0.4.0

v0.3.2

09 Jul 09:46

What's Changed

  • Fix: Resize transformation / interpolation adjusted to docTR (#10 #22)

Full Changelog: v0.3.1...v0.3.2

v0.3.1

28 Jun 06:16
890ae43

What's Changed

  • Minor configuration fix for CUDAExecutionProvider
  • Adjusted default batch sizes
  • Avoid initializing EngineConfig multiple times

Full Changelog: v0.3.0...v0.3.1

v0.3.0

27 Jun 10:13
04f5744

What's Changed

  • Sync with current docTR state
  • Added advanced options to configure the underlying execution engine
  • Added new db_mobilenet_v3_large converted models (fp32 & 8bit)

Advanced engine configuration

from onnxruntime import SessionOptions

from onnxtr.models import ocr_predictor, EngineConfig

general_options = SessionOptions()  # For configuration options see: https://onnxruntime.ai/docs/api/python/api_summary.html#sessionoptions
general_options.enable_cpu_mem_arena = False

# NOTE: The following forces execution on the GPU only; if no GPU is available, it will raise an error
# List of strings e.g. ["CUDAExecutionProvider", "CPUExecutionProvider"] or a list of tuples with the provider and its options e.g.
# [("CUDAExecutionProvider", {"device_id": 0}), ("CPUExecutionProvider", {"arena_extend_strategy": "kSameAsRequested"})]
providers = [("CUDAExecutionProvider", {"device_id": 0})]  # For available providers see: https://onnxruntime.ai/docs/execution-providers/

engine_config = EngineConfig(
    session_options=general_options,
    providers=providers
)
# We use the default predictor with the custom engine configuration
# NOTE: You can define different engine configurations for detection, recognition and classification depending on your needs
predictor = ocr_predictor(
    det_engine_cfg=engine_config,
    reco_engine_cfg=engine_config,
    clf_engine_cfg=engine_config
)
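
Since listing only `CUDAExecutionProvider` raises an error on machines without a GPU, it can help to filter a preference list against what ONNX Runtime reports as available. The following is a minimal sketch under that assumption: `select_providers` is a hypothetical helper (not part of OnnxTR or ONNX Runtime), while `onnxruntime.get_available_providers()` is the ONNX Runtime call that lists the providers in the current installation.

```python
from typing import Sequence


def select_providers(
    preferred: Sequence[str],
    available: Sequence[str],
    fallback: str = "CPUExecutionProvider",
) -> list[str]:
    """Hypothetical helper: keep the preferred providers that are available,
    in order of preference, and always append the fallback last."""
    chosen = [name for name in preferred if name in available and name != fallback]
    chosen.append(fallback)
    return chosen


if __name__ == "__main__":
    try:
        import onnxruntime as ort

        available = ort.get_available_providers()
    except ImportError:
        # Without ONNX Runtime installed, assume only the CPU provider.
        available = ["CPUExecutionProvider"]

    providers = select_providers(
        ["CUDAExecutionProvider", "OpenVINOExecutionProvider"], available
    )
    print(providers)
```

The resulting list can be passed as `providers=` to `EngineConfig` in place of the hard-coded CUDA entry, at the cost of silently running on the CPU when no accelerator is found.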

Full Changelog: v0.2.0...v0.3.0