OCR processing

About

The Archihub PDF Segmentation and OCR Plugin processes PDF files by segmenting them into blocks and applying Optical Character Recognition (OCR) to specific regions.

Features

PDF Segmentation: The plugin uses LayoutParser to segment PDF files into distinct blocks, so that each block can be processed separately.
OCR Integration: Tesseract OCR is applied selectively to the identified blocks, extracting their text for further use or analysis.
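Together, the two features form a segment-then-OCR pipeline. The sketch below shows only the selection step and uses just the standard library. In a real pipeline the blocks would come from a LayoutParser model, and `ocr` would crop the page image to the block and call Tesseract (for example through `pytesseract.image_to_string`). The `Block` class, `run_selective_ocr`, the default labels `("Text", "Title")`, and the single-column reading order are illustrative assumptions, not the plugin's actual code.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class Block:
    """One detected layout region on a page (hypothetical structure)."""
    x1: float
    y1: float
    x2: float
    y2: float
    kind: str  # layout label, e.g. "Text", "Title", "Figure"


def reading_order(blocks: Iterable[Block]) -> List[Block]:
    """Sort blocks top-to-bottom, then left-to-right (assumes a single column)."""
    return sorted(blocks, key=lambda b: (b.y1, b.x1))


def run_selective_ocr(
    blocks: Iterable[Block],
    ocr: Callable[[Block], str],
    kinds: Tuple[str, ...] = ("Text", "Title"),
) -> List[Tuple[Block, str]]:
    """Run `ocr` only on blocks whose label is in `kinds`, in reading order.

    Blocks whose OCR result is empty after stripping whitespace are dropped.
    """
    results: List[Tuple[Block, str]] = []
    for block in reading_order(blocks):
        if block.kind not in kinds:
            continue  # e.g. figures are not sent to OCR
        text = ocr(block).strip()
        if text:
            results.append((block, text))
    return results


if __name__ == "__main__":
    page = [
        Block(50, 400, 550, 600, "Text"),
        Block(50, 100, 550, 150, "Title"),
        Block(50, 200, 550, 380, "Figure"),
    ]

    def fake_ocr(block: Block) -> str:
        # Stand-in for cropping the page image and calling Tesseract.
        return f"{block.kind} at y={block.y1:.0f}"

    for _, text in run_selective_ocr(page, fake_ocr):
        print(text)  # prints "Title at y=100", then "Text at y=400"
```

Passing the OCR function in as a parameter keeps the selection logic independent of Tesseract, so it can be tested without any OCR engine installed.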

Installation

Clone this repository to your local machine and place the cloned folder inside the plugins folder of the Archihub application:

git clone https://github.com/Archihub-App/ocrProcessing

This plugin supports multilingual OCR. To enable OCR for a given language, download the corresponding Tesseract language data file (tessdata) from here and place it in the tessdata folder inside the plugin directory.
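Tesseract language data files are named `<code>.traineddata` (for example `eng.traineddata`), and several languages are requested by joining their codes with `+` (for example `eng+spa`). As a minimal sketch, assuming the tessdata folder layout described above, the helper functions below (hypothetical names, not part of the plugin) list the installed languages and build that combined language string:

```python
from pathlib import Path
from typing import List, Optional


def available_languages(tessdata_dir: Path) -> List[str]:
    """Return the language code of every <code>.traineddata file in tessdata_dir."""
    return sorted(p.stem for p in tessdata_dir.glob("*.traineddata"))


def tesseract_lang_arg(tessdata_dir: Path, wanted: List[str]) -> Optional[str]:
    """Join the wanted languages that are installed with '+', as Tesseract expects.

    The order of `wanted` is preserved. Returns None when none of them is installed.
    """
    installed = set(available_languages(tessdata_dir))
    chosen = [code for code in wanted if code in installed]
    return "+".join(chosen) if chosen else None
```

With pytesseract, such a string could be passed as `lang`, together with `config='--tessdata-dir "<path to tessdata>"'` so that Tesseract reads the plugin's own folder; whether the plugin passes its tessdata path this way is an assumption.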
Place your config_1.yaml and mymodel_1.pth files inside the models folder.
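The plugin expects a YAML configuration file and a `.pth` weights file (the extension conventionally used for PyTorch checkpoints). A quick pre-flight check can catch a missing file before any PDF is processed. This is a sketch only: the helper names are hypothetical, and the location of the models folder is whatever path you pass in.

```python
from pathlib import Path
from typing import List

# File names taken from the installation instructions.
REQUIRED_MODEL_FILES = ("config_1.yaml", "mymodel_1.pth")


def missing_model_files(models_dir: Path) -> List[str]:
    """Return the required model files that are not present in models_dir."""
    return [name for name in REQUIRED_MODEL_FILES if not (models_dir / name).is_file()]


def check_models_dir(models_dir: Path) -> None:
    """Raise FileNotFoundError listing every missing model file, if any."""
    missing = missing_model_files(models_dir)
    if missing:
        raise FileNotFoundError(f"Missing in {models_dir}: {', '.join(missing)}")
```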
