Archihub-App/ocrProcessing

OCR processing

About

The Archihub PDF Segmentation and OCR Plugin processes PDF files by segmenting their pages into blocks and applying Optical Character Recognition (OCR) only to specific regions.

Features

  • PDF Segmentation: The plugin uses LayoutParser to segment PDF files into distinct blocks for efficient processing.

  • OCR Integration: Tesseract OCR is applied selectively to the identified blocks, extracting their text for further use or analysis.
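
The two features combine into a segment-then-OCR flow. The sketch below is illustrative only and is not the plugin's actual code: the function names (`select_blocks`, `ocr_page`), the block labels, and the reading-order sort are assumptions. It assumes `layoutparser` (with the Detectron2 backend) and `pytesseract` are installed; both are imported lazily so the pure helper can be used without them.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

# (label, (x1, y1, x2, y2)): a simplified, hypothetical block record.
Block = Tuple[str, Tuple[float, float, float, float]]


def select_blocks(blocks: Iterable[Block], wanted: Sequence[str]) -> List[Block]:
    """Keep blocks whose label is in `wanted`, sorted top-to-bottom, then left-to-right."""
    kept = [b for b in blocks if b[0] in wanted]
    return sorted(kept, key=lambda b: (b[1][1], b[1][0]))


def ocr_page(
    image,
    config_path: str,
    model_path: str,
    label_map: Dict[int, str],
    lang: str = "eng",
    wanted: Sequence[str] = ("Text", "Title"),
) -> List[str]:
    """Detect layout blocks on one page image and OCR only the wanted ones.

    `image` is a NumPy array (H x W x 3). The labels in `wanted` must match
    the values of `label_map`, which depends on how the model was trained.
    """
    import layoutparser as lp  # third-party, imported lazily
    import pytesseract  # third-party, imported lazily

    model = lp.Detectron2LayoutModel(
        config_path, model_path=model_path, label_map=label_map
    )
    layout = model.detect(image)
    records = [(str(b.type), tuple(b.coordinates)) for b in layout]

    texts = []
    for _label, (x1, y1, x2, y2) in select_blocks(records, wanted):
        # Clamp to the image and crop with plain NumPy slicing.
        crop = image[max(0, int(y1)):int(y2), max(0, int(x1)):int(x2)]
        if crop.size == 0:
            continue  # skip degenerate boxes
        texts.append(pytesseract.image_to_string(crop, lang=lang))
    return texts
```

Restricting OCR to selected block types avoids running Tesseract on figures or other non-text regions, which is the point of segmenting first.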

Installation

  1. Clone this repository to your local machine and place the resulting folder inside the plugins folder of the application:
     git clone https://github.com/Archihub-App/ocrProcessing
  2. This plugin supports multilingual OCR. To enable a language, download the corresponding tessdata file from here and place it in the tessdata folder inside the plugin directory.

  3. Place your config_1.yaml and mymodel_1.pth files inside the models folder.
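
After completing the steps above, a small check like the following can confirm that the expected files are in place. This is a minimal sketch and not part of the plugin: the function name `missing_plugin_files` is hypothetical, and it assumes the tessdata files follow Tesseract's usual `<lang>.traineddata` naming.

```python
from pathlib import Path
from typing import List


def missing_plugin_files(plugin_dir: str, langs: List[str]) -> List[str]:
    """Return the expected files (relative to plugin_dir) that do not exist."""
    root = Path(plugin_dir)
    expected = [
        root / "models" / "config_1.yaml",
        root / "models" / "mymodel_1.pth",
    ]
    # One language data file per requested OCR language (assumed naming).
    expected += [root / "tessdata" / f"{lang}.traineddata" for lang in langs]
    return [p.relative_to(root).as_posix() for p in expected if not p.is_file()]
```

For example, `missing_plugin_files("plugins/ocrProcessing", ["eng", "spa"])` returns an empty list when both model files and both language files are present; the `plugins/ocrProcessing` path is an assumption about where the application keeps its plugins.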
