maylad31/allin1_ocr

allin1_ocr

Choose from PaddleOCR, python-doctr or Tesseract to perform OCR.

Installation:

git clone https://github.com/maylad31/allin1_ocr.git

cd allin1_ocr

pip install -r requirements.txt

To use Tesseract, you also need to install the tesseract system packages:

sudo apt install tesseract-ocr
sudo apt install libtesseract-dev

Tested with Python 3.8 on Linux.

How to run:
python app.py --dir <directory path> --ocr paddle
(for --ocr, choose from 'paddle', 'doctr', 'tesseract')

Performs OCR on all the files in the directory and saves the results to corresponding text files. Supported input formats are pdf, png, jpeg and jpg.
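The directory loop described above can be sketched roughly as follows. This is not the code from app.py, only a minimal illustration: the names find_inputs and run_ocr_on_directory, the ocr_fn callback, and the output naming scheme (appending .txt to the full file name) are all assumptions made for the example.

```python
from pathlib import Path
from typing import Callable, List

# Input formats listed in this README.
SUPPORTED_EXTENSIONS = {".pdf", ".png", ".jpeg", ".jpg"}


def find_inputs(directory: str) -> List[Path]:
    """Return the supported files directly inside `directory`, sorted by name."""
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS
    )


def run_ocr_on_directory(directory: str, ocr_fn: Callable[[Path], str]) -> List[Path]:
    """Run `ocr_fn` on every supported file and write one text file per input.

    `ocr_fn` is a hypothetical callback that takes a file path and returns
    the recognized text; any of the three OCR libraries could sit behind it.
    """
    written = []
    for source in find_inputs(directory):
        # Assumed naming: scan.png -> scan.png.txt, so that scan.png and
        # scan.pdf do not overwrite each other's results.
        target = source.with_name(source.name + ".txt")
        target.write_text(ocr_fn(source), encoding="utf-8")
        written.append(target)
    return written
```

Keeping the OCR call behind a single callback means the loop itself does not need to know which library was selected with --ocr.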

If you ask me, PaddleOCR is fast and reasonably accurate. docTR is good too.

You are welcome to add any other library.
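One way a new library could be plugged in is through a small registry that maps the --ocr value to a function. The sketch below is hypothetical and does not reflect how app.py is actually organized: OCR_BACKENDS, register_backend, build_parser and the "dummy" backend are names invented for the example. Only the --dir and --ocr flags come from this README.

```python
import argparse
from typing import Callable, Dict

# Hypothetical registry: backend name -> function(path) -> recognized text.
OCR_BACKENDS: Dict[str, Callable[[str], str]] = {}


def register_backend(name: str):
    """Decorator that makes a function selectable through --ocr <name>."""
    def decorator(fn: Callable[[str], str]) -> Callable[[str], str]:
        OCR_BACKENDS[name] = fn
        return fn
    return decorator


@register_backend("dummy")
def dummy_ocr(image_path: str) -> str:
    # Placeholder backend; a real one would call the OCR library here.
    return f"(no OCR performed on {image_path})"


def build_parser() -> argparse.ArgumentParser:
    """Build a parser whose --ocr choices follow the registered backends."""
    parser = argparse.ArgumentParser(description="Run OCR on a directory")
    parser.add_argument("--dir", required=True, help="directory with input files")
    parser.add_argument("--ocr", required=True, choices=sorted(OCR_BACKENDS))
    return parser
```

With this layout, adding a library would mean writing one decorated function, and the command-line choices would update automatically.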

Always looking for opportunities to enhance my skills. Contact me at [email protected]