Choose from paddleocr, python-doctr, or tesseract to perform OCR.
Installation:
git clone https://github.com/maylad31/allin1_ocr.git
cd allin1_ocr
pip install -r requirements.txt
To use tesseract, you also need to install the tesseract system packages:
sudo apt install tesseract-ocr
sudo apt install libtesseract-dev
Tested with Python 3.8 on Linux.
How to run:
python app.py --dir <directory path> --ocr paddle
(for --ocr, choose from 'paddle', 'doctr', 'tesseract')
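The two flags in the command above could be parsed roughly as follows. This is only a sketch of one way to do it with argparse, not the actual contents of app.py: the function name build_parser, the help strings, and the choice to make both flags required are assumptions for illustration.

```python
import argparse

# Backend names taken from the usage line above.
OCR_CHOICES = ("paddle", "doctr", "tesseract")


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the --dir and --ocr flags (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        description="Run OCR on every supported file in a directory."
    )
    # Assumption: both flags are required; app.py may define defaults instead.
    parser.add_argument("--dir", required=True, help="directory with input files")
    parser.add_argument(
        "--ocr", required=True, choices=OCR_CHOICES, help="OCR backend to use"
    )
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch is repeatable.
args = build_parser().parse_args(["--dir", "scans", "--ocr", "doctr"])
print(args.dir, args.ocr)  # prints "scans doctr"
```

Restricting --ocr with choices means a misspelled backend name is rejected with a usage message before any OCR library is loaded.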
The script performs OCR on all the files in the directory and saves the results to corresponding text files. Supported input formats are pdf, png, jpeg, and jpg.
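The directory pass described above can be sketched like this. It is a minimal illustration under stated assumptions, not the code in app.py: the names find_inputs and run_ocr_on_directory are hypothetical, the OCR backend is passed in as a plain callable, and the output naming (input file name plus ".txt") is a guess at what "corresponding text files" means.

```python
from pathlib import Path
from typing import Callable, List

# Input formats listed in this README.
SUPPORTED_EXTENSIONS = {".pdf", ".png", ".jpeg", ".jpg"}


def find_inputs(directory: str) -> List[Path]:
    """Return supported files directly inside `directory`, sorted by path."""
    return sorted(
        p
        for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS
    )


def run_ocr_on_directory(directory: str, ocr: Callable[[Path], str]) -> List[Path]:
    """Run `ocr` on each supported file and save the text beside it.

    `ocr` is any function mapping a file path to recognized text, so the
    backend (paddle, doctr, tesseract) can be swapped without changing the loop.
    """
    written = []
    for path in find_inputs(directory):
        # Assumed naming: scan.png -> scan.png.txt, so that scan.png and
        # scan.jpg do not overwrite each other's output.
        out_path = path.with_name(path.name + ".txt")
        out_path.write_text(ocr(path), encoding="utf-8")
        written.append(out_path)
    return written
```

As a usage example, a tesseract backend could be passed as `lambda p: pytesseract.image_to_string(Image.open(p))`, assuming the pytesseract wrapper and Pillow are installed. PDF inputs would first need to be converted to images.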
If you ask me, paddleocr is fast and reasonably accurate. Doctr is good too.
You are welcome to add any other library.
I am always looking for opportunities to enhance my skills. Contact me at [email protected]