allin1_ocr

Choose from paddleocr, python-doctr or tesseract to perform OCR.

Installation:

git clone https://github.com/maylad31/allin1_ocr.git

cd allin1_ocr

pip install -r requirements.txt

To use tesseract, you also need to install the Tesseract engine:

sudo apt install tesseract-ocr
sudo apt install libtesseract-dev
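
Before selecting the tesseract backend, it can help to confirm that the binary is actually on your PATH. The snippet below is a minimal sketch using only the standard library; it is not part of this repo, and the helper name tesseract_version is hypothetical.

```python
import shutil
import subprocess
from typing import Optional


def tesseract_version() -> Optional[str]:
    """Return the first line of `tesseract --version`, or None if the binary is not on PATH."""
    path = shutil.which("tesseract")
    if path is None:
        return None
    # Some tesseract releases print the version to stderr instead of stdout, so check both.
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else ""


if __name__ == "__main__":
    version = tesseract_version()
    if version is None:
        print("tesseract not found; install it with the apt commands above")
    else:
        print("Found:", version)
```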

Tested with Python 3.8 on Linux.

How to run:

python app.py --dir /path/to/directory --ocr paddle

where --ocr is one of 'paddle', 'doctr' or 'tesseract'.
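
For readers adapting the script, the two flags above could be parsed with argparse roughly as follows. This is a hypothetical sketch, not the actual code in app.py: only the flag names and the three backend names come from this README, and making --ocr required (rather than giving it a default) is an assumption.

```python
import argparse
from typing import List, Optional

# Backend names accepted by --ocr, as listed in this README.
OCR_CHOICES = ("paddle", "doctr", "tesseract")


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(
        description="Run OCR on every supported file in a directory."
    )
    parser.add_argument("--dir", required=True, help="directory containing the input files")
    # `choices` makes argparse reject any backend name not in OCR_CHOICES.
    parser.add_argument("--ocr", required=True, choices=OCR_CHOICES, help="OCR backend to use")
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print(f"Running {args.ocr} on {args.dir}")
```

Passing an unknown backend such as --ocr easyocr makes argparse print an error and exit with status 2.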

Performs OCR on every file in the directory and saves each result to a corresponding text file. Supported input formats: pdf, png, jpeg, jpg.
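
The directory scan described above can be sketched with pathlib. This is an illustration under stated assumptions, not the repo's implementation: the helper plan_outputs is hypothetical, and the output naming (appending ".txt" to the full file name, so that a.png and a.pdf do not overwrite each other) is an assumption, since the README does not say how the text files are named.

```python
from pathlib import Path
from typing import List, Tuple

# Extensions listed in this README as supported inputs (matched case-insensitively).
SUPPORTED_EXTENSIONS = {".pdf", ".png", ".jpeg", ".jpg"}


def plan_outputs(directory: str) -> List[Tuple[Path, Path]]:
    """Pair each supported input file with a hypothetical .txt output path next to it."""
    pairs = []
    for path in sorted(Path(directory).iterdir()):
        if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS:
            # Keep the original extension in the output name to avoid collisions.
            pairs.append((path, path.with_name(path.name + ".txt")))
    return pairs
```

Because .txt is not a supported input extension, rerunning the scan does not pick up previously written results.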

In my experience, paddleocr is fast and reasonably accurate, and doctr is also good.

You are welcome to add any other library.
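
One way to make adding a library straightforward is a small registry that maps each --ocr name to a function taking a file path and returning text. The sketch below is hypothetical: it does not reflect how app.py or utils are actually organized, and the "dummy" backend only stands in for a real paddleocr, doctr or tesseract call.

```python
from typing import Callable, Dict

# Hypothetical registry: maps an --ocr name to a function (file path -> extracted text).
OcrBackend = Callable[[str], str]
BACKENDS: Dict[str, OcrBackend] = {}


def register(name: str) -> Callable[[OcrBackend], OcrBackend]:
    """Decorator that makes a backend selectable by name."""
    def decorator(func: OcrBackend) -> OcrBackend:
        if name in BACKENDS:
            raise ValueError(f"backend {name!r} is already registered")
        BACKENDS[name] = func
        return func
    return decorator


def run_ocr(name: str, file_path: str) -> str:
    """Dispatch to the backend registered under `name`."""
    try:
        backend = BACKENDS[name]
    except KeyError:
        raise ValueError(f"unknown OCR backend {name!r}; choose from {sorted(BACKENDS)}") from None
    return backend(file_path)


@register("dummy")
def dummy_ocr(file_path: str) -> str:
    # Placeholder standing in for a real library call.
    return f"text extracted from {file_path}"
```

With this layout, a new library only needs one decorated function; the CLI choices could then be taken from sorted(BACKENDS).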

Always looking for opportunities to enhance my skills; contact me at [email protected]
