OCR library to extract text & tables from PDF files and images. Convert any image or PDF to CSV / TXT / JSON / Searchable PDF.
This batch script creates a searchable PDF from a PDF with one or more scanned pages that contain images.
Perform optical character recognition (OCR) on a scanned PDF file containing Arabic text and output a searchable PDF.
A powerful and user-friendly tool based on OCRmyPDF, offering a seamless GUI for conversion of image-based PDFs into searchable text.
Extract tables from searchable as well as non-searchable PDF files.
Create a searchable PDF with ALTO-XML and JP2 files.
Quick proof of concept to perform OCR on images.
Tool for creating searchable PDFs
A wrapper on top of Python OCR tools such as pytesseract and easyocr that recognizes and extracts text embedded in images. It also converts scanned PDFs to text-searchable PDFs.
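As a rough illustration of what several of the tools listed above do, here is a minimal sketch that drives the OCRmyPDF command-line tool from Python to add a text layer to a scanned PDF. It is not taken from any of the listed repositories. The helper names `build_ocrmypdf_command` and `make_searchable` and the file names are hypothetical, and the sketch assumes `ocrmypdf` and the relevant Tesseract language data (for example `ara` for Arabic) are installed.

```python
import shutil
import subprocess
from typing import List


def build_ocrmypdf_command(src: str, dst: str, language: str = "eng",
                           skip_text: bool = True) -> List[str]:
    """Build the argument list for the ocrmypdf CLI (hypothetical helper)."""
    # -l selects the Tesseract language pack, e.g. "eng" or "ara"
    cmd = ["ocrmypdf", "-l", language]
    if skip_text:
        # --skip-text leaves pages that already have a text layer untouched
        cmd.append("--skip-text")
    cmd += [src, dst]
    return cmd


def make_searchable(src: str, dst: str, language: str = "eng") -> None:
    """Run ocrmypdf to write a searchable copy of src to dst (hypothetical helper)."""
    if shutil.which("ocrmypdf") is None:
        # Assumption: the ocrmypdf executable must be available on PATH
        raise RuntimeError("ocrmypdf executable not found on PATH")
    # check=True raises CalledProcessError if ocrmypdf exits with an error
    subprocess.run(build_ocrmypdf_command(src, dst, language), check=True)
```

Calling `make_searchable("scan.pdf", "scan_searchable.pdf", language="ara")` would then OCR a scanned Arabic document, provided the input file exists. Building the argument list in a separate function keeps the command easy to inspect without running OCR.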