Long PDFToTextOCRConverter conversion times #4232
cwfparsonson
started this conversation in
General
-
Is it possible to run the OCR process in parallel for each page? I.e. to first extract the PDF pages so I can call |
-
Describe the bug
PDFToTextOCRConverter.convert() takes a long time even on small PDFs with only a few pages (see example below). Is there any way to speed this up? For instance, could each page be converted in parallel?
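For reference, here is a minimal sketch of what per-page parallel OCR could look like outside of Haystack. It is not the converter's own implementation. It assumes the pdf2image and pytesseract packages are installed, along with the Poppler and Tesseract binaries they depend on. The function names ocr_pages_in_parallel and ocr_pdf are hypothetical, made up for this illustration. Threads are used because pytesseract runs the tesseract executable as a subprocess, so the pages can be processed concurrently without a process pool.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

Page = TypeVar("Page")


def ocr_pages_in_parallel(
    pages: Sequence[Page],
    ocr_page: Callable[[Page], str],
    max_workers: int = 4,
) -> List[str]:
    """Apply ocr_page to every page concurrently.

    Hypothetical helper: pool.map returns results in input order,
    so the text comes back in page order.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(ocr_page, pages))


def ocr_pdf(path: str, dpi: int = 200, max_workers: int = 4) -> List[str]:
    """Rasterize a PDF and OCR each page image in a worker thread.

    Hypothetical wrapper; assumes pdf2image (Poppler) and
    pytesseract (Tesseract) are available on this machine.
    """
    # Imported lazily so the helper above works without these packages.
    from pdf2image import convert_from_path
    import pytesseract

    images = convert_from_path(path, dpi=dpi)  # one PIL image per page
    return ocr_pages_in_parallel(images, pytesseract.image_to_string, max_workers)
```

Calling ocr_pdf("scan.pdf") would return one string per page (the file name is a placeholder). Whether this is actually faster depends on the number of CPU cores and on how much of the total time goes to rasterizing the PDF rather than to OCR itself.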
Additional context
Here is an example scanned PDF. It is only a few pages long, entirely black and white, and contains nothing but scanned text, so I would not have expected it to be slow to process: https://drive.google.com/file/d/1RvW0cPS1gIG9ZuafgocOfAc05kmoQtYu/view?usp=sharing
To Reproduce
Output:
FAQ Check
System: