Question: using ipysegment for OCR text? #10
Unfortunately that's probably not easy. This library has the assumption that your image can be represented as a NumPy array baked in pretty deeply. So if you can convert your PDF to a rasterized image, it could work, but it would no longer be a PDF.
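To make "represented as a NumPy array" concrete: a rasterized page is just a grid of pixels, typically shaped `(height, width, 3)` for RGB with `uint8` values. The sketch below is not part of ipysegment (`crop_region` is a hypothetical helper); it only illustrates that selecting a rectangular part of such an image is plain array slicing.

```python
import numpy as np


def crop_region(image: np.ndarray, top: int, left: int, bottom: int, right: int) -> np.ndarray:
    """Return the rectangular region image[top:bottom, left:right].

    Works for grayscale (H, W) and color (H, W, C) arrays, because slicing
    only touches the first two axes (rows, then columns).
    """
    if image.ndim not in (2, 3):
        raise ValueError(f"expected a 2D or 3D image array, got shape {image.shape}")
    height, width = image.shape[:2]
    if not (0 <= top < bottom <= height and 0 <= left < right <= width):
        raise ValueError("crop box lies outside the image or is empty")
    return image[top:bottom, left:right]


# A fake 4x6 RGB "page": height first, then width, then the 3 color channels.
page = np.zeros((4, 6, 3), dtype=np.uint8)
page[1:3, 2:5] = 255  # paint a white rectangle where "text" would be
region = crop_region(page, top=1, left=2, bottom=3, right=5)
print(region.shape)  # (2, 3, 3)
```

The result is a view into the original array, so no pixel data is copied until you modify or save it.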
It's not important that it remains a PDF. So, to be clear, the PDF has to be converted into a NumPy "rasterized image"?
I think I may have been out of my depth when I used words like
Maybe you could use pdf2image (https://github.com/Belval/pdf2image) to get the array first.
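A minimal sketch of that conversion, assuming pdf2image and the poppler utilities it depends on are installed. `convert_from_path` is pdf2image's own function; `pdf_page_to_array` and `points_to_pixels` are hypothetical helpers. The second helper is there because PDF coordinates are measured in points (1/72 inch), so a position taken from the PDF has to be scaled by the DPI to find it in the rasterized array.

```python
import numpy as np


def points_to_pixels(points: float, dpi: int) -> int:
    """Convert a PDF length in points (1/72 inch) to pixels at the given DPI."""
    return round(points * dpi / 72)


def pdf_page_to_array(pdf_path: str, page_number: int = 1, dpi: int = 200) -> np.ndarray:
    """Rasterize one page of a PDF and return it as an (H, W, 3) uint8 array."""
    # Imported here so points_to_pixels still works where pdf2image
    # (and the poppler binaries it calls) are not installed.
    from pdf2image import convert_from_path

    # first_page/last_page (1-based) limit conversion to the requested page.
    pages = convert_from_path(pdf_path, dpi=dpi, first_page=page_number, last_page=page_number)
    if not pages:
        raise ValueError(f"page {page_number} not found in {pdf_path}")
    # convert_from_path returns PIL images; force RGB so the array has 3 channels.
    return np.asarray(pages[0].convert("RGB"))
```

Usage would look like `page = pdf_page_to_array("letter.pdf")`, where the filename is a placeholder. As a sanity check, `points_to_pixels(612, 200)` gives 1700, the pixel width of a US Letter page (8.5 in) at 200 DPI.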
Is it possible to select part of a PDF document in order to OCR it with some other library?
Thanks.
Use case: official letters where you want to select part of the text and copy it to the clipboard.
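For that use case, one possible pipeline is sketched below. It assumes the selection is available as a boolean mask with the same height and width as the page array; how to obtain such a mask from the ipysegment widget is not shown here and is an assumption. It also swaps in pytesseract (which needs the Tesseract binary installed) for the OCR step and pyperclip for the clipboard; neither library is mentioned in the thread. `bounding_box` and `ocr_selection_to_clipboard` are hypothetical helper names.

```python
import numpy as np


def bounding_box(mask: np.ndarray) -> tuple[int, int, int, int]:
    """Return (top, left, bottom, right) of the True pixels in a 2D boolean mask.

    bottom and right are exclusive, so they can be used directly as slice ends.
    """
    rows = np.flatnonzero(mask.any(axis=1))  # rows containing any selected pixel
    cols = np.flatnonzero(mask.any(axis=0))  # columns containing any selected pixel
    if rows.size == 0:
        raise ValueError("mask selects no pixels")
    return int(rows[0]), int(cols[0]), int(rows[-1]) + 1, int(cols[-1]) + 1


def ocr_selection_to_clipboard(page: np.ndarray, mask: np.ndarray) -> str:
    """OCR the selected part of a page image and copy the text to the clipboard."""
    # Third-party dependencies, imported lazily: Pillow, pytesseract, pyperclip.
    from PIL import Image
    import pytesseract
    import pyperclip

    top, left, bottom, right = bounding_box(mask)
    # Copy the slice into a contiguous buffer before handing it to Pillow.
    region = Image.fromarray(np.ascontiguousarray(page[top:bottom, left:right]))
    text = pytesseract.image_to_string(region)
    pyperclip.copy(text)
    return text


# Example with a fake mask: a 2x4 selection inside a 5x8 page.
demo_mask = np.zeros((5, 8), dtype=bool)
demo_mask[1:3, 2:6] = True
print(bounding_box(demo_mask))  # (1, 2, 3, 6)
```

Cropping to the mask's bounding box keeps the OCR step focused on the selected passage instead of the whole letter; for a free-form selection the box will also include some unselected pixels around it.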