Can not ocr pdf files as input #167

Closed
FadeFx opened this issue Dec 4, 2022 · 5 comments

Comments

FadeFx commented Dec 4, 2022

Hi, I played around for a while and could not figure out what my issue was; workflow_ocr did not seem to work. My only trigger was a collaborative tag (ocrme).
However, when I tried to use the "file created" trigger with the MIME type PDF, I found that PDF was not allowed, so I created a JPG from one of my PDFs and added my tag. At that moment my server began to OCR the file and saved it as a PDF.
My question is: why is it not possible to handle PDF files? Am I holding it wrong? ;-)

Contributor

R0Wi commented Dec 4, 2022

However, when I tried to use a file created and mimetype pdf i found out that pdf was not allowed, so

What do you mean by "PDF was not allowed"? Aren't you able to set up a workflow as described here:
https://github.com/R0Wi/workflow_ocr#trigger-ocr-if-file-was-created-or-updated ?

Author

FadeFx commented Dec 4, 2022

Yes, I can set it up, but if I tag a PDF file it is not processed, whereas JPG files are.
If I try to add a filter for the specific MIME type PDF, I cannot save the workflow: the save button turns orange and says the configuration is invalid, with an additional error message that the regular expression is invalid.
[Screenshot: Screenshot_20221204-202117_1]

Sorry, the screenshot is in German and from my phone...

Contributor

R0Wi commented Dec 4, 2022

You may be hitting nextcloud/server#23666 (comment). Please try "is" instead of "matches" in front of the PDF MIME type setting.
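
To illustrate why switching the operator can help, here is a minimal Python sketch of the difference between an exact "is" comparison and a regex-based "matches" comparison. This is not Nextcloud's code: the function names are hypothetical, and the rule that a "matches" value must be written as a delimited pattern such as /.../ is an assumption made for the example. The thread itself only shows that the plain value was rejected as an invalid regular expression.

```python
import re


def check_is(actual_mime: str, expected: str) -> bool:
    """'is' operator: plain string equality, no pattern parsing involved."""
    return actual_mime == expected


def parse_delimited_regex(value: str) -> re.Pattern:
    """Hypothetical validator: accept only patterns written as /body/.

    Assumption for this sketch: a bare value such as 'application/pdf'
    is not treated as a pattern and is rejected as invalid.
    """
    if len(value) < 2 or not (value.startswith("/") and value.endswith("/")):
        raise ValueError("The regular expression is invalid")
    return re.compile(value[1:-1])


def check_matches(actual_mime: str, pattern: str) -> bool:
    """'matches' operator: validate the pattern first, then search with it."""
    return parse_delimited_regex(pattern).search(actual_mime) is not None
```

Under these assumptions, `check_is("application/pdf", "application/pdf")` succeeds without any validation step, while `check_matches("application/pdf", "application/pdf")` fails during validation before any file is examined. A delimited pattern such as `/^application\/pdf$/` passes validation and matches only the PDF MIME type.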

Author

FadeFx commented Dec 4, 2022

That works, and now PDFs get OCR'd, thank you... Strange that it did not do this without any MIME type filter...

FadeFx closed this as completed Dec 4, 2022

Contributor

R0Wi commented Dec 4, 2022

Glad to help. Hope this will be fixed in NC workflowengine soon 👍
